[tip:irq/core] genirq/affinity: Add is_managed to struct irq_affinity_desc

2018-12-19 Thread tip-bot for Dou Liyang
Commit-ID:  c410abbbacb9b378365ba17a30df08b4b9eec64f
Gitweb: https://git.kernel.org/tip/c410abbbacb9b378365ba17a30df08b4b9eec64f
Author: Dou Liyang 
AuthorDate: Tue, 4 Dec 2018 23:51:21 +0800
Committer:  Thomas Gleixner 
CommitDate: Wed, 19 Dec 2018 11:32:08 +0100

genirq/affinity: Add is_managed to struct irq_affinity_desc

Devices which use managed interrupts usually have two classes of
interrupts:

  - Interrupts for multiple device queues
  - Interrupts for general device management

Currently both classes are treated the same way, i.e. as managed
interrupts. The general interrupts get the default affinity mask assigned
while the device queue interrupts are spread out over the possible CPUs.

Treating the general interrupts as managed is both a limitation and under
certain circumstances a bug. Assume the following situation:

 default_irq_affinity = 4..7

So if CPUs 4-7 are offlined, then the core code will shut down the device
management interrupts because the last CPU in their affinity mask went
offline.

It's also a limitation because it's desired to allow manual placement of
the general device interrupts for various reasons. If they are marked
managed then the interrupt affinity setting from both user and kernel space
is disabled. That limitation was reported by Kashyap and Sumit.

Expand struct irq_affinity_desc with a new bit 'is_managed' which is set
for truly managed interrupts (queue interrupts) and cleared for the general
device interrupts.
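
As an illustration only (not part of the patch; the values and the toy
struct below are made up), the marking rule can be modelled in plain
userspace C:

#include <stdio.h>

/* toy stand-in for the kernel's struct irq_affinity */
struct irq_affinity { int pre_vectors, post_vectors; };

int main(void)
{
	struct irq_affinity affd = { .pre_vectors = 1, .post_vectors = 1 };
	int nvecs = 6, i;

	for (i = 0; i < nvecs; i++) {
		/* same condition as the loop added to irq_create_affinity_masks() */
		int managed = i >= affd.pre_vectors &&
			      i < nvecs - affd.post_vectors;
		printf("vector %d: %s\n", i,
		       managed ? "managed (queue)" : "unmanaged (general)");
	}
	return 0;	/* vectors 0 and 5 stay unmanaged, 1-4 are managed */
}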

[ tglx: Simplify code and massage changelog ]

Reported-by: Kashyap Desai 
Reported-by: Sumit Saxena 
Signed-off-by: Dou Liyang 
Signed-off-by: Thomas Gleixner 
Cc: linux-...@vger.kernel.org
Cc: shivasharan.srikanteshw...@broadcom.com
Cc: ming@redhat.com
Cc: h...@lst.de
Cc: bhelg...@google.com
Cc: douliya...@huawei.com
Link: https://lkml.kernel.org/r/20181204155122.6327-3-douliya...@gmail.com

---
 include/linux/interrupt.h |  1 +
 kernel/irq/affinity.c |  4 ++++
 kernel/irq/irqdesc.c  | 13 ++++++++-----
 3 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index c44b7844dc83..c672f34235e7 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -263,6 +263,7 @@ struct irq_affinity {
  */
 struct irq_affinity_desc {
struct cpumask  mask;
+   unsigned int    is_managed : 1;
 };
 
 #if defined(CONFIG_SMP)
diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index c0fe591b0dc9..45b68b4ea48b 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -289,6 +289,10 @@ irq_create_affinity_masks(int nvecs, const struct 
irq_affinity *affd)
for (; curvec < nvecs; curvec++)
cpumask_copy(&masks[curvec].mask, irq_default_affinity);
 
+   /* Mark the managed interrupts */
+   for (i = affd->pre_vectors; i < nvecs - affd->post_vectors; i++)
+   masks[i].is_managed = 1;
+
 outnodemsk:
free_node_to_cpumask(node_to_cpumask);
return masks;
diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index cb401d6c5040..ee062b7939d3 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -453,27 +453,30 @@ static int alloc_descs(unsigned int start, unsigned int 
cnt, int node,
   struct module *owner)
 {
struct irq_desc *desc;
-   unsigned int flags;
int i;
 
/* Validate affinity mask(s) */
if (affinity) {
for (i = 0; i < cnt; i++) {
if (cpumask_empty(&affinity[i].mask))
return -EINVAL;
}
}
 
-   flags = affinity ? IRQD_AFFINITY_MANAGED | IRQD_MANAGED_SHUTDOWN : 0;
-
for (i = 0; i < cnt; i++) {
const struct cpumask *mask = NULL;
+   unsigned int flags = 0;
 
if (affinity) {
-   node = cpu_to_node(cpumask_first(affinity));
+   if (affinity->is_managed) {
+   flags = IRQD_AFFINITY_MANAGED |
+   IRQD_MANAGED_SHUTDOWN;
+   }
mask = &affinity->mask;
+   node = cpu_to_node(cpumask_first(mask));
affinity++;
}
+
desc = alloc_desc(start + i, node, flags, mask, owner);
if (!desc)
goto err;


[tip:irq/core] genirq/core: Introduce struct irq_affinity_desc

2018-12-19 Thread tip-bot for Dou Liyang
Commit-ID:  bec04037e4e484f41ee4d9409e40616874169d20
Gitweb: https://git.kernel.org/tip/bec04037e4e484f41ee4d9409e40616874169d20
Author: Dou Liyang 
AuthorDate: Tue, 4 Dec 2018 23:51:20 +0800
Committer:  Thomas Gleixner 
CommitDate: Wed, 19 Dec 2018 11:32:08 +0100

genirq/core: Introduce struct irq_affinity_desc

The interrupt affinity management uses straight cpumask pointers to convey
the automatically assigned affinity masks for managed interrupts. The core
interrupt descriptor allocation also decides based on the pointer being non
NULL whether an interrupt is managed or not.

Devices which use managed interrupts usually have two classes of
interrupts:

  - Interrupts for multiple device queues
  - Interrupts for general device management

Currently both classes are treated the same way, i.e. as managed
interrupts. The general interrupts get the default affinity mask assigned
while the device queue interrupts are spread out over the possible CPUs.

Treating the general interrupts as managed is both a limitation and under
certain circumstances a bug. Assume the following situation:

 default_irq_affinity = 4..7

So if CPUs 4-7 are offlined, then the core code will shut down the device
management interrupts because the last CPU in their affinity mask went
offline.

It's also a limitation because it's desired to allow manual placement of
the general device interrupts for various reasons. If they are marked
managed then the interrupt affinity setting from both user and kernel space
is disabled.

To remedy that situation it's required to convey more information than the
cpumasks through various interfaces related to interrupt descriptor
allocation.

Instead of adding yet another argument, create a new data structure
'irq_affinity_desc' which for now just contains the cpumask. This struct
can be expanded to convey auxiliary information in the next step.

No functional change, just preparatory work.
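
As a sketch of the direction (illustration only, not kernel code; the
cpumask stand-in and the function below are invented for the example),
wrapping the mask in a descriptor lets later patches add attributes
without another round of signature changes:

#include <stdio.h>

typedef unsigned long cpumask_t;	/* toy stand-in for struct cpumask */

struct irq_affinity_desc {
	cpumask_t mask;
	/* follow-up patches can add fields here (e.g. is_managed)
	 * without touching any function signature again */
};

/* callers keep passing a single pointer, as with the old bare cpumask */
static void alloc_descs_sketch(const struct irq_affinity_desc *affinity)
{
	printf("mask=%#lx\n", affinity ? affinity->mask : 0UL);
}

int main(void)
{
	struct irq_affinity_desc d = { .mask = 0xf0 };	/* CPUs 4-7 */

	alloc_descs_sketch(&d);
	return 0;
}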

[ tglx: Simplified logic and clarified changelog ]

Suggested-by: Thomas Gleixner 
Suggested-by: Bjorn Helgaas 
Signed-off-by: Dou Liyang 
Signed-off-by: Thomas Gleixner 
Cc: linux-...@vger.kernel.org
Cc: kashyap.de...@broadcom.com
Cc: shivasharan.srikanteshw...@broadcom.com
Cc: sumit.sax...@broadcom.com
Cc: ming@redhat.com
Cc: h...@lst.de
Cc: douliya...@huawei.com
Link: https://lkml.kernel.org/r/20181204155122.6327-2-douliya...@gmail.com

---
 drivers/pci/msi.c |  9 -
 include/linux/interrupt.h | 14 --
 include/linux/irq.h   |  6 --
 include/linux/irqdomain.h |  6 --
 include/linux/msi.h   |  4 ++--
 kernel/irq/affinity.c | 22 --
 kernel/irq/devres.c   |  4 ++--
 kernel/irq/irqdesc.c  | 17 +
 kernel/irq/irqdomain.c|  4 ++--
 kernel/irq/msi.c  |  8 
 10 files changed, 55 insertions(+), 39 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 265ed3e4c920..7a1c8a09efa5 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -534,14 +534,13 @@ error_attrs:
 static struct msi_desc *
 msi_setup_entry(struct pci_dev *dev, int nvec, const struct irq_affinity *affd)
 {
-   struct cpumask *masks = NULL;
+   struct irq_affinity_desc *masks = NULL;
struct msi_desc *entry;
u16 control;
 
if (affd)
masks = irq_create_affinity_masks(nvec, affd);
 
-
/* MSI Entry Initialization */
entry = alloc_msi_entry(&dev->dev, nvec, masks);
if (!entry)
@@ -672,7 +671,7 @@ static int msix_setup_entries(struct pci_dev *dev, void 
__iomem *base,
  struct msix_entry *entries, int nvec,
  const struct irq_affinity *affd)
 {
-   struct cpumask *curmsk, *masks = NULL;
+   struct irq_affinity_desc *curmsk, *masks = NULL;
struct msi_desc *entry;
int ret, i;
 
@@ -1264,7 +1263,7 @@ const struct cpumask *pci_irq_get_affinity(struct pci_dev 
*dev, int nr)
 
for_each_pci_msi_entry(entry, dev) {
if (i == nr)
-   return entry->affinity;
+   return &entry->affinity->mask;
i++;
}
WARN_ON_ONCE(1);
@@ -1276,7 +1275,7 @@ const struct cpumask *pci_irq_get_affinity(struct pci_dev 
*dev, int nr)
 nr >= entry->nvec_used))
return NULL;
 
-   return &entry->affinity[nr];
+   return &entry->affinity[nr].mask;
} else {
return cpu_possible_mask;
}
diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index ca397ff40836..c44b7844dc83 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -257,6 +257,14 @@ struct irq_affinity {
int *sets;
 };
 
+/**
+ * struct irq_affinity_desc - Interrupt affinity descriptor
+ *

Re: [PATCH 3/3] irq/affinity: Fix a possible breakage

2018-12-11 Thread Dou Liyang

Hi tglx,
on 2018/12/5 16:28, Thomas Gleixner wrote:

On Tue, 4 Dec 2018, Dou Liyang wrote:


In case of irq_default_affinity != cpu_possible_mask, setting the affinity
for the pre/post vectors to irq_default_affinity is a breakage.


Why so? All interrupts which are not managed get the default affinity
mask. It can be different than cpu_possible_mask, but that's what the admin
has set. The affinity of these non-managed interrupts can still be set via
/proc/... so where is the breakage?


I misunderstood it. Please ignore this ;-)

Thanks,
dou


[PATCH 2/3] irq/affinity: Add is_managed into struct irq_affinity_desc

2018-12-04 Thread Dou Liyang
Now, Linux uses struct irq_affinity_desc to convey the affinity information.

As Kashyap and Sumit reported, in the MSI(-X) subsystem the pre/post vectors
may be used for some extra reply queues for performance.

  https://marc.info/?l=linux-kernel&m=153543887027997&w=2

Their affinities are not NULL, but they should be mapped as unmanaged
interrupts, so transferring only the irq affinity assignments is not enough.

Add a new bit "is_managed" to convey the info in irq_affinity_desc and use
it in alloc_descs().

Reported-by: Kashyap Desai 
Reported-by: Sumit Saxena 
Signed-off-by: Dou Liyang 
---
 include/linux/interrupt.h | 1 +
 kernel/irq/affinity.c | 7 +++
 kernel/irq/irqdesc.c  | 9 +++--
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 71be303231e9..a12b3dbbc45e 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -263,6 +263,7 @@ struct irq_affinity {
  */
 struct irq_affinity_desc {
struct cpumask  mask;
+   unsigned int    is_managed : 1;
 };
 
 #if defined(CONFIG_SMP)
diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index 1562a36e7c0f..d122575ba1b4 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -289,6 +289,13 @@ irq_create_affinity_masks(int nvecs, const struct 
irq_affinity *affd)
for (; curvec < nvecs; curvec++)
cpumask_copy(&masks[curvec].mask, irq_default_affinity);
 
+   /*  Setup complementary information */
+   for (i = 0; i < nvecs; i++) {
+   if (i >= affd->pre_vectors && i < nvecs - affd->post_vectors)
+   masks[i].is_managed = 1;
+   else
+   masks[i].is_managed = 0;
+   }
 outnodemsk:
free_node_to_cpumask(node_to_cpumask);
return masks;
diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index f87fa2b9935a..6b0821c144c0 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -455,7 +455,7 @@ static int alloc_descs(unsigned int start, unsigned int 
cnt, int node,
const struct irq_affinity_desc *cur_affinity = affinity;
const struct cpumask *mask = NULL;
struct irq_desc *desc;
-   unsigned int flags;
+   unsigned int flags = 0;
int i;
 
/* Validate affinity mask(s) */
@@ -468,11 +468,16 @@ static int alloc_descs(unsigned int start, unsigned int 
cnt, int node,
}
}
 
-   flags = affinity ? IRQD_AFFINITY_MANAGED | IRQD_MANAGED_SHUTDOWN : 0;
mask = NULL;
 
for (i = 0; i < cnt; i++) {
if (affinity) {
+   if (affinity->is_managed) {
+   flags = IRQD_AFFINITY_MANAGED |
+   IRQD_MANAGED_SHUTDOWN;
+   } else {
+   flags = 0;
+   }
mask = &affinity->mask;
node = cpu_to_node(cpumask_first(mask));
affinity++;
-- 
2.17.2



[PATCH 3/3] irq/affinity: Fix a possible breakage

2018-12-04 Thread Dou Liyang
In case of irq_default_affinity != cpu_possible_mask, setting the affinity
for the pre/post vectors to irq_default_affinity is a breakage.

Just set the pre/post vectors to cpu_possible_mask and be done with it.
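
For example (an illustration, not part of the patch text): with nvecs = 6,
affd->pre_vectors = 1 and affd->post_vectors = 1, masks[0] and masks[5] now
get cpu_possible_mask, masks[1]..masks[4] are spread over the CPUs, and only
middle vectors which the spreading loop left unassigned (usedvecs < affvecs)
fall back to irq_default_affinity.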

Suggested-by: Thomas Gleixner 
Signed-off-by: Dou Liyang 
---
 kernel/irq/affinity.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index d122575ba1b4..aaa1dd82c3df 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -257,7 +257,7 @@ irq_create_affinity_masks(int nvecs, const struct 
irq_affinity *affd)
 
/* Fill out vectors at the beginning that don't need affinity */
for (curvec = 0; curvec < affd->pre_vectors; curvec++)
-   cpumask_copy(&masks[curvec].mask, irq_default_affinity);
+   cpumask_copy(&masks[curvec].mask, cpu_possible_mask);
/*
 * Spread on present CPUs starting from affd->pre_vectors. If we
 * have multiple sets, build each sets affinity mask separately.
@@ -282,12 +282,15 @@ irq_create_affinity_masks(int nvecs, const struct 
irq_affinity *affd)
}
 
/* Fill out vectors at the end that don't need affinity */
-   if (usedvecs >= affvecs)
+   if (usedvecs >= affvecs) {
curvec = affd->pre_vectors + affvecs;
-   else
+   } else {
curvec = affd->pre_vectors + usedvecs;
+   for (; curvec < affd->pre_vectors + affvecs; curvec++)
+   cpumask_copy(&masks[curvec].mask, irq_default_affinity);
+   }
for (; curvec < nvecs; curvec++)
-   cpumask_copy(&masks[curvec].mask, irq_default_affinity);
+   cpumask_copy(&masks[curvec].mask, cpu_possible_mask);
 
/*  Setup complementary information */
for (i = 0; i < nvecs; i++) {
-- 
2.17.2



[PATCH 1/3] genirq/core: Add a new interrupt affinity descriptor

2018-12-04 Thread Dou Liyang
Now, Linux just spreads the interrupt affinity info by a cpumask pointer
and marks an interrupt as managed if its cpumask is not NULL.

If some other info should be passed as well, this design is not good to
expand. Adding new arguments is the most straightforward method, but it
would break many functions.

So, add a new interrupt affinity descriptor and replace the cpumask pointer
with a pointer to it, which allows expanding this in the future without
touching all the functions ever again; just modify the irq_affinity_desc
structure.

No functional change, just prepares for support of spreading managed
flags.

Suggested-by: Thomas Gleixner 
Suggested-by: Bjorn Helgaas 
Signed-off-by: Dou Liyang 
---
 drivers/pci/msi.c |  9 -
 include/linux/interrupt.h | 14 --
 include/linux/irq.h   |  6 --
 include/linux/irqdomain.h |  6 --
 include/linux/msi.h   |  4 ++--
 kernel/irq/affinity.c | 22 --
 kernel/irq/devres.c   |  4 ++--
 kernel/irq/irqdesc.c  | 16 ++--
 kernel/irq/irqdomain.c|  4 ++--
 kernel/irq/msi.c  |  7 ---
 10 files changed, 56 insertions(+), 36 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 265ed3e4c920..7a1c8a09efa5 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -534,14 +534,13 @@ static int populate_msi_sysfs(struct pci_dev *pdev)
 static struct msi_desc *
 msi_setup_entry(struct pci_dev *dev, int nvec, const struct irq_affinity *affd)
 {
-   struct cpumask *masks = NULL;
+   struct irq_affinity_desc *masks = NULL;
struct msi_desc *entry;
u16 control;
 
if (affd)
masks = irq_create_affinity_masks(nvec, affd);
 
-
/* MSI Entry Initialization */
entry = alloc_msi_entry(&dev->dev, nvec, masks);
if (!entry)
@@ -672,7 +671,7 @@ static int msix_setup_entries(struct pci_dev *dev, void 
__iomem *base,
  struct msix_entry *entries, int nvec,
  const struct irq_affinity *affd)
 {
-   struct cpumask *curmsk, *masks = NULL;
+   struct irq_affinity_desc *curmsk, *masks = NULL;
struct msi_desc *entry;
int ret, i;
 
@@ -1264,7 +1263,7 @@ const struct cpumask *pci_irq_get_affinity(struct pci_dev 
*dev, int nr)
 
for_each_pci_msi_entry(entry, dev) {
if (i == nr)
-   return entry->affinity;
+   return &entry->affinity->mask;
i++;
}
WARN_ON_ONCE(1);
@@ -1276,7 +1275,7 @@ const struct cpumask *pci_irq_get_affinity(struct pci_dev 
*dev, int nr)
 nr >= entry->nvec_used))
return NULL;
 
-   return &entry->affinity[nr];
+   return &entry->affinity[nr].mask;
} else {
return cpu_possible_mask;
}
diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index ca397ff40836..71be303231e9 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -257,6 +257,14 @@ struct irq_affinity {
int *sets;
 };
 
+/**
+ * struct irq_affinity_desc - Interrupt affinity descriptor
+ * @mask:  It's one cpumask per descriptor.
+ */
+struct irq_affinity_desc {
+   struct cpumask  mask;
+};
+
 #if defined(CONFIG_SMP)
 
 extern cpumask_var_t irq_default_affinity;
@@ -303,7 +311,9 @@ extern int irq_set_affinity_hint(unsigned int irq, const 
struct cpumask *m);
 extern int
 irq_set_affinity_notifier(unsigned int irq, struct irq_affinity_notify 
*notify);
 
-struct cpumask *irq_create_affinity_masks(int nvec, const struct irq_affinity 
*affd);
+struct irq_affinity_desc *
+irq_create_affinity_masks(int nvec, const struct irq_affinity *affd);
+
 int irq_calc_affinity_vectors(int minvec, int maxvec, const struct 
irq_affinity *affd);
 
 #else /* CONFIG_SMP */
@@ -337,7 +347,7 @@ irq_set_affinity_notifier(unsigned int irq, struct 
irq_affinity_notify *notify)
return 0;
 }
 
-static inline struct cpumask *
+static inline struct irq_affinity_desc *
 irq_create_affinity_masks(int nvec, const struct irq_affinity *affd)
 {
return NULL;
diff --git a/include/linux/irq.h b/include/linux/irq.h
index c9bffda04a45..def2b2aac8b1 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -27,6 +27,7 @@
 struct seq_file;
 struct module;
 struct msi_msg;
+struct irq_affinity_desc;
 enum irqchip_irq_state;
 
 /*
@@ -834,11 +835,12 @@ struct cpumask 
*irq_data_get_effective_affinity_mask(struct irq_data *d)
 unsigned int arch_dynirq_lower_bound(unsigned int from);
 
 int __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,
- struct module *owner, const struct cpumask *affinity);
+ struct module *owner,
+ const struct irq_affinity_desc *af

[PATCH 0/3] irq/core: Fix and expand the irq affinity descriptor

2018-12-04 Thread Dou Liyang
Now, spreading the interrupt affinity info by a cpumask pointer is not
enough: it meets a problem[1] and is hard to expand in the future.

Fix it by:

+----------------------------------+
|                                  |
|     struct cpumask *affinity     |
|                                  |
+----------------------------------+
                 |
+----------------v-----------------+
|                                  |
|  struct irq_affinity_desc {      |
|      struct cpumask      mask;   |
|      unsigned int is_managed : 1;|
|  };                              |
|                                  |
+----------------------------------+

[1]:https://marc.info/?l=linux-kernel&m=153543887027997&w=2

Dou Liyang (3):
  genirq/affinity: Add a new interrupt affinity descriptor
  irq/affinity: Add is_managed into struct irq_affinity_desc
  irq/affinity: Fix a possible breakage

 drivers/pci/msi.c |  9 -
 include/linux/interrupt.h | 15 +--
 include/linux/irq.h   |  6 --
 include/linux/irqdomain.h |  6 --
 include/linux/msi.h   |  4 ++--
 kernel/irq/affinity.c | 38 +-
 kernel/irq/devres.c   |  4 ++--
 kernel/irq/irqdesc.c  | 25 +
 kernel/irq/irqdomain.c|  4 ++--
 kernel/irq/msi.c  |  7 ---
 10 files changed, 77 insertions(+), 41 deletions(-)

-- 
2.17.2



Re: [RFC PATCH v3] genirq/affinity: Create and transfer more irq desc info by a new structure

2018-11-29 Thread Dou Liyang

Hi Thomas,

On 2018/11/29 6:03, Thomas Gleixner wrote:


+   affi_desc = kcalloc(nvec, sizeof(*affi_desc), GFP_KERNEL);


Why do you want to do that separate allocation here? Just let


I thought irq_create_affinity_desc() could also be called by other
functions which may convert cpumasks to irq_affinity_desc, such as
__devm_irq_alloc_descs().

Now I know I was wrong; I will modify it.


irq_create_affinity_masks() allocate an array of affinity descriptors and
use that. There is no point in copying that stuff over and over. Setting
the flag field can be done in the existing function as well.



Can you please change the function signature and fixup the callers, if
there are any of them? Copying this over and over is horrible.


I have searched: no one in the tree calls __devm_irq_alloc_descs(); it may
be called by some users' own modules or drivers.

yes, I will change it.


struct irq_affinity_desc {
	struct cpumask	masks;
	unsigned int	managed : 1;
};


Yes. BTW, I think the following fits irq_affinity_desc better:

s/masks/mask/
s/managed/is_managed/






You can spare a lot of pointless churn by just keeping the 'affinity' name
and only changing the struct type. The compiler will catch all places which
need to be fixed and 'affinity' is generic enough to be used with the new
struct type as well. As Bjorn said, even 'masks' is fine.


Yes, I see

Thanks,
dou


Re: [RFC PATCH v3] genirq/affinity: Create and transfer more irq desc info by a new structure

2018-11-29 Thread Dou Liyang

Hi Bjorn,

on 2018/11/29 4:00, Bjorn Helgaas wrote:

[+cc linux-pci]
Since you mention reports, are there URLs to mailing list archives you
can include?

OK, I will add it:

https://marc.info/?l=linux-kernel&m=153543887027997&w=2


-   entry = alloc_msi_entry(&dev->dev, nvec, masks);
+   entry = alloc_msi_entry(&dev->dev, nvec, affi_desc);


Can you split this into two or more patches?  Most of these changes


Yes, next non-RFC version will do.


are trivial and not very interesting, and the fact that they're all in
one patch makes it hard to find and review the interesting bits.  For
example,

   1) Rename all the local variables while keeping the type the same
(or just leave the name the same; I think "affinity" would be a fine
name, and I would be OK if we ended up with "struct irq_affinity_desc
*masks" or "struct irq_affinity_desc *affinity").  This patch would
obviously have no functional impact and would remove a lot of the
trivial changes.
Oops, how stupid I am!



   2) Add "struct irq_affinity_desc" containing only "struct cpumask
masks" and irq_create_affinity_desc() (or leave the name as
irq_create_affinity_masks() and adjust the signature).  This would
also have no functional impact and would be a fairly trivial patch.

   3) Add "flags" to struct irq_affinity_desc and the related code.
This is the real meat of your patch, and with the above out of the
way, it will be much smaller and it'll be obvious what the important
changes are.


I see, :)

Thanks,
dou


[RFC PATCH v3] genirq/affinity: Create and transfer more irq desc info by a new structure

2018-11-28 Thread Dou Liyang
Now, Linux just spreads the irq affinity to the irqdesc core by a cpumask
pointer. If a vector's affinity is not NULL, it will be marked as managed.

But, as Kashyap and Sumit reported, in the MSI(-X) subsystem the pre/post
vectors may be used for some extra reply queues for performance. Their
affinities are not NULL, but they should be mapped as unmanaged interrupts,
so transferring only the irq affinity assignments is not enough.

Create a new structure named irq_affinity_desc, which includes both the irq
affinity masks and flags. Replace the cpumask pointer with an
irq_affinity_desc pointer, which allows expanding this in the future without
touching all the functions ever again; just modify the irq_affinity_desc
structure.

Reported-by: Kashyap Desai 
Reported-by: Sumit Saxena 
Suggested-by: Thomas Gleixner 
Signed-off-by: Dou Liyang 
---
Changelog:
  v2 --> v3
  - Create a new irq_affinity_desc pointer to transfer the info
suggested by tglx
  - rebase to the tip irq/core branch

  v1 --> v2
  - Add a bitmap for marking if an interrupt is managed or not;
    the size of the bitmap is allocated at runtime.

  - Needs more tests; only tested this patch in QEMU so far.

  - v1: https://lkml.org/lkml/2018/9/13/366

 drivers/pci/msi.c | 29 ++--
 include/linux/interrupt.h | 19 ---
 include/linux/irq.h   |  3 ++-
 include/linux/irqdomain.h |  7 ---
 include/linux/msi.h   |  4 ++--
 kernel/irq/affinity.c | 40 +--
 kernel/irq/devres.c   | 23 --
 kernel/irq/irqdesc.c  | 32 +++
 kernel/irq/irqdomain.c| 14 +++---
 kernel/irq/msi.c  | 21 ++--
 10 files changed, 135 insertions(+), 57 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 265ed3e4c920..431449163316 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -534,16 +534,15 @@ static int populate_msi_sysfs(struct pci_dev *pdev)
 static struct msi_desc *
 msi_setup_entry(struct pci_dev *dev, int nvec, const struct irq_affinity *affd)
 {
-   struct cpumask *masks = NULL;
+   struct irq_affinity_desc *affi_desc = NULL;
struct msi_desc *entry;
u16 control;
 
if (affd)
-   masks = irq_create_affinity_masks(nvec, affd);
-
+   affi_desc = irq_create_affinity_desc(nvec, affd);
 
/* MSI Entry Initialization */
-   entry = alloc_msi_entry(&dev->dev, nvec, masks);
+   entry = alloc_msi_entry(&dev->dev, nvec, affi_desc);
if (!entry)
goto out;
 
@@ -567,7 +566,7 @@ msi_setup_entry(struct pci_dev *dev, int nvec, const struct 
irq_affinity *affd)
pci_read_config_dword(dev, entry->mask_pos, &entry->masked);
 
 out:
-   kfree(masks);
+   kfree(affi_desc);
return entry;
 }
 
@@ -672,15 +671,15 @@ static int msix_setup_entries(struct pci_dev *dev, void 
__iomem *base,
  struct msix_entry *entries, int nvec,
  const struct irq_affinity *affd)
 {
-   struct cpumask *curmsk, *masks = NULL;
+   struct irq_affinity_desc *cur_affi_desc, *affi_desc = NULL;
struct msi_desc *entry;
int ret, i;
 
if (affd)
-   masks = irq_create_affinity_masks(nvec, affd);
+   affi_desc = irq_create_affinity_desc(nvec, affd);
 
-   for (i = 0, curmsk = masks; i < nvec; i++) {
-   entry = alloc_msi_entry(&dev->dev, 1, curmsk);
+   for (i = 0, cur_affi_desc = affi_desc; i < nvec; i++) {
+   entry = alloc_msi_entry(&dev->dev, 1, cur_affi_desc);
if (!entry) {
if (!i)
iounmap(base);
@@ -701,12 +700,12 @@ static int msix_setup_entries(struct pci_dev *dev, void 
__iomem *base,
entry->mask_base= base;
 
list_add_tail(&entry->list, dev_to_msi_list(&dev->dev));
-   if (masks)
-   curmsk++;
+   if (affi_desc)
+   cur_affi_desc++;
}
ret = 0;
 out:
-   kfree(masks);
+   kfree(affi_desc);
return ret;
 }
 
@@ -1264,7 +1263,7 @@ const struct cpumask *pci_irq_get_affinity(struct pci_dev 
*dev, int nr)
 
for_each_pci_msi_entry(entry, dev) {
if (i == nr)
-   return entry->affinity;
+   return &entry->affi_desc->masks;
i++;
}
WARN_ON_ONCE(1);
@@ -1272,11 +1271,11 @@ const struct cpumask *pci_irq_get_affinity(struct 
pci_dev *dev, int nr)
} else if (dev->msi_enabled) {
struct msi_desc *entry = first_pci_msi_entry(dev);
 
-   if (WARN_ON_ONCE(!entry || !entry->affinity ||
+   

[RFC PATCH v2] gen/irq: Change the way to differentiate between managed and unmanaged interrupts by bitmap

2018-11-09 Thread Dou Liyang
As Kashyap and Sumit reported, in the MSI(-X) subsystem the pre/post vectors
may be used for some extra reply queues for performance. The best way to
map the pre/post vectors is to map them as unmanaged interrupts.

But current Linux can't do that, because the way we differentiate between
managed and unmanaged interrupts is the affinity mask: if an affinity mask
is present, the interrupt will be marked as managed. Since the pre- and
post-vectors' affinity masks are set when the kernel boots up, they are
always managed interrupts like the other MSI(-X) vectors. In other words,
Linux can't set the affinity mask without also setting the managed flag.

So, decouple the managed flag from the affinity mask and add a new bitmap
to differentiate between managed and unmanaged interrupts.
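
A minimal userspace sketch of that decoupling (names, sizes and values
below are invented for the example, not the actual kernel API): the
managed property lives in a bitmap of its own, so a vector can carry an
affinity mask and still stay a regular interrupt:

#include <stdio.h>

#define NVEC 4

int main(void)
{
	unsigned long managed = 0;	/* bit n set: vector n is managed */
	unsigned long affinity[NVEC] = { 0xf0, 0x0f, 0x0f, 0xf0 };
	int pre = 1, post = 1, i;

	for (i = pre; i < NVEC - post; i++)
		managed |= 1UL << i;	/* mark only the queue vectors */

	/* vectors 0 and 3 keep an affinity mask, yet stay regular */
	for (i = 0; i < NVEC; i++)
		printf("vec %d: mask=%#lx %s\n", i, affinity[i],
		       (managed >> i) & 1 ? "managed" : "regular");
	return 0;
}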

Reported-by: Kashyap Desai 
Reported-by: Sumit Saxena 
Suggested-by: Thomas Gleixner 
Signed-off-by: Dou Liyang 
---
Changelog:
  - Add a bitmap for marking if an interrupt is managed or not;
    the size of the bitmap is allocated at runtime.

  - Needs more tests; only tested this patch in QEMU so far.

  - v1: https://lkml.org/lkml/2018/9/13/366

 arch/sparc/kernel/irq_64.c  |  2 +-
 arch/x86/kernel/apic/io_apic.c  |  4 ++--
 drivers/base/platform-msi.c |  2 +-
 drivers/bus/fsl-mc/fsl-mc-msi.c |  2 +-
 drivers/irqchip/irq-gic-v4.c|  2 +-
 drivers/pci/msi.c   | 12 
 include/linux/interrupt.h   |  5 +++--
 include/linux/irq.h |  4 ++--
 include/linux/irqdomain.h   |  6 +++---
 include/linux/msi.h |  3 ++-
 kernel/irq/affinity.c   | 27 +++
 kernel/irq/devres.c | 13 +++--
 kernel/irq/ipi.c|  4 ++--
 kernel/irq/irqdesc.c| 15 ++-
 kernel/irq/irqdomain.c  | 15 ---
 kernel/irq/msi.c|  7 +--
 16 files changed, 83 insertions(+), 40 deletions(-)

diff --git a/arch/sparc/kernel/irq_64.c b/arch/sparc/kernel/irq_64.c
index 713670e6d13d..1ce85779b2bb 100644
--- a/arch/sparc/kernel/irq_64.c
+++ b/arch/sparc/kernel/irq_64.c
@@ -242,7 +242,7 @@ unsigned int irq_alloc(unsigned int dev_handle, unsigned 
int dev_ino)
 {
int irq;
 
-   irq = __irq_alloc_descs(-1, 1, 1, numa_node_id(), NULL, NULL);
+   irq = __irq_alloc_descs(-1, 1, 1, numa_node_id(), NULL, NULL, NULL);
if (irq <= 0)
goto out;
 
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 2953bbf05c08..2e7c8b77874a 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -982,7 +982,7 @@ static int alloc_irq_from_domain(struct irq_domain *domain, 
int ioapic, u32 gsi,
 
return __irq_domain_alloc_irqs(domain, irq, 1,
   ioapic_alloc_attr_node(info),
-  info, legacy, NULL);
+  info, legacy, NULL, NULL);
 }
 
 /*
@@ -1017,7 +1017,7 @@ static int alloc_isa_irq_from_domain(struct irq_domain 
*domain,
} else {
info->flags |= X86_IRQ_ALLOC_LEGACY;
irq = __irq_domain_alloc_irqs(domain, irq, 1, node, info, true,
- NULL);
+ NULL, NULL);
if (irq >= 0) {
irq_data = irq_domain_get_irq_data(domain, irq);
data = irq_data->chip_data;
diff --git a/drivers/base/platform-msi.c b/drivers/base/platform-msi.c
index f39a920496fb..c5d319d92eb1 100644
--- a/drivers/base/platform-msi.c
+++ b/drivers/base/platform-msi.c
@@ -134,7 +134,7 @@ static int platform_msi_alloc_descs_with_irq(struct device 
*dev, int virq,
}
 
for (i = 0; i < nvec; i++) {
-   desc = alloc_msi_entry(dev, 1, NULL);
+   desc = alloc_msi_entry(dev, 1, NULL, 0);
if (!desc)
break;
 
diff --git a/drivers/bus/fsl-mc/fsl-mc-msi.c b/drivers/bus/fsl-mc/fsl-mc-msi.c
index 8b9c66d7c4ff..92e6266e10d2 100644
--- a/drivers/bus/fsl-mc/fsl-mc-msi.c
+++ b/drivers/bus/fsl-mc/fsl-mc-msi.c
@@ -214,7 +214,7 @@ static int fsl_mc_msi_alloc_descs(struct device *dev, 
unsigned int irq_count)
struct msi_desc *msi_desc;
 
for (i = 0; i < irq_count; i++) {
-   msi_desc = alloc_msi_entry(dev, 1, NULL);
+   msi_desc = alloc_msi_entry(dev, 1, NULL, 0);
if (!msi_desc) {
dev_err(dev, "Failed to allocate msi entry\n");
error = -ENOMEM;
diff --git a/drivers/irqchip/irq-gic-v4.c b/drivers/irqchip/irq-gic-v4.c
index dba9d67cb9c1..17b459e283a1 100644
--- a/drivers/irqchip/irq-gic-v4.c
+++ b/drivers/irqchip/irq-gic-v4.c
@@ -119,7 +119,7 @@ int its_alloc_vcpu_irqs(struct its_vm *vm)
 
vpe_base_irq = __irq_domain_alloc_irqs(vm->domain, -1, vm->nr_vpes,
  

Re: [RFC PATCH] irq/affinity: Mark the pre/post vectors as regular interrupts

2018-09-20 Thread Dou Liyang

Hi Kashyap,

On 2018/9/20 16:39, Kashyap Desai wrote:

Dou -  Do you want me to test your patch or shall I wait for next version
?


I'm sorry, please wait for the next version.

Thanks,
dou



[tip:x86/apic] irq/matrix: Spread managed interrupts on allocation

2018-09-18 Thread tip-bot for Dou Liyang
Commit-ID:  76f99ae5b54d48430d1f0c5512a84da0ff9761e0
Gitweb: https://git.kernel.org/tip/76f99ae5b54d48430d1f0c5512a84da0ff9761e0
Author: Dou Liyang 
AuthorDate: Sun, 9 Sep 2018 01:58:38 +0800
Committer:  Thomas Gleixner 
CommitDate: Tue, 18 Sep 2018 18:27:24 +0200

irq/matrix: Spread managed interrupts on allocation

Linux spreads out the non managed interrupt across the possible target CPUs
to avoid vector space exhaustion.

Managed interrupts are treated differently, as for them the vectors are
reserved (with guarantee) when the interrupt descriptors are initialized.

When the interrupt is requested a real vector is assigned. The assignment
logic uses the first CPU in the affinity mask for assignment. If the
interrupt has more than one CPU in the affinity mask, which happens when a
multi queue device has less queues than CPUs, then doing the same search as
for non managed interrupts makes sense as it puts the interrupt on the
least interrupt plagued CPU. For single CPU affine vectors that's obviously
a NOOP.

Restructure the matrix allocation code so it does the 'best CPU' search, add
the sanity check for an empty affinity mask and adapt the call site in the
x86 vector management code.

[ tglx: Added the empty mask check to the core and improved change log ]

Signed-off-by: Dou Liyang 
Signed-off-by: Thomas Gleixner 
Cc: h...@zytor.com
Link: https://lkml.kernel.org/r/20180908175838.14450-2-dou_liy...@163.com

---
 arch/x86/kernel/apic/vector.c |  9 -
 include/linux/irq.h   |  3 ++-
 kernel/irq/matrix.c   | 17 ++---
 3 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 7654febd5102..652e7ffa9b9d 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -313,14 +313,13 @@ assign_managed_vector(struct irq_data *irqd, const struct 
cpumask *dest)
struct apic_chip_data *apicd = apic_chip_data(irqd);
int vector, cpu;
 
-   cpumask_and(vector_searchmask, vector_searchmask, affmsk);
-   cpu = cpumask_first(vector_searchmask);
-   if (cpu >= nr_cpu_ids)
-   return -EINVAL;
+   cpumask_and(vector_searchmask, dest, affmsk);
+
/* set_affinity might call here for nothing */
if (apicd->vector && cpumask_test_cpu(apicd->cpu, vector_searchmask))
return 0;
-   vector = irq_matrix_alloc_managed(vector_matrix, cpu);
+   vector = irq_matrix_alloc_managed(vector_matrix, vector_searchmask,
+ &cpu);
trace_vector_alloc_managed(irqd->irq, vector, vector);
if (vector < 0)
return vector;
diff --git a/include/linux/irq.h b/include/linux/irq.h
index 201de12a9957..c9bffda04a45 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -1151,7 +1151,8 @@ void irq_matrix_offline(struct irq_matrix *m);
 void irq_matrix_assign_system(struct irq_matrix *m, unsigned int bit, bool 
replace);
 int irq_matrix_reserve_managed(struct irq_matrix *m, const struct cpumask 
*msk);
 void irq_matrix_remove_managed(struct irq_matrix *m, const struct cpumask 
*msk);
-int irq_matrix_alloc_managed(struct irq_matrix *m, unsigned int cpu);
+int irq_matrix_alloc_managed(struct irq_matrix *m, const struct cpumask *msk,
+   unsigned int *mapped_cpu);
 void irq_matrix_reserve(struct irq_matrix *m);
 void irq_matrix_remove_reserved(struct irq_matrix *m);
 int irq_matrix_alloc(struct irq_matrix *m, const struct cpumask *msk,
diff --git a/kernel/irq/matrix.c b/kernel/irq/matrix.c
index 67768bbe736e..6e6d467f3dec 100644
--- a/kernel/irq/matrix.c
+++ b/kernel/irq/matrix.c
@@ -260,11 +260,21 @@ void irq_matrix_remove_managed(struct irq_matrix *m, 
const struct cpumask *msk)
  * @m: Matrix pointer
  * @cpu:   On which CPU the interrupt should be allocated
  */
-int irq_matrix_alloc_managed(struct irq_matrix *m, unsigned int cpu)
+int irq_matrix_alloc_managed(struct irq_matrix *m, const struct cpumask *msk,
+unsigned int *mapped_cpu)
 {
-   struct cpumap *cm = per_cpu_ptr(m->maps, cpu);
-   unsigned int bit, end = m->alloc_end;
+   unsigned int bit, cpu, end = m->alloc_end;
+   struct cpumap *cm;
+
+   if (cpumask_empty(msk))
+   return -EINVAL;
 
+   cpu = matrix_find_best_cpu(m, msk);
+   if (cpu == UINT_MAX)
+   return -ENOSPC;
+
+   cm = per_cpu_ptr(m->maps, cpu);
+   end = m->alloc_end;
/* Get managed bit which are not allocated */
bitmap_andnot(m->scratch_map, cm->managed_map, cm->alloc_map, end);
bit = find_first_bit(m->scratch_map, end);
@@ -273,6 +283,7 @@ int irq_matrix_alloc_managed(struct irq_matrix *m, unsigned 
int cpu)
set_bit(bit, cm->alloc_map);
cm->allocated++;
m->total_allocated++;
+  

[tip:x86/apic] irq/matrix: Split out the CPU selection code into a helper

2018-09-18 Thread tip-bot for Dou Liyang
Commit-ID:  8ffe4e61c06a48324cfd97f1199bb9838acce2f2
Gitweb: https://git.kernel.org/tip/8ffe4e61c06a48324cfd97f1199bb9838acce2f2
Author: Dou Liyang 
AuthorDate: Sun, 9 Sep 2018 01:58:37 +0800
Committer:  Thomas Gleixner 
CommitDate: Tue, 18 Sep 2018 18:27:24 +0200

irq/matrix: Split out the CPU selection code into a helper

Linux finds the CPU which has the lowest vector allocation count to spread
out the non managed interrupts across the possible target CPUs, but does
not do so for managed interrupts.

Split out the CPU selection code into a helper function for reuse. No
functional change.

Signed-off-by: Dou Liyang 
Signed-off-by: Thomas Gleixner 
Cc: h...@zytor.com
Link: https://lkml.kernel.org/r/20180908175838.14450-1-dou_liy...@163.com

---
 kernel/irq/matrix.c | 65 +++--
 1 file changed, 38 insertions(+), 27 deletions(-)

diff --git a/kernel/irq/matrix.c b/kernel/irq/matrix.c
index 5092494bf261..67768bbe736e 100644
--- a/kernel/irq/matrix.c
+++ b/kernel/irq/matrix.c
@@ -124,6 +124,27 @@ static unsigned int matrix_alloc_area(struct irq_matrix 
*m, struct cpumap *cm,
return area;
 }
 
+/* Find the best CPU which has the lowest vector allocation count */
+static unsigned int matrix_find_best_cpu(struct irq_matrix *m,
+   const struct cpumask *msk)
+{
+   unsigned int cpu, best_cpu, maxavl = 0;
+   struct cpumap *cm;
+
+   best_cpu = UINT_MAX;
+
+   for_each_cpu(cpu, msk) {
+   cm = per_cpu_ptr(m->maps, cpu);
+
+   if (!cm->online || cm->available <= maxavl)
+   continue;
+
+   best_cpu = cpu;
+   maxavl = cm->available;
+   }
+   return best_cpu;
+}
+
 /**
  * irq_matrix_assign_system - Assign system wide entry in the matrix
  * @m: Matrix pointer
@@ -322,37 +343,27 @@ void irq_matrix_remove_reserved(struct irq_matrix *m)
 int irq_matrix_alloc(struct irq_matrix *m, const struct cpumask *msk,
 bool reserved, unsigned int *mapped_cpu)
 {
-   unsigned int cpu, best_cpu, maxavl = 0;
+   unsigned int cpu, bit;
struct cpumap *cm;
-   unsigned int bit;
 
-   best_cpu = UINT_MAX;
-   for_each_cpu(cpu, msk) {
-   cm = per_cpu_ptr(m->maps, cpu);
-
-   if (!cm->online || cm->available <= maxavl)
-   continue;
+   cpu = matrix_find_best_cpu(m, msk);
+   if (cpu == UINT_MAX)
+   return -ENOSPC;
 
-   best_cpu = cpu;
-   maxavl = cm->available;
-   }
+   cm = per_cpu_ptr(m->maps, cpu);
+   bit = matrix_alloc_area(m, cm, 1, false);
+   if (bit >= m->alloc_end)
+   return -ENOSPC;
+   cm->allocated++;
+   cm->available--;
+   m->total_allocated++;
+   m->global_available--;
+   if (reserved)
+   m->global_reserved--;
+   *mapped_cpu = cpu;
+   trace_irq_matrix_alloc(bit, cpu, m, cm);
+   return bit;
 
-   if (maxavl) {
-   cm = per_cpu_ptr(m->maps, best_cpu);
-   bit = matrix_alloc_area(m, cm, 1, false);
-   if (bit < m->alloc_end) {
-   cm->allocated++;
-   cm->available--;
-   m->total_allocated++;
-   m->global_available--;
-   if (reserved)
-   m->global_reserved--;
-   *mapped_cpu = best_cpu;
-   trace_irq_matrix_alloc(bit, best_cpu, m, cm);
-   return bit;
-   }
-   }
-   return -ENOSPC;
 }
 
 /**


Re: [PATCH v3 2/2] irq/matrix: Spread managed interrupts on allocation

2018-09-18 Thread Dou Liyang

Dear Thomas,

On 2018/9/17 23:32, Thomas Gleixner wrote:
[...]

I think it's still worthwhile to do that, but the changelog needs a major
overhaul as right now it's outright misleading. I'll just amend it with
something along the above lines, unless someone disagrees.



Yeah, right, I was wrong: it can't prevent vector exhaustion, it just
makes the allocation look balanced, as you said. Thank you so much.


That said, it might also be interesting to allow user space affinity
settings on managed interrupts. Not meant for the pre/post vector case,
which just needs to be made non managed. It's meant for the case where a


Yes, I am cooking a patch along the direction you are proposing.
Recently I changed my PC and just completed the configuration of the
environment. ;-)

Thanks,
dou



Re: [PATCH] sched/core: Fix compile warning in smp=n case

2018-09-13 Thread Dou Liyang




At 08/10/2018 10:35 AM, Dou Liyang wrote:

When compiling kernel with SMP disabled, the build warns with:

kernel/sched/core.c: In function ‘update_rq_clock_task’:
kernel/sched/core.c:139:17: warning: unused variable ‘irq_delta’ 
[-Wunused-variable]
   s64 steal = 0, irq_delta = 0;

Fix this by reverting HAVE_SCHED_AVG_IRQ to

   defined(CONFIG_IRQ_TIME_ACCOUNTING) || 
defined(CONFIG_PARAVIRT_TIME_ACCOUNTING)

Fixes: 2e62c4743adc ("sched/fair: Remove #ifdefs from scale_rt_capacity()")


Hi,

 Miguel also found this warning. Can we pick it up now? ;-)

Thanks,
dou







[RFC PATCH] irq/affinity: Mark the pre/post vectors as regular interrupts

2018-09-12 Thread Dou Liyang
From: Dou Liyang 

As Kashyap and Sumit reported, in the MSI(-X) subsystem the pre/post vectors
may be used for some extra reply queues for performance. The best way to
map the pre/post vectors is to map them to the local NUMA node.

But, current Linux can't do that, because

  The pre and post vectors are marked managed and their affinity mask
  is set to the irq default affinity mask.

  The default affinity mask is by default ALL cpus, but it can be tweaked
  both on the kernel command line and via proc.

  If that mask is only a subset of CPUs and all of them go offline
  then these vectors are shutdown in managed mode.

So, clear these affinity masks and check for an empty mask in alloc_descs()
to leave them as regular interrupts, which can be affinity controlled and
can also move freely on hotplug.

Note: will break the validation of affinity mask(s)

Reported-by: Kashyap Desai 
Reported-by: Sumit Saxena 
Suggested-by: Thomas Gleixner 
Signed-off-by: Dou Liyang 
---
 kernel/irq/affinity.c |  9 ++---
 kernel/irq/irqdesc.c  | 24 ++--
 2 files changed, 16 insertions(+), 17 deletions(-)

diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index f4f29b9d90ee..ba35a5050dd2 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -204,7 +204,7 @@ irq_create_affinity_masks(int nvecs, const struct 
irq_affinity *affd)
 
/* Fill out vectors at the beginning that don't need affinity */
for (curvec = 0; curvec < affd->pre_vectors; curvec++)
-   cpumask_copy(masks + curvec, irq_default_affinity);
+   cpumask_clear(masks + curvec);
 
/* Stabilize the cpumasks */
get_online_cpus();
@@ -234,10 +234,13 @@ irq_create_affinity_masks(int nvecs, const struct 
irq_affinity *affd)
/* Fill out vectors at the end that don't need affinity */
if (usedvecs >= affvecs)
curvec = affd->pre_vectors + affvecs;
-   else
+   else {
curvec = affd->pre_vectors + usedvecs;
+   for (; curvec < affd->pre_vectors + affvecs; curvec++)
+   cpumask_copy(masks + curvec, irq_default_affinity);
+   }
for (; curvec < nvecs; curvec++)
-   cpumask_copy(masks + curvec, irq_default_affinity);
+   cpumask_clear(masks + curvec);
 
 outnodemsk:
free_node_to_cpumask(node_to_cpumask);
diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index 578d0e5f1b5b..5cffa791a20b 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -453,24 +453,20 @@ static int alloc_descs(unsigned int start, unsigned int 
cnt, int node,
 {
const struct cpumask *mask = NULL;
struct irq_desc *desc;
-   unsigned int flags;
+   unsigned int flags = 0;
int i;
 
-   /* Validate affinity mask(s) */
-   if (affinity) {
-   for (i = 0, mask = affinity; i < cnt; i++, mask++) {
-   if (cpumask_empty(mask))
-   return -EINVAL;
-   }
-   }
-
-   flags = affinity ? IRQD_AFFINITY_MANAGED | IRQD_MANAGED_SHUTDOWN : 0;
-   mask = NULL;
-
for (i = 0; i < cnt; i++) {
if (affinity) {
-   node = cpu_to_node(cpumask_first(affinity));
-   mask = affinity;
+   if (cpumask_empty(affinity)) {
+   flags = 0;
+   mask = NULL;
+   } else {
+   flags = IRQD_AFFINITY_MANAGED |
+   IRQD_MANAGED_SHUTDOWN;
+   mask = affinity;
+   node = cpu_to_node(cpumask_first(affinity));
+   }
affinity++;
}
desc = alloc_desc(start + i, node, flags, mask, owner);
-- 
2.14.3




[PATCH v3 2/2] irq/matrix: Spread managed interrupts on allocation

2018-09-08 Thread Dou Liyang
From: Dou Liyang 

Linux has spread out the non managed interrupt across the possible
target CPUs to avoid vector space exhaustion.

But the same situation may happen with managed interrupts.

Spread managed interrupts on allocation as well.

Note: also change the return value for the empty search mask case
from EINVAL to ENOSPC.

Signed-off-by: Dou Liyang 
---
Changelog v3 --> v2

 - Mention the changes in the changelog suggested by tglx
 - Use the new matrix_find_best_cpu() helper

 arch/x86/kernel/apic/vector.c |  8 +++-
 include/linux/irq.h   |  3 ++-
 kernel/irq/matrix.c   | 14 +++---
 3 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 9f148e3d45b4..b7fc290b4b98 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -313,14 +313,12 @@ assign_managed_vector(struct irq_data *irqd, const struct 
cpumask *dest)
struct apic_chip_data *apicd = apic_chip_data(irqd);
int vector, cpu;
 
-   cpumask_and(vector_searchmask, vector_searchmask, affmsk);
-   cpu = cpumask_first(vector_searchmask);
-   if (cpu >= nr_cpu_ids)
-   return -EINVAL;
+   cpumask_and(vector_searchmask, dest, affmsk);
+
/* set_affinity might call here for nothing */
if (apicd->vector && cpumask_test_cpu(apicd->cpu, vector_searchmask))
return 0;
-   vector = irq_matrix_alloc_managed(vector_matrix, cpu);
+   vector = irq_matrix_alloc_managed(vector_matrix, vector_searchmask, 
&cpu);
trace_vector_alloc_managed(irqd->irq, vector, vector);
if (vector < 0)
return vector;
diff --git a/include/linux/irq.h b/include/linux/irq.h
index 201de12a9957..c9bffda04a45 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -1151,7 +1151,8 @@ void irq_matrix_offline(struct irq_matrix *m);
 void irq_matrix_assign_system(struct irq_matrix *m, unsigned int bit, bool 
replace);
 int irq_matrix_reserve_managed(struct irq_matrix *m, const struct cpumask 
*msk);
 void irq_matrix_remove_managed(struct irq_matrix *m, const struct cpumask 
*msk);
-int irq_matrix_alloc_managed(struct irq_matrix *m, unsigned int cpu);
+int irq_matrix_alloc_managed(struct irq_matrix *m, const struct cpumask *msk,
+   unsigned int *mapped_cpu);
 void irq_matrix_reserve(struct irq_matrix *m);
 void irq_matrix_remove_reserved(struct irq_matrix *m);
 int irq_matrix_alloc(struct irq_matrix *m, const struct cpumask *msk,
diff --git a/kernel/irq/matrix.c b/kernel/irq/matrix.c
index 67768bbe736e..34f97c4f10d7 100644
--- a/kernel/irq/matrix.c
+++ b/kernel/irq/matrix.c
@@ -260,11 +260,18 @@ void irq_matrix_remove_managed(struct irq_matrix *m, 
const struct cpumask *msk)
  * @m: Matrix pointer
  * @cpu:   On which CPU the interrupt should be allocated
  */
-int irq_matrix_alloc_managed(struct irq_matrix *m, unsigned int cpu)
+int irq_matrix_alloc_managed(struct irq_matrix *m, const struct cpumask *msk,
+   unsigned int *mapped_cpu)
 {
-   struct cpumap *cm = per_cpu_ptr(m->maps, cpu);
-   unsigned int bit, end = m->alloc_end;
+   unsigned int bit, cpu, end = m->alloc_end;
+   struct cpumap *cm;
+
+   cpu = matrix_find_best_cpu(m, msk);
+   if (cpu == UINT_MAX)
+   return -ENOSPC;
 
+   cm = per_cpu_ptr(m->maps, cpu);
+   end = m->alloc_end;
/* Get managed bit which are not allocated */
bitmap_andnot(m->scratch_map, cm->managed_map, cm->alloc_map, end);
bit = find_first_bit(m->scratch_map, end);
@@ -273,6 +280,7 @@ int irq_matrix_alloc_managed(struct irq_matrix *m, unsigned 
int cpu)
set_bit(bit, cm->alloc_map);
cm->allocated++;
m->total_allocated++;
+   *mapped_cpu = cpu;
trace_irq_matrix_alloc_managed(bit, cpu, m, cm);
return bit;
 }
-- 
2.14.3




[PATCH v3 1/2] irq/matrix: Split out the CPU finding code into a helper

2018-09-08 Thread Dou Liyang
From: Dou Liyang 

Linux finds the CPU which has the lowest vector allocation count to spread
out the non managed interrupt across the possible target CPUs.

This common CPU-finding code will also be used for the managed case,
so split it out into a helper in preparation.

Signed-off-by: Dou Liyang 
---
Changelog v3 --> v2

 - Make the matrix_find_best_cpu() simple and obvious suggested by tglx
 - Remove the indentation totally suggested by tglx

 kernel/irq/matrix.c | 65 +++--
 1 file changed, 38 insertions(+), 27 deletions(-)

diff --git a/kernel/irq/matrix.c b/kernel/irq/matrix.c
index 5092494bf261..67768bbe736e 100644
--- a/kernel/irq/matrix.c
+++ b/kernel/irq/matrix.c
@@ -124,6 +124,27 @@ static unsigned int matrix_alloc_area(struct irq_matrix 
*m, struct cpumap *cm,
return area;
 }
 
+/* Find the best CPU which has the lowest vector allocation count */
+static unsigned int matrix_find_best_cpu(struct irq_matrix *m,
+   const struct cpumask *msk)
+{
+   unsigned int cpu, best_cpu, maxavl = 0;
+   struct cpumap *cm;
+
+   best_cpu = UINT_MAX;
+
+   for_each_cpu(cpu, msk) {
+   cm = per_cpu_ptr(m->maps, cpu);
+
+   if (!cm->online || cm->available <= maxavl)
+   continue;
+
+   best_cpu = cpu;
+   maxavl = cm->available;
+   }
+   return best_cpu;
+}
+
 /**
  * irq_matrix_assign_system - Assign system wide entry in the matrix
  * @m: Matrix pointer
@@ -322,37 +343,27 @@ void irq_matrix_remove_reserved(struct irq_matrix *m)
 int irq_matrix_alloc(struct irq_matrix *m, const struct cpumask *msk,
 bool reserved, unsigned int *mapped_cpu)
 {
-   unsigned int cpu, best_cpu, maxavl = 0;
+   unsigned int cpu, bit;
struct cpumap *cm;
-   unsigned int bit;
 
-   best_cpu = UINT_MAX;
-   for_each_cpu(cpu, msk) {
-   cm = per_cpu_ptr(m->maps, cpu);
-
-   if (!cm->online || cm->available <= maxavl)
-   continue;
+   cpu = matrix_find_best_cpu(m, msk);
+   if (cpu == UINT_MAX)
+   return -ENOSPC;
 
-   best_cpu = cpu;
-   maxavl = cm->available;
-   }
+   cm = per_cpu_ptr(m->maps, cpu);
+   bit = matrix_alloc_area(m, cm, 1, false);
+   if (bit >= m->alloc_end)
+   return -ENOSPC;
+   cm->allocated++;
+   cm->available--;
+   m->total_allocated++;
+   m->global_available--;
+   if (reserved)
+   m->global_reserved--;
+   *mapped_cpu = cpu;
+   trace_irq_matrix_alloc(bit, cpu, m, cm);
+   return bit;
 
-   if (maxavl) {
-   cm = per_cpu_ptr(m->maps, best_cpu);
-   bit = matrix_alloc_area(m, cm, 1, false);
-   if (bit < m->alloc_end) {
-   cm->allocated++;
-   cm->available--;
-   m->total_allocated++;
-   m->global_available--;
-   if (reserved)
-   m->global_reserved--;
-   *mapped_cpu = best_cpu;
-   trace_irq_matrix_alloc(bit, best_cpu, m, cm);
-   return bit;
-   }
-   }
-   return -ENOSPC;
 }
 
 /**
-- 
2.14.3




[PATCH v2 2/2] irq/matrix: Spread managed interrupts on allocation

2018-09-08 Thread Dou Liyang
From: Dou Liyang 

Linux has spread out the non managed interrupt across the possible
target CPUs to avoid vector space exhaustion.

But the same situation may happen with managed interrupts.

Spread managed interrupts on allocation as well.

Fixes: a0c9259dc4e1 ("irq/matrix: Spread interrupts on allocation")
Signed-off-by: Dou Liyang 
---
Changelog v2 --> v1

 -Avoid a churn and the extra indentation suggested by tglx

 arch/x86/kernel/apic/vector.c |  8 +++-
 include/linux/irq.h   |  3 ++-
 kernel/irq/matrix.c   | 14 +++---
 3 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 9f148e3d45b4..b7fc290b4b98 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -313,14 +313,12 @@ assign_managed_vector(struct irq_data *irqd, const struct 
cpumask *dest)
struct apic_chip_data *apicd = apic_chip_data(irqd);
int vector, cpu;
 
-   cpumask_and(vector_searchmask, vector_searchmask, affmsk);
-   cpu = cpumask_first(vector_searchmask);
-   if (cpu >= nr_cpu_ids)
-   return -EINVAL;
+   cpumask_and(vector_searchmask, dest, affmsk);
+
/* set_affinity might call here for nothing */
if (apicd->vector && cpumask_test_cpu(apicd->cpu, vector_searchmask))
return 0;
-   vector = irq_matrix_alloc_managed(vector_matrix, cpu);
+   vector = irq_matrix_alloc_managed(vector_matrix, vector_searchmask, 
&cpu);
trace_vector_alloc_managed(irqd->irq, vector, vector);
if (vector < 0)
return vector;
diff --git a/include/linux/irq.h b/include/linux/irq.h
index 201de12a9957..c9bffda04a45 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -1151,7 +1151,8 @@ void irq_matrix_offline(struct irq_matrix *m);
 void irq_matrix_assign_system(struct irq_matrix *m, unsigned int bit, bool 
replace);
 int irq_matrix_reserve_managed(struct irq_matrix *m, const struct cpumask 
*msk);
 void irq_matrix_remove_managed(struct irq_matrix *m, const struct cpumask 
*msk);
-int irq_matrix_alloc_managed(struct irq_matrix *m, unsigned int cpu);
+int irq_matrix_alloc_managed(struct irq_matrix *m, const struct cpumask *msk,
+   unsigned int *mapped_cpu);
 void irq_matrix_reserve(struct irq_matrix *m);
 void irq_matrix_remove_reserved(struct irq_matrix *m);
 int irq_matrix_alloc(struct irq_matrix *m, const struct cpumask *msk,
diff --git a/kernel/irq/matrix.c b/kernel/irq/matrix.c
index 0cbcdef9337a..753b8b0ba045 100644
--- a/kernel/irq/matrix.c
+++ b/kernel/irq/matrix.c
@@ -259,11 +259,18 @@ void irq_matrix_remove_managed(struct irq_matrix *m, 
const struct cpumask *msk)
  * @m: Matrix pointer
  * @cpu:   On which CPU the interrupt should be allocated
  */
-int irq_matrix_alloc_managed(struct irq_matrix *m, unsigned int cpu)
+int irq_matrix_alloc_managed(struct irq_matrix *m, const struct cpumask *msk,
+   unsigned int *mapped_cpu)
 {
-   struct cpumap *cm = per_cpu_ptr(m->maps, cpu);
unsigned int bit, end = m->alloc_end;
+   unsigned int best_cpu = UINT_MAX;
+   struct cpumap *cm;
 
+   if (!matrix_find_best_cpu(m, msk, &best_cpu))
+   return -ENOSPC;
+
+   cm = per_cpu_ptr(m->maps, best_cpu);
+   end = m->alloc_end;
/* Get managed bit which are not allocated */
bitmap_andnot(m->scratch_map, cm->managed_map, cm->alloc_map, end);
bit = find_first_bit(m->scratch_map, end);
@@ -272,7 +279,8 @@ int irq_matrix_alloc_managed(struct irq_matrix *m, unsigned 
int cpu)
set_bit(bit, cm->alloc_map);
cm->allocated++;
m->total_allocated++;
-   trace_irq_matrix_alloc_managed(bit, cpu, m, cm);
+   *mapped_cpu = best_cpu;
+   trace_irq_matrix_alloc_managed(bit, best_cpu, m, cm);
return bit;
 }
 
-- 
2.14.3




[PATCH v2 1/2] irq/matrix: Split out the CPU finding code into a helper

2018-09-08 Thread Dou Liyang
From: Dou Liyang 

Linux finds the CPU which has the lowest vector allocation count to spread
out the non managed interrupt across the possible target CPUs.

This common CPU-finding code will also be used for the managed case,
so split it out into a helper in preparation.

Signed-off-by: Dou Liyang 
---
Changelog v2 --> v1

 -Avoid a churn and the extra indentation suggested by tglx

 kernel/irq/matrix.c | 60 +++--
 1 file changed, 35 insertions(+), 25 deletions(-)

diff --git a/kernel/irq/matrix.c b/kernel/irq/matrix.c
index 5092494bf261..0cbcdef9337a 100644
--- a/kernel/irq/matrix.c
+++ b/kernel/irq/matrix.c
@@ -124,6 +124,26 @@ static unsigned int matrix_alloc_area(struct irq_matrix 
*m, struct cpumap *cm,
return area;
 }
 
+/* Find the best CPU which has the lowest vector allocation count */
+static int matrix_find_best_cpu(struct irq_matrix *m,
+   const struct cpumask *msk, unsigned int *best_cpu)
+{
+   unsigned int cpu, maxavl = 0;
+   struct cpumap *cm;
+
+   for_each_cpu(cpu, msk) {
+   cm = per_cpu_ptr(m->maps, cpu);
+
+   if (!cm->online || cm->available <= maxavl)
+   continue;
+
+   *best_cpu = cpu;
+   maxavl = cm->available;
+   }
+
+   return maxavl;
+}
+
 /**
  * irq_matrix_assign_system - Assign system wide entry in the matrix
  * @m: Matrix pointer
@@ -322,35 +342,25 @@ void irq_matrix_remove_reserved(struct irq_matrix *m)
 int irq_matrix_alloc(struct irq_matrix *m, const struct cpumask *msk,
 bool reserved, unsigned int *mapped_cpu)
 {
-   unsigned int cpu, best_cpu, maxavl = 0;
+   unsigned int best_cpu = UINT_MAX;
struct cpumap *cm;
unsigned int bit;
 
-   best_cpu = UINT_MAX;
-   for_each_cpu(cpu, msk) {
-   cm = per_cpu_ptr(m->maps, cpu);
-
-   if (!cm->online || cm->available <= maxavl)
-   continue;
-
-   best_cpu = cpu;
-   maxavl = cm->available;
-   }
+   if (!matrix_find_best_cpu(m, msk, &best_cpu))
+   return -ENOSPC;
 
-   if (maxavl) {
-   cm = per_cpu_ptr(m->maps, best_cpu);
-   bit = matrix_alloc_area(m, cm, 1, false);
-   if (bit < m->alloc_end) {
-   cm->allocated++;
-   cm->available--;
-   m->total_allocated++;
-   m->global_available--;
-   if (reserved)
-   m->global_reserved--;
-   *mapped_cpu = best_cpu;
-   trace_irq_matrix_alloc(bit, best_cpu, m, cm);
-   return bit;
-   }
+   cm = per_cpu_ptr(m->maps, best_cpu);
+   bit = matrix_alloc_area(m, cm, 1, false);
+   if (bit < m->alloc_end) {
+   cm->allocated++;
+   cm->available--;
+   m->total_allocated++;
+   m->global_available--;
+   if (reserved)
+   m->global_reserved--;
+   *mapped_cpu = best_cpu;
+   trace_irq_matrix_alloc(bit, best_cpu, m, cm);
+   return bit;
}
return -ENOSPC;
 }
-- 
2.14.3




[PATCH 2/2] irq/matrix: Spread managed interrupts on allocation

2018-09-07 Thread Dou Liyang
From: Dou Liyang 

Linux spreads out the non-managed interrupts across the possible target CPUs
to avoid vector space exhaustion.

But the same situation may happen with managed interrupts.

Spread managed interrupts on allocation as well.
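Concretely, the call-site change looks like this (a sketch condensed from
the diff below):

	/* before: pinned to the first CPU of the search mask */
	cpu = cpumask_first(vector_searchmask);
	vector = irq_matrix_alloc_managed(vector_matrix, cpu);

	/* after: the matrix picks the least-loaded online CPU from the mask */
	vector = irq_matrix_alloc_managed(vector_matrix, vector_searchmask, &cpu);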

Fixes: a0c9259dc4e1 ("irq/matrix: Spread interrupts on allocation")
Signed-off-by: Dou Liyang 
---
 arch/x86/kernel/apic/vector.c |  8 +++-
 include/linux/irq.h   |  3 ++-
 kernel/irq/matrix.c   | 32 
 3 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 9f148e3d45b4..b7fc290b4b98 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -313,14 +313,12 @@ assign_managed_vector(struct irq_data *irqd, const struct cpumask *dest)
struct apic_chip_data *apicd = apic_chip_data(irqd);
int vector, cpu;
 
-   cpumask_and(vector_searchmask, vector_searchmask, affmsk);
-   cpu = cpumask_first(vector_searchmask);
-   if (cpu >= nr_cpu_ids)
-   return -EINVAL;
+   cpumask_and(vector_searchmask, dest, affmsk);
+
/* set_affinity might call here for nothing */
if (apicd->vector && cpumask_test_cpu(apicd->cpu, vector_searchmask))
return 0;
-   vector = irq_matrix_alloc_managed(vector_matrix, cpu);
+   vector = irq_matrix_alloc_managed(vector_matrix, vector_searchmask, &cpu);
trace_vector_alloc_managed(irqd->irq, vector, vector);
if (vector < 0)
return vector;
diff --git a/include/linux/irq.h b/include/linux/irq.h
index 201de12a9957..c9bffda04a45 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -1151,7 +1151,8 @@ void irq_matrix_offline(struct irq_matrix *m);
 void irq_matrix_assign_system(struct irq_matrix *m, unsigned int bit, bool replace);
 int irq_matrix_reserve_managed(struct irq_matrix *m, const struct cpumask *msk);
 void irq_matrix_remove_managed(struct irq_matrix *m, const struct cpumask *msk);
-int irq_matrix_alloc_managed(struct irq_matrix *m, unsigned int cpu);
+int irq_matrix_alloc_managed(struct irq_matrix *m, const struct cpumask *msk,
+   unsigned int *mapped_cpu);
 void irq_matrix_reserve(struct irq_matrix *m);
 void irq_matrix_remove_reserved(struct irq_matrix *m);
 int irq_matrix_alloc(struct irq_matrix *m, const struct cpumask *msk,
diff --git a/kernel/irq/matrix.c b/kernel/irq/matrix.c
index 5eb0c8b857f0..b449a749b354 100644
--- a/kernel/irq/matrix.c
+++ b/kernel/irq/matrix.c
@@ -259,21 +259,29 @@ void irq_matrix_remove_managed(struct irq_matrix *m, const struct cpumask *msk)
  * @m: Matrix pointer
  * @cpu:   On which CPU the interrupt should be allocated
  */
-int irq_matrix_alloc_managed(struct irq_matrix *m, unsigned int cpu)
+int irq_matrix_alloc_managed(struct irq_matrix *m, const struct cpumask *msk,
+   unsigned int *mapped_cpu)
 {
-   struct cpumap *cm = per_cpu_ptr(m->maps, cpu);
unsigned int bit, end = m->alloc_end;
+   unsigned int best_cpu = UINT_MAX;
+   struct cpumap *cm;
 
-   /* Get managed bit which are not allocated */
-   bitmap_andnot(m->scratch_map, cm->managed_map, cm->alloc_map, end);
-   bit = find_first_bit(m->scratch_map, end);
-   if (bit >= end)
-   return -ENOSPC;
-   set_bit(bit, cm->alloc_map);
-   cm->allocated++;
-   m->total_allocated++;
-   trace_irq_matrix_alloc_managed(bit, cpu, m, cm);
-   return bit;
+   if (matrix_find_best_cpu(m, msk, &best_cpu)) {
+   cm = per_cpu_ptr(m->maps, best_cpu);
+   end = m->alloc_end;
+   /* Get managed bit which are not allocated */
+   bitmap_andnot(m->scratch_map, cm->managed_map, cm->alloc_map, end);
+   bit = find_first_bit(m->scratch_map, end);
+   if (bit >= end)
+   return -ENOSPC;
+   set_bit(bit, cm->alloc_map);
+   cm->allocated++;
+   m->total_allocated++;
+   *mapped_cpu = best_cpu;
+   trace_irq_matrix_alloc_managed(bit, best_cpu, m, cm);
+   return bit;
+   }
+   return -ENOSPC;
 }
 
 /**
-- 
2.14.3




[PATCH 1/2] irq/matrix: Split out the CPU finding code into a helper

2018-09-07 Thread Dou Liyang
From: Dou Liyang 

Linux finds the CPU which has the lowest vector allocation count in order to
spread out the non-managed interrupts across the possible target CPUs.

This common CPU-finding code will also be used for the managed case, so split
it out into a helper in preparation.

Signed-off-by: Dou Liyang 
---
 kernel/irq/matrix.c | 35 ++-
 1 file changed, 22 insertions(+), 13 deletions(-)

diff --git a/kernel/irq/matrix.c b/kernel/irq/matrix.c
index 5092494bf261..5eb0c8b857f0 100644
--- a/kernel/irq/matrix.c
+++ b/kernel/irq/matrix.c
@@ -124,6 +124,26 @@ static unsigned int matrix_alloc_area(struct irq_matrix *m, struct cpumap *cm,
return area;
 }
 
+/* Find the best CPU which has the lowest vector allocation count */
+static int matrix_find_best_cpu(struct irq_matrix *m,
+   const struct cpumask *msk, int *best_cpu)
+{
+   unsigned int cpu, maxavl = 0;
+   struct cpumap *cm;
+
+   for_each_cpu(cpu, msk) {
+   cm = per_cpu_ptr(m->maps, cpu);
+
+   if (!cm->online || cm->available <= maxavl)
+   continue;
+
+   *best_cpu = cpu;
+   maxavl = cm->available;
+   }
+
+   return maxavl;
+}
+
 /**
  * irq_matrix_assign_system - Assign system wide entry in the matrix
  * @m: Matrix pointer
@@ -322,22 +342,11 @@ void irq_matrix_remove_reserved(struct irq_matrix *m)
 int irq_matrix_alloc(struct irq_matrix *m, const struct cpumask *msk,
 bool reserved, unsigned int *mapped_cpu)
 {
-   unsigned int cpu, best_cpu, maxavl = 0;
+   unsigned int best_cpu = UINT_MAX;
struct cpumap *cm;
unsigned int bit;
 
-   best_cpu = UINT_MAX;
-   for_each_cpu(cpu, msk) {
-   cm = per_cpu_ptr(m->maps, cpu);
-
-   if (!cm->online || cm->available <= maxavl)
-   continue;
-
-   best_cpu = cpu;
-   maxavl = cm->available;
-   }
-
-   if (maxavl) {
+   if (matrix_find_best_cpu(m, msk, &best_cpu)) {
cm = per_cpu_ptr(m->maps, best_cpu);
bit = matrix_alloc_area(m, cm, 1, false);
if (bit < m->alloc_end) {
-- 
2.14.3




[PATCH v3] acpi/processor: Fix the return value of acpi_processor_ids_walk()

2018-08-23 Thread Dou Liyang
The ACPI driver should make sure all the processor IDs in the ACPI namespace
are unique. The driver performs a depth-first walk of the namespace tree
and calls acpi_processor_ids_walk() to check for duplicate IDs.

But acpi_processor_ids_walk() returns the wrong value: once a processor has
been checked, it returns true, which breaks the walk immediately, so the
remaining processors are never checked.

Replace the value with AE_OK, the standard acpi_status value, and don't
abort the namespace walk even on error.
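For reference, acpi_walk_namespace() aborts the walk as soon as a callback
returns an error status, which is why returning true (a non-zero
acpi_status) after the first processor stopped the whole walk. A minimal
sketch of the intended callback shape (validate_processor() is a
hypothetical helper standing in for the real checks):

	static acpi_status __init ids_walk_cb(acpi_handle handle, u32 level,
					      void *context, void **rv)
	{
		if (!validate_processor(handle))	/* hypothetical helper */
			acpi_handle_info(handle, "Invalid processor object\n");
		/* always AE_OK: record the result, never abort the walk */
		return AE_OK;
	}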

Fixes: 8c8cb30f49b8 ("acpi/processor: Implement DEVICE operator for processor enumeration")
Signed-off-by: Dou Liyang 
---
Changelog:
  v2 --> v3:
   - Fix a compiler error reported by LKP
  v1 --> v2:
   - Fix the check against duplicate IDs suggested by Rafael.
  
  Now, the duplicate IDs have only been found on the Ivb42 machine, and we
  added this check in linux-4.9. But we introduced a bug in linux-4.12 with
  commit 8c8cb30f49b8.

  To resolve the bug, I first removed the check[1], because Linux compares
  the incoming ID with the present processors when hot-adding a physical CPU
  and thus avoids using duplicate IDs.

  But it seems we should consider all the possible processors. So, with this
  patch, processors with duplicate IDs will never be hot-plugged.

[1] https://lkml.org/lkml/2018/5/28/213
---
 drivers/acpi/acpi_processor.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 449d86d39965..fc447410ae4d 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -643,7 +643,7 @@ static acpi_status __init acpi_processor_ids_walk(acpi_handle handle,
 
status = acpi_get_type(handle, &acpi_type);
if (ACPI_FAILURE(status))
-   return false;
+   return status;
 
switch (acpi_type) {
case ACPI_TYPE_PROCESSOR:
@@ -663,11 +663,12 @@ static acpi_status __init acpi_processor_ids_walk(acpi_handle handle,
}
 
processor_validated_ids_update(uid);
-   return true;
+   return AE_OK;
 
 err:
+   /* Exit on error, but don't abort the namespace walk */
acpi_handle_info(handle, "Invalid processor object\n");
-   return false;
+   return AE_OK;
 
 }
 
-- 
2.14.3





[RESEND PATCH v2] acpi/processor: Fix the return value of acpi_processor_ids_walk()

2018-08-22 Thread Dou Liyang
The ACPI driver should make sure all the processor IDs in the ACPI namespace
are unique. The driver performs a depth-first walk of the namespace tree
and calls acpi_processor_ids_walk() to check for duplicate IDs.

But acpi_processor_ids_walk() returns the wrong value: once a processor has
been checked, it returns true, which breaks the walk immediately, so the
remaining processors are never checked.

Replace the value with AE_OK, the standard acpi_status value, and don't
abort the namespace walk even on error.

Fixes: 8c8cb30f49b8 ("acpi/processor: Implement DEVICE operator for processor enumeration")
Signed-off-by: Dou Liyang 
---
Changelog:
  v1 --> v2:
   - Fix the check against duplicate IDs suggested by Rafael.
  
  Now, the duplicate IDs have only been found on the Ivb42 machine, and we
  added this check in linux-4.9. But we introduced a bug in linux-4.12 with
  commit 8c8cb30f49b8.

  To resolve the bug, I first removed the check[1], because Linux compares
  the incoming ID with the present processors when hot-adding a physical CPU
  and thus avoids using duplicate IDs.

  But it seems we should consider all the possible processors. So, with this
  patch, processors with duplicate IDs will never be hot-plugged.

[1] https://lkml.org/lkml/2018/5/28/213
---
 drivers/acpi/acpi_processor.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 449d86d39965..a59870ccd5ca 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -643,7 +643,7 @@ static acpi_status __init acpi_processor_ids_walk(acpi_handle handle,
 
status = acpi_get_type(handle, &acpi_type);
if (ACPI_FAILURE(status))
-   return false;
+   return_ACPI_STATUS(status);
 
switch (acpi_type) {
case ACPI_TYPE_PROCESSOR:
@@ -663,11 +663,12 @@ static acpi_status __init acpi_processor_ids_walk(acpi_handle handle,
}
 
processor_validated_ids_update(uid);
-   return true;
+   return AE_OK;
 
 err:
+   /* Exit on error, but don't abort the namespace walk */
acpi_handle_info(handle, "Invalid processor object\n");
-   return false;
+   return AE_OK;
 
 }
 
-- 
2.14.3





[PATCH] sched/core: Fix compile warning in SMP=n case

2018-08-09 Thread Dou Liyang
When compiling the kernel with SMP disabled, the build warns with:

kernel/sched/core.c: In function ‘update_rq_clock_task’:
kernel/sched/core.c:139:17: warning: unused variable ‘irq_delta’ [-Wunused-variable]
  s64 steal = 0, irq_delta = 0;

Fix this by reverting the HAVE_SCHED_AVG_IRQ guard to

  defined(CONFIG_IRQ_TIME_ACCOUNTING) || defined(CONFIG_PARAVIRT_TIME_ACCOUNTING)
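The mismatch, in miniature (a schematic sketch, not the actual file layout;
per the warning, HAVE_SCHED_AVG_IRQ is apparently not defined for SMP=n
builds, so the variable can be left without a user):

	s64 steal = 0, irq_delta = 0;	/* declared unconditionally */

#ifdef HAVE_SCHED_AVG_IRQ		/* gone for SMP=n */
	if ((irq_delta + steal) && sched_feat(NONTASK_CAPACITY))
		update_irq_load_avg(rq, irq_delta + steal);	/* only user */
#endif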

Fixes: 2e62c4743adc ("sched/fair: Remove #ifdefs from scale_rt_capacity()")
Signed-off-by: Dou Liyang 
---
 kernel/sched/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c45de46fdf10..ef954d96c80c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -177,7 +177,7 @@ static void update_rq_clock_task(struct rq *rq, s64 delta)
 
rq->clock_task += delta;
 
-#ifdef HAVE_SCHED_AVG_IRQ
+#if defined(CONFIG_IRQ_TIME_ACCOUNTING) || defined(CONFIG_PARAVIRT_TIME_ACCOUNTING)
if ((irq_delta + steal) && sched_feat(NONTASK_CAPACITY))
update_irq_load_avg(rq, irq_delta + steal);
 #endif
-- 
2.14.3





Re: [PATCH v4 2/4] x86/boot: Add acpitb.c to parse acpi tables

2018-08-02 Thread Dou Liyang




At 07/23/2018 05:29 PM, Chao Fan wrote:

Imitate the ACPI code to parse ACPI tables. Functions are simplified
because some operations are not needed here.
Also, this method won't influence the initialization of ACPI.

Signed-off-by: Chao Fan 


Hi Fan,

I know you got the code from the ACPICA subsystem and the EFI code... and
did a lot of adaptation work for KASLR. It's awesome!

I think you can add some other simple comments:

  - What are the differences between your function and the function it is
    based on, and why did you make them?

... to make this more credible and easier to remember the details as time
goes on.

Also some concerns below.

---

[...]

+   else if (!strncmp(sig, EFI32_LOADER_SIGNATURE, 4))
+   efi_64 = false;
+   else {
+   debug_putstr("Wrong efi loader signature.\n");


s/efi/EFI/; this also needs fixing in the comments below.


+   return false;
+   }
+

[...]

+   /*
+* Get rsdp from efi tables.
+* If we find acpi table, go on searching for acpi20 table.
+* If we didn't get acpi20 table then use acpi table.
+* If neither acpi table nor acpi20 table found,
+* return false.
+*/
+   if (!(efi_guidcmp(guid, ACPI_TABLE_GUID)) && !acpi_20) {
+   *rsdp_addr = (acpi_physical_address)table;
+   acpi_20 = false;
+   find_rsdp = true;
+   } else if (!(efi_guidcmp(guid, ACPI_20_TABLE_GUID))) {
+   *rsdp_addr = (acpi_physical_address)table;
+   acpi_20 = true;
+   return true;


If we find the ACPI 2.0 table, we return immediately, so the _acpi_20_
variable and its logic are redundant.
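Something like this would do (a sketch of the simplified loop body, under
the assumption that finding ACPI 2.0 always returns immediately):

	if (!efi_guidcmp(guid, ACPI_20_TABLE_GUID)) {
		*rsdp_addr = (acpi_physical_address)table;
		return true;		/* ACPI 2.0 RSDP found, done */
	} else if (!efi_guidcmp(guid, ACPI_TABLE_GUID)) {
		*rsdp_addr = (acpi_physical_address)table;
		find_rsdp = true;	/* remember it, keep looking for 2.0 */
	}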

Thanks,
dou




Re: [PATCH v4 3/4] x86/boot/KASLR: Walk srat tables to filter immovable memory

2018-08-02 Thread Dou Liyang




At 08/02/2018 03:05 PM, Thomas Gleixner wrote:
[...]


Folks. Can you please both stop this annoying habit of keeping the full
context of the mail and then sprinkling a random sentence into the middle?



I see. I won't do this again.

Thanks,

dou




Re: [PATCH v4 4/4] x86/boot/KASLR: Limit kaslr to choosing the immovable memory

2018-08-01 Thread Dou Liyang




At 08/02/2018 02:00 PM, Chao Fan wrote:

On Thu, Aug 02, 2018 at 01:46:29PM +0800, Dou Liyang wrote:

Hi Fan,

At 07/23/2018 05:29 PM, Chao Fan wrote:

If 'CONFIG_MEMORY_HOTREMOVE' is specified and the count of immovable
memory regions is not zero, calculate the intersection between the memory
regions from the e820/EFI memory table and the immovable memory regions.
Otherwise, keep the old code path.

Rename process_mem_region to slots_count to match slots_fetch_random,
and name the new function process_mem_region.

Signed-off-by: Chao Fan 
---
   arch/x86/boot/compressed/kaslr.c | 66 ++--
   1 file changed, 55 insertions(+), 11 deletions(-)

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index 4705682caf1f..10bda3a1fcaa 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -631,9 +631,9 @@ static unsigned long slots_fetch_random(void)
return 0;
   }
-static void process_mem_region(struct mem_vector *entry,
-  unsigned long minimum,
-  unsigned long image_size)
+static void slots_count(struct mem_vector *entry,

   ^^^
  is not suitable. IMO, how about process_mem_slots()? Or you may have a
better name; it's up to you.


It's from Kees Cook's advice; he wanted to use slots_count() to match
slots_fetch_random() in my old patch long ago.
Since the method of handling this part has not changed much, I kept
this name.



Okay! ;-)


dou




Re: [PATCH v4 4/4] x86/boot/KASLR: Limit kaslr to choosing the immovable memory

2018-08-01 Thread Dou Liyang

Hi Fan,

At 07/23/2018 05:29 PM, Chao Fan wrote:

If 'CONFIG_MEMORY_HOTREMOVE' is specified and the count of immovable
memory regions is not zero, calculate the intersection between the memory
regions from the e820/EFI memory table and the immovable memory regions.
Otherwise, keep the old code path.

Rename process_mem_region to slots_count to match slots_fetch_random,
and name the new function process_mem_region.

Signed-off-by: Chao Fan 
---
  arch/x86/boot/compressed/kaslr.c | 66 ++--
  1 file changed, 55 insertions(+), 11 deletions(-)

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index 4705682caf1f..10bda3a1fcaa 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -631,9 +631,9 @@ static unsigned long slots_fetch_random(void)
return 0;
  }
  
-static void process_mem_region(struct mem_vector *entry,

-  unsigned long minimum,
-  unsigned long image_size)
+static void slots_count(struct mem_vector *entry,

   ^^^
  is not suitable. IMO, how about process_mem_slots()? Or you may have a
better name; it's up to you.


+   unsigned long minimum,
+   unsigned long image_size)
  {
struct mem_vector region, overlap;
struct slot_area slot_area;


slot_area is also unused.

Thanks,
dou





Re: [PATCH v4 3/4] x86/boot/KASLR: Walk srat tables to filter immovable memory

2018-08-01 Thread Dou Liyang

Hi Fan,

At 07/23/2018 05:29 PM, Chao Fan wrote:

If 'CONFIG_MEMORY_HOTREMOVE' is specified, walk the ACPI SRAT memory
tables and store the immovable memory regions, so that KASLR knows which
regions can be selected and which cannot.
If 'CONFIG_MEMORY_HOTREMOVE' is not specified, keep the old code path.

Signed-off-by: Chao Fan 
---
  arch/x86/boot/compressed/kaslr.c | 55 
  1 file changed, 55 insertions(+)

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index 531c9876f573..4705682caf1f 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -31,6 +31,7 @@
  
  #include "misc.h"

  #include "error.h"
+#include "acpitb.h"
  #include "../string.h"
  
  #include 

@@ -104,6 +105,14 @@ static bool memmap_too_large;
  /* Store memory limit specified by "mem=nn[KMG]" or "memmap=nn[KMG]" */
  unsigned long long mem_limit = ULLONG_MAX;
  
+#ifdef CONFIG_MEMORY_HOTREMOVE

+/* Store the immovable memory regions */
+struct mem_vector immovable_mem[];
+
+/* Store the amount of immovable memory regions */
+static int num_immovable_mem;
+#endif
+
  
  enum mem_avoid_index {

MEM_AVOID_ZO_RANGE = 0,
@@ -298,6 +307,47 @@ static int handle_mem_options(void)
return 0;
  }
  
+#ifdef CONFIG_MEMORY_HOTREMOVE

+/*
+ * According to ACPI table, filter the immvoable memory regions
+ * and store them in immovable_mem[].
+ */
+static void handle_immovable_mem(void)
+{
+   struct acpi_table_header *table_header;
+   struct acpi_subtable_header *table;
+   struct acpi_srat_mem_affinity *ma;
+   unsigned long table_size;
+   unsigned long table_end;
+   int i = 0;
+
+   table_header = get_acpi_srat_table();
+
+   if (!table_header)
+   return;
+
+   table_end = (unsigned long)table_header + table_header->length;
+
+   table = (struct acpi_subtable_header *)
+   ((unsigned long)table_header + sizeof(struct acpi_table_srat));
+
+   table_size = sizeof(struct acpi_subtable_header);


table_size isn't used and can be removed.


+   while (((unsigned long)table) + table->length < table_end) {
+   if (table->type == 1) {
+   ma = (struct acpi_srat_mem_affinity *)table;
+   if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE)) {
+   immovable_mem[i].start = ma->base_address;
+   immovable_mem[i].size = ma->length;
+   i++;


A check (i < MAX_NUMNODES*2) is needed before doing that, to avoid an
out-of-bounds write, even if it may be impossible in ACPI.
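For example (a sketch, assuming immovable_mem[] is sized for at most
MAX_NUMNODES * 2 entries, as the check above implies):

	if (i >= MAX_NUMNODES * 2)
		break;			/* don't overflow immovable_mem[] */
	immovable_mem[i].start = ma->base_address;
	immovable_mem[i].size = ma->length;
	i++;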

Thanks,
dou

+   }
+   }
+   table = (struct acpi_subtable_header *)
+   ((unsigned long)table + table->length);
+   }
+   num_immovable_mem = i;
+}
+#endif
+
  /*
   * In theory, KASLR can put the kernel anywhere in the range of [16M, 64T).
   * The mem_avoid array is used to store the ranges that need to be avoided
@@ -421,6 +471,11 @@ static void mem_avoid_init(unsigned long input, unsigned 
long input_size,
/* Mark the memmap regions we need to avoid */
handle_mem_options();
  
+#ifdef CONFIG_MEMORY_HOTREMOVE

+   /* Mark the immovable regions we need to choose */
+   handle_immovable_mem();
+#endif
+
  #ifdef CONFIG_X86_VERBOSE_BOOTUP
/* Make sure video RAM can be used. */
add_identity_map(0, PMD_SIZE);






[tip:x86/platform] x86/platform/UV: Mark memblock related init code and data correctly

2018-07-30 Thread tip-bot for Dou Liyang
Commit-ID:  24cfd8ca1d28331b9dad3b88d1958c976b2cfab6
Gitweb: https://git.kernel.org/tip/24cfd8ca1d28331b9dad3b88d1958c976b2cfab6
Author: Dou Liyang 
AuthorDate: Mon, 30 Jul 2018 15:59:47 +0800
Committer:  Thomas Gleixner 
CommitDate: Mon, 30 Jul 2018 19:53:58 +0200

x86/platform/UV: Mark memblock related init code and data correctly

parse_mem_block_size() and mem_block_size are only used during init. Mark
them accordingly.

Fixes: d7609f4210cb ("x86/platform/UV: Add kernel parameter to set memory block size")
Signed-off-by: Dou Liyang 
Signed-off-by: Thomas Gleixner 
Cc: h...@zytor.com
Cc: Mike Travis 
Cc: Andrew Banman 
Link: https://lkml.kernel.org/r/20180730075947.23023-1-douly.f...@cn.fujitsu.com

---
 arch/x86/kernel/apic/x2apic_uv_x.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/apic/x2apic_uv_x.c b/arch/x86/kernel/apic/x2apic_uv_x.c
index d492752f79e1..391f358ebb4c 100644
--- a/arch/x86/kernel/apic/x2apic_uv_x.c
+++ b/arch/x86/kernel/apic/x2apic_uv_x.c
@@ -394,10 +394,10 @@ extern int uv_hub_info_version(void)
 EXPORT_SYMBOL(uv_hub_info_version);
 
 /* Default UV memory block size is 2GB */
-static unsigned long mem_block_size = (2UL << 30);
+static unsigned long mem_block_size __initdata = (2UL << 30);
 
 /* Kernel parameter to specify UV mem block size */
-static int parse_mem_block_size(char *ptr)
+static int __init parse_mem_block_size(char *ptr)
 {
unsigned long size = memparse(ptr, NULL);
 


[tip:x86/timers] x86/tsc: Consolidate init code

2018-07-30 Thread tip-bot for Dou Liyang
Commit-ID:  608008a45798fe9e2aee04f99b5270ea57c1376f
Gitweb: https://git.kernel.org/tip/608008a45798fe9e2aee04f99b5270ea57c1376f
Author: Dou Liyang 
AuthorDate: Mon, 30 Jul 2018 15:54:20 +0800
Committer:  Thomas Gleixner 
CommitDate: Mon, 30 Jul 2018 19:33:35 +0200

x86/tsc: Consolidate init code

Split out duplicated code from tsc_early_init() and tsc_init() into a
common helper and fix up some comment typos.

[ tglx: Massaged changelog and renamed function ]

Signed-off-by: Dou Liyang 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Pavel Tatashin 
Cc: 
Cc: Peter Zijlstra 
Cc: "H. Peter Anvin" 
Link: https://lkml.kernel.org/r/20180730075421.22830-2-douly.f...@cn.fujitsu.com

---
 arch/x86/kernel/tsc.c | 22 --
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 02e416b87ac1..1463468ba9a0 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -182,7 +182,7 @@ static void __init cyc2ns_init_boot_cpu(void)
 }
 
 /*
- * Secondary CPUs do not run through cyc2ns_init(), so set up
+ * Secondary CPUs do not run through tsc_init(), so set up
  * all the scale factors for all CPUs, assuming the same
  * speed as the bootup CPU. (cpufreq notifiers will fix this
  * up if their speed diverges)
@@ -1389,7 +1389,7 @@ static bool __init determine_cpu_tsc_frequencies(bool early)
}
 
/*
-* Trust non-zero tsc_khz as authorative,
+* Trust non-zero tsc_khz as authoritative,
 * and use it to sanity check cpu_khz,
 * which will be off if system timer is off.
 */
@@ -1421,6 +1421,14 @@ static unsigned long __init get_loops_per_jiffy(void)
return lpj;
 }
 
+static void __init tsc_enable_sched_clock(void)
+{
+   /* Sanitize TSC ADJUST before cyc2ns gets initialized */
+   tsc_store_and_check_tsc_adjust(true);
+   cyc2ns_init_boot_cpu();
+   static_branch_enable(&__use_tsc);
+}
+
 void __init tsc_early_init(void)
 {
if (!boot_cpu_has(X86_FEATURE_TSC))
@@ -1429,10 +1437,7 @@ void __init tsc_early_init(void)
return;
loops_per_jiffy = get_loops_per_jiffy();
 
-   /* Sanitize TSC ADJUST before cyc2ns gets initialized */
-   tsc_store_and_check_tsc_adjust(true);
-   cyc2ns_init_boot_cpu();
-   static_branch_enable(&__use_tsc);
+   tsc_enable_sched_clock();
 }
 
 void __init tsc_init(void)
@@ -1456,13 +1461,10 @@ void __init tsc_init(void)
setup_clear_cpu_cap(X86_FEATURE_TSC_DEADLINE_TIMER);
return;
}
-   /* Sanitize TSC ADJUST before cyc2ns gets initialized */
-   tsc_store_and_check_tsc_adjust(true);
-   cyc2ns_init_boot_cpu();
+   tsc_enable_sched_clock();
}
 
cyc2ns_init_secondary_cpus();
-   static_branch_enable(&__use_tsc);
 
if (!no_sched_irq_time)
enable_sched_clock_irqtime();


[tip:x86/timers] x86/kvmclock: Mark kvm_get_preset_lpj() as __init

2018-07-30 Thread tip-bot for Dou Liyang
Commit-ID:  1088c6eef261939bda8346ec35b513790a2111d5
Gitweb: https://git.kernel.org/tip/1088c6eef261939bda8346ec35b513790a2111d5
Author: Dou Liyang 
AuthorDate: Mon, 30 Jul 2018 15:54:21 +0800
Committer:  Thomas Gleixner 
CommitDate: Mon, 30 Jul 2018 19:33:35 +0200

x86/kvmclock: Mark kvm_get_preset_lpj() as __init

kvm_get_preset_lpj() is only called from kvmclock_init(), so mark it __init
as well.

Signed-off-by: Dou Liyang 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Pavel Tatashin 
Cc: 
Cc: Paolo Bonzini 
Cc:   Radim Krčmář
Cc:  
Cc: "H. Peter Anvin" 
Cc: "Radim Krčmář" 
Cc: k...@vger.kernel.org
Link: https://lkml.kernel.org/r/20180730075421.22830-3-douly.f...@cn.fujitsu.com

---
 arch/x86/kernel/kvmclock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 91b94c0ae4e3..d2edd7e6c294 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -145,7 +145,7 @@ static unsigned long kvm_get_tsc_khz(void)
return pvclock_tsc_khz(this_cpu_pvti());
 }
 
-static void kvm_get_preset_lpj(void)
+static void __init kvm_get_preset_lpj(void)
 {
unsigned long khz;
u64 lpj;


Re: [PATCH 1/2] x86/tsc: Avoid a double call if TSC initializes earlier.

2018-07-30 Thread Dou Liyang

Hi Peter,

At 07/30/2018 05:34 PM, Peter Zijlstra wrote:

On Mon, Jul 30, 2018 at 03:54:20PM +0800, Dou Liyang wrote:

static_branch_enable(&__use_tsc) may be called twice in common case, that
is redundant.


It is also really not a problem...



Yes, right. Just avoid the second useless setting.

Thanks,
dou









[PATCH] x86/apic: Mark parse_mem_block_size as __init

2018-07-30 Thread Dou Liyang
The early_param() handlers are only called during kernel initialization,
so Linux marks them with the __init macro to save memory.

But parse_mem_block_size() was missed. Make it __init as well, and also
mark the mem_block_size variable __initdata.
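For context, the usual early_param() pattern looks like this (illustrative
only; 'foo_size' is a made-up parameter):

	static unsigned long foo_size __initdata = (2UL << 30);

	/* runs only while boot parameters are parsed, hence __init */
	static int __init parse_foo_size(char *ptr)
	{
		foo_size = memparse(ptr, NULL);
		return 0;
	}
	early_param("foo_size", parse_foo_size);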

Signed-off-by: Dou Liyang 
---
 arch/x86/kernel/apic/x2apic_uv_x.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/apic/x2apic_uv_x.c b/arch/x86/kernel/apic/x2apic_uv_x.c
index d492752f79e1..391f358ebb4c 100644
--- a/arch/x86/kernel/apic/x2apic_uv_x.c
+++ b/arch/x86/kernel/apic/x2apic_uv_x.c
@@ -394,10 +394,10 @@ extern int uv_hub_info_version(void)
 EXPORT_SYMBOL(uv_hub_info_version);
 
 /* Default UV memory block size is 2GB */
-static unsigned long mem_block_size = (2UL << 30);
+static unsigned long mem_block_size __initdata = (2UL << 30);
 
 /* Kernel parameter to specify UV mem block size */
-static int parse_mem_block_size(char *ptr)
+static int __init parse_mem_block_size(char *ptr)
 {
unsigned long size = memparse(ptr, NULL);
 
-- 
2.14.3





[PATCH 1/2] x86/tsc: Avoid a double call if TSC initializes earlier.

2018-07-30 Thread Dou Liyang
static_branch_enable(&__use_tsc) may be called twice in the common case,
which is redundant. There are also some functions duplicated between
tsc_early_init() and tsc_init().

Move them into a separate helper function and call it only once.
Also fix comments:
 -s/authorative/authoritative
 -s/cyc2ns_init/tsc_init

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Peter Zijlstra 
Cc: Pavel Tatashin 
Signed-off-by: Dou Liyang 
---
 arch/x86/kernel/tsc.c | 22 --
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 02e416b87ac1..ba0a6b6e0726 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -182,7 +182,7 @@ static void __init cyc2ns_init_boot_cpu(void)
 }
 
 /*
- * Secondary CPUs do not run through cyc2ns_init(), so set up
+ * Secondary CPUs do not run through tsc_init(), so set up
  * all the scale factors for all CPUs, assuming the same
  * speed as the bootup CPU. (cpufreq notifiers will fix this
  * up if their speed diverges)
@@ -1389,7 +1389,7 @@ static bool __init determine_cpu_tsc_frequencies(bool early)
}
 
/*
-* Trust non-zero tsc_khz as authorative,
+* Trust non-zero tsc_khz as authoritative,
 * and use it to sanity check cpu_khz,
 * which will be off if system timer is off.
 */
@@ -1421,6 +1421,14 @@ static unsigned long __init get_loops_per_jiffy(void)
return lpj;
 }
 
+static void __init enable_use_tsc(void)
+{
+   /* Sanitize TSC ADJUST before cyc2ns gets initialized */
+   tsc_store_and_check_tsc_adjust(true);
+   cyc2ns_init_boot_cpu();
+   static_branch_enable(&__use_tsc);
+}
+
 void __init tsc_early_init(void)
 {
if (!boot_cpu_has(X86_FEATURE_TSC))
@@ -1429,10 +1437,7 @@ void __init tsc_early_init(void)
return;
loops_per_jiffy = get_loops_per_jiffy();
 
-   /* Sanitize TSC ADJUST before cyc2ns gets initialized */
-   tsc_store_and_check_tsc_adjust(true);
-   cyc2ns_init_boot_cpu();
-   static_branch_enable(&__use_tsc);
+   enable_use_tsc();
 }
 
 void __init tsc_init(void)
@@ -1456,13 +1461,10 @@ void __init tsc_init(void)
setup_clear_cpu_cap(X86_FEATURE_TSC_DEADLINE_TIMER);
return;
}
-   /* Sanitize TSC ADJUST before cyc2ns gets initialized */
-   tsc_store_and_check_tsc_adjust(true);
-   cyc2ns_init_boot_cpu();
+   enable_use_tsc();
}
 
cyc2ns_init_secondary_cpus();
-   static_branch_enable(&__use_tsc);
 
if (!no_sched_irq_time)
enable_sched_clock_irqtime();
-- 
2.14.3





[PATCH 2/2] x86/kvmclock: Mark kvm_get_preset_lpj() as __init

2018-07-30 Thread Dou Liyang
kvm_get_preset_lpj() is only called from kvmclock_init(), so mark it
__init as well.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Paolo Bonzini 
Cc: "Radim Krčmář" 
Cc: k...@vger.kernel.org
Signed-off-by: Dou Liyang 
---
 arch/x86/kernel/kvmclock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 91b94c0ae4e3..d2edd7e6c294 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -145,7 +145,7 @@ static unsigned long kvm_get_tsc_khz(void)
return pvclock_tsc_khz(this_cpu_pvti());
 }
 
-static void kvm_get_preset_lpj(void)
+static void __init kvm_get_preset_lpj(void)
 {
unsigned long khz;
u64 lpj;
-- 
2.14.3





[PATCH 0/2] Two cleanup patches

2018-07-30 Thread Dou Liyang
Here are two cleanup patches, found while testing the early boot time
stamps with the tip tree today.

Dou Liyang (2):
  x86/tsc: Avoid a double call if tsc can initialize earlier.
  x86/kvmclock: Mark kvm_get_preset_lpj() as __init

 arch/x86/kernel/kvmclock.c |  2 +-
 arch/x86/kernel/tsc.c  | 22 --
 2 files changed, 13 insertions(+), 11 deletions(-)

-- 
2.14.3





Re: [PATCH v14 20/25] x86/tsc: calibrate tsc only once

2018-07-18 Thread Dou Liyang

Hi Thomas,

At 07/19/2018 02:25 PM, Thomas Gleixner wrote:

On Thu, 19 Jul 2018, Dou Liyang wrote:

At 07/18/2018 10:22 AM, Pavel Tatashin wrote:

+   (unsigned long)cpu_khz % KHZ);
if (cpu_khz != tsc_khz) {
pr_info("Detected %lu.%03lu MHz TSC",
-   (unsigned long)tsc_khz / 1000,
-   (unsigned long)tsc_khz % 1000);
+   (unsigned long)tsc_khz / KHZ,
+   (unsigned long)tsc_khz % KHZ);
+   }


these curly brackets can be removed


No. They want to stay, really.

 https://lkml.kernel.org/r/alpine.DEB.2.20.1701171956290.3645@nanos

The pr_info() is a multiline statement due to the line breaks.



I see, I’ll keep that in mind.

Thanks,

dou




Re: [PATCH v14 20/25] x86/tsc: calibrate tsc only once

2018-07-18 Thread Dou Liyang

Hi, Pavel

I am sorry, I didn't point out the typos clearly in the previous version.
Please see the concerns below. ;-)

At 07/18/2018 10:22 AM, Pavel Tatashin wrote:

During boot tsc is calibrated twice: once in tsc_early_delay_calibrate(),
and the second time in tsc_init().

Rename tsc_early_delay_calibrate() to tsc_early_init(), and rework it so
the calibration is done only early, and make tsc_init() to use the values
already determined in tsc_early_init().

Sometimes it is not possible to determine tsc early, as the subsystem that
is required is not yet initialized, in such case try again later in
tsc_init().

Suggested-by: Thomas Gleixner 
Signed-off-by: Pavel Tatashin 
---
  arch/x86/include/asm/tsc.h |  2 +-
  arch/x86/kernel/setup.c|  2 +-
  arch/x86/kernel/tsc.c  | 86 --
  3 files changed, 48 insertions(+), 42 deletions(-)

diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h
index 2701d221583a..c4368ff73652 100644
--- a/arch/x86/include/asm/tsc.h
+++ b/arch/x86/include/asm/tsc.h
@@ -33,7 +33,7 @@ static inline cycles_t get_cycles(void)
  extern struct system_counterval_t convert_art_to_tsc(u64 art);
  extern struct system_counterval_t convert_art_ns_to_tsc(u64 art_ns);
  
-extern void tsc_early_delay_calibrate(void);

+extern void tsc_early_init(void);
  extern void tsc_init(void);
  extern void mark_tsc_unstable(char *reason);
  extern int unsynchronized_tsc(void);
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 7490de925a81..5d32c55aeb8b 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1014,6 +1014,7 @@ void __init setup_arch(char **cmdline_p)
 */
init_hypervisor_platform();
  
+	tsc_early_init();

x86_init.resources.probe_roms();
  
  	/* after parse_early_param, so could debug it */

@@ -1199,7 +1200,6 @@ void __init setup_arch(char **cmdline_p)
  
  	memblock_find_dma_reserve();
  
-	tsc_early_delay_calibrate();

if (!early_xdbc_setup_hardware())
early_xdbc_register_console();
  
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c

index 186395041725..bc8eb82050a3 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -33,6 +33,8 @@ EXPORT_SYMBOL(cpu_khz);
  unsigned int __read_mostly tsc_khz;
  EXPORT_SYMBOL(tsc_khz);
  
+#define KHZ	1000

+
  /*
   * TSC can be unstable due to cpufreq or due to unsynced TSCs
   */
@@ -1335,34 +1337,10 @@ static int __init init_tsc_clocksource(void)
   */
  device_initcall(init_tsc_clocksource);
  
-void __init tsc_early_delay_calibrate(void)

-{
-   unsigned long lpj;
-
-   if (!boot_cpu_has(X86_FEATURE_TSC))
-   return;
-
-   cpu_khz = x86_platform.calibrate_cpu();
-   tsc_khz = x86_platform.calibrate_tsc();
-
-   tsc_khz = tsc_khz ? : cpu_khz;
-   if (!tsc_khz)
-   return;
-
-   lpj = tsc_khz * 1000;
-   do_div(lpj, HZ);
-   loops_per_jiffy = lpj;
-}
-
-void __init tsc_init(void)
+static bool determine_cpu_tsc_frequncies(void)


this function needs to be marked __init.

And a typo here: s/frequncies/frequencies/


  {
-   u64 lpj, cyc;
-   int cpu;
-
-   if (!boot_cpu_has(X86_FEATURE_TSC)) {
-   setup_clear_cpu_cap(X86_FEATURE_TSC_DEADLINE_TIMER);
-   return;
-   }
+   /* Make sure that cpu and tsc are not already calibrated */
+   WARN_ON(cpu_khz || tsc_khz);
  
  	cpu_khz = x86_platform.calibrate_cpu();

tsc_khz = x86_platform.calibrate_tsc();
@@ -1377,20 +1355,51 @@ void __init tsc_init(void)
else if (abs(cpu_khz - tsc_khz) * 10 > tsc_khz)
cpu_khz = tsc_khz;
  
-	if (!tsc_khz) {

-   mark_tsc_unstable("could not calculate TSC khz");
-   setup_clear_cpu_cap(X86_FEATURE_TSC_DEADLINE_TIMER);
-   return;
-   }
+   if (tsc_khz == 0)
+   return false;
  
  	pr_info("Detected %lu.%03lu MHz processor\n",

-   (unsigned long)cpu_khz / 1000,
-   (unsigned long)cpu_khz % 1000);
+   (unsigned long)cpu_khz / KHZ,
+   (unsigned long)cpu_khz % KHZ);
  
  	if (cpu_khz != tsc_khz) {

pr_info("Detected %lu.%03lu MHz TSC",
-   (unsigned long)tsc_khz / 1000,
-   (unsigned long)tsc_khz % 1000);
+   (unsigned long)tsc_khz / KHZ,
+   (unsigned long)tsc_khz % KHZ);
+   }


these curly brackets can be removed


+   return true;
+}
+
+static unsigned long get_loops_per_jiffy(void)


This needs to be marked __init as well.


+{
+   unsigned long lpj = tsc_khz * KHZ;
+
+   do_div(lpj, HZ);
+   return lpj;
+}
+
+void __init tsc_early_init(void)
+{
+   if (!boot_cpu_has(X86_FEATURE_TSC))
+   return;
+   if (!determine_cpu_tsc_frequncies())
+   return;
+   loops_per_jiffy = get_loops_per_jiffy();
+}
+
+void __init tsc_init(void)
+{
+ 

Re: [PATCH v13 13/18] x86/tsc: calibrate tsc only once

2018-07-17 Thread Dou Liyang




At 07/16/2018 09:35 PM, Pavel Tatashin wrote:
[...]

v13 has xen patches, so xen timestamps work early as well now.



Oops, yes, my mistake. I will test the patchset with Thomas's kvm patch
for you.

Thanks,
dou


Thank you,
Pavel








Re: [PATCH v13 13/18] x86/tsc: calibrate tsc only once

2018-07-16 Thread Dou Liyang




At 07/13/2018 07:30 PM, Pavel Tatashin wrote:

On Fri, Jul 13, 2018 at 3:24 AM Dou Liyang  wrote:



At 07/12/2018 08:04 AM, Pavel Tatashin wrote:

During boot tsc is calibrated twice: once in tsc_early_delay_calibrate(),
and the second time in tsc_init().

Rename tsc_early_delay_calibrate() to tsc_early_init(), and rework it so
the calibration is done only early, and make tsc_init() to use the values
already determined in tsc_early_init().

Sometimes it is not possible to determine tsc early, as the subsystem that
is required is not yet initialized, in such case try again later in
tsc_init().

Suggested-by: Thomas Gleixner 
Signed-off-by: Pavel Tatashin 


Hi Pavel,

Aha, a complex solution for a simple problem! ;-) I didn't find any
benefit in doing that. Did I miss something?


Hi Dou,

I had this in previous version:  init early, and unconditionally
re-init later (which required to resync sched clocks for continuity,
and check for some other corner cases). Thomas did not like the idea,
as it is less deterministic: it leads for code to work by accident,
where we might get one tsc frequency early and another later, and so
on. The request was to initialize only once, and if that fails do it
again later. This way, if early initialization is broken, we will know
and fix it.



Hi Pavel,

Yes, right, I have seen the purpose in v12.



As cpu_khz and tsc_khz are global variables and tsc_khz may be reset to
cpu_khz, how about the following patch?


Could you please explain where you think this patch can be applied,
and what it fixes?



This patch is just a simple alternative that realizes what you want in
your patch. Your patch is also good, but complex, and needs some
scrubbing, e.g.:
  - Is it suitable to use WARN_ON() there?
  - The name of the determine_cpu_tsc_frequncies() function:
    s/frequncies/frequencies/. BTW, how about tsc_calibrate_frequencies()?
  - ...

BTW, before this patch, it seems we need to make sure Xen works well. I
can investigate and test whether we can also move pagetable_init() in
front of tsc_early_init() for you.

Thanks,
dou

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index da1dbd99cb6e..74cb16d89e25 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1197,12 +1197,12 @@ void __init setup_arch(char **cmdline_p)

memblock_find_dma_reserve();

+   x86_init.paging.pagetable_init();
+
tsc_early_delay_calibrate();
if (!early_xdbc_setup_hardware())
early_xdbc_register_console();

-   x86_init.paging.pagetable_init();
-
kasan_init();

/*


Thank you,
Pavel



Thanks,
 dou
8<---

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 74392d9d51e0..e54fa1037d45 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1370,8 +1370,10 @@ void __init tsc_init(void)
  return;
  }

-   cpu_khz = x86_platform.calibrate_cpu();
-   tsc_khz = x86_platform.calibrate_tsc();
+   if (!tsc_khz) {
+   cpu_khz = x86_platform.calibrate_cpu();
+   tsc_khz = x86_platform.calibrate_tsc();
+   }

  /*
   * Trust non-zero tsc_khz as authorative,









Re: [PATCH v13 14/18] x86/tsc: initialize cyc2ns when tsc freq. is determined

2018-07-13 Thread Dou Liyang




At 07/12/2018 08:04 AM, Pavel Tatashin wrote:

cyc2ns converts tsc to nanoseconds, and it is handled in a per-cpu data
structure.

Currently, the setup code for c2ns data for every possible CPU goes through
the same sequence of calculations as for the boot CPU, but is based on the
same tsc frequency as the boot CPU, and thus this is not necessary.

Initialize the boot cpu when tsc frequency is determined. Copy the
calculated data from the boot CPU to the other CPUs in tsc_init().

In addition do the following:

- Remove unnecessary zeroing of c2ns data by removing cyc2ns_data_init()
- Split set_cyc2ns_scale() into two functions, so set_cyc2ns_scale() can be
   called when system is up, and wraps around __set_cyc2ns_scale() that can
   be called directly when system is booting but avoids saving restoring
   IRQs and going and waking up from idle.

Suggested-by: Thomas Gleixner 
Signed-off-by: Pavel Tatashin 
---
  arch/x86/kernel/tsc.c | 94 ---
  1 file changed, 53 insertions(+), 41 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index bc8eb82050a3..0b1abe7fdd8e 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -103,23 +103,6 @@ void cyc2ns_read_end(void)
   *  -johns...@us.ibm.com "math is hard, lets go shopping!"
   */
  
-static void cyc2ns_data_init(struct cyc2ns_data *data)

-{
-   data->cyc2ns_mul = 0;
-   data->cyc2ns_shift = 0;
-   data->cyc2ns_offset = 0;
-}
-
-static void __init cyc2ns_init(int cpu)
-{
-   struct cyc2ns *c2n = &per_cpu(cyc2ns, cpu);
-
-   cyc2ns_data_init(&c2n->data[0]);
-   cyc2ns_data_init(&c2n->data[1]);
-
-   seqcount_init(&c2n->seq);
-}
-
  static inline unsigned long long cycles_2_ns(unsigned long long cyc)
  {
struct cyc2ns_data data;
@@ -135,18 +118,11 @@ static inline unsigned long long cycles_2_ns(unsigned long long cyc)
return ns;
  }
  
-static void set_cyc2ns_scale(unsigned long khz, int cpu, unsigned long long tsc_now)

+static void __set_cyc2ns_scale(unsigned long khz, int cpu, unsigned long long tsc_now)
  {
unsigned long long ns_now;
struct cyc2ns_data data;
struct cyc2ns *c2n;
-   unsigned long flags;
-
-   local_irq_save(flags);
-   sched_clock_idle_sleep_event();
-
-   if (!khz)
-   goto done;
  
  	ns_now = cycles_2_ns(tsc_now);
  
@@ -178,12 +154,55 @@ static void set_cyc2ns_scale(unsigned long khz, int cpu, unsigned long long tsc_now)

c2n->data[0] = data;
raw_write_seqcount_latch(&c2n->seq);
c2n->data[1] = data;
+}
+
+static void set_cyc2ns_scale(unsigned long khz, int cpu, unsigned long long tsc_now)
+{
+   unsigned long flags;
+
+   local_irq_save(flags);
+   sched_clock_idle_sleep_event();
+
+   if (khz)
+   __set_cyc2ns_scale(khz, cpu, tsc_now);
  
-done:

sched_clock_idle_wakeup_event();
local_irq_restore(flags);
  }
+/*
+ * Initialize cyc2ns for boot cpu
+ */
+static void __init cyc2ns_init_boot_cpu(void)
+{
+   struct cyc2ns *c2n = this_cpu_ptr(&cyc2ns);
+
+   seqcount_init(&c2n->seq);
+   __set_cyc2ns_scale(tsc_khz, smp_processor_id(), rdtsc());
+}

Hi Pavel,

Here we didn't protect it with local_irq_save()/local_irq_restore().
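i.e. something along these lines (a sketch of what the comment suggests,
mirroring the protection used on the runtime path):

	unsigned long flags;

	local_irq_save(flags);
	__set_cyc2ns_scale(tsc_khz, smp_processor_id(), rdtsc());
	local_irq_restore(flags);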


+
+/*
+ * Secondary CPUs do not run through cyc2ns_init(), so set up
+ * all the scale factors for all CPUs, assuming the same
+ * speed as the bootup CPU. (cpufreq notifiers will fix this
+ * up if their speed diverges)
+ */
+static void __init cyc2ns_init_secondary_cpus(void)
+{
+   unsigned int cpu, this_cpu = smp_processor_id();
+   struct cyc2ns *c2n = this_cpu_ptr(&cyc2ns);
+   struct cyc2ns_data *data = c2n->data;
+
+   for_each_possible_cpu(cpu) {
+   if (cpu != this_cpu) {
+   seqcount_init(&c2n->seq);
+   c2n = per_cpu_ptr(&cyc2ns, cpu);
+   c2n->data[0] = data[0];
+   c2n->data[1] = data[1];
+   }
+   }
+}
+
  /*
   * Scheduler clock - returns current time in nanosec units.
   */
@@ -1385,6 +1404,10 @@ void __init tsc_early_init(void)
if (!determine_cpu_tsc_frequncies())
return;
loops_per_jiffy = get_loops_per_jiffy();
+
+   /* Sanitize TSC ADJUST before cyc2ns gets initialized */
+   tsc_store_and_check_tsc_adjust(true);
+   cyc2ns_init_boot_cpu();
  }
  
  void __init tsc_init(void)

@@ -1400,23 +1423,12 @@ void __init tsc_init(void)
mark_tsc_unstable("could not calculate TSC khz");
return;
}
+   /* Sanitize TSC ADJUST before cyc2ns gets initialized */
+   tsc_store_and_check_tsc_adjust(true);
+   cyc2ns_init_boot_cpu();


In tsc_init(), timer interrupts are already working. Is that OK?

IMO, there is no need to change so much; just use the original code,
e.g.:

+static void __init cyc2ns_

Re: [PATCH v13 13/18] x86/tsc: calibrate tsc only once

2018-07-13 Thread Dou Liyang



At 07/12/2018 08:04 AM, Pavel Tatashin wrote:

During boot tsc is calibrated twice: once in tsc_early_delay_calibrate(),
and the second time in tsc_init().

Rename tsc_early_delay_calibrate() to tsc_early_init(), and rework it so
the calibration is done only early, and make tsc_init() to use the values
already determined in tsc_early_init().

Sometimes it is not possible to determine tsc early, as the subsystem that
is required is not yet initialized, in such case try again later in
tsc_init().

Suggested-by: Thomas Gleixner 
Signed-off-by: Pavel Tatashin 


Hi Pavel,

Aha, a complex solution for a simple problem! ;-) I didn't find any
benefit in doing that. Did I miss something?

As cpu_khz and tsc_khz are global variables and tsc_khz may be reset to
cpu_khz, how about the following patch?

Thanks,
dou
8<---

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 74392d9d51e0..e54fa1037d45 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1370,8 +1370,10 @@ void __init tsc_init(void)
return;
}

-   cpu_khz = x86_platform.calibrate_cpu();
-   tsc_khz = x86_platform.calibrate_tsc();
+   if (!tsc_khz) {
+   cpu_khz = x86_platform.calibrate_cpu();
+   tsc_khz = x86_platform.calibrate_tsc();
+   }

/*
 * Trust non-zero tsc_khz as authorative,


---
  arch/x86/include/asm/tsc.h |  2 +-
  arch/x86/kernel/setup.c|  2 +-
  arch/x86/kernel/tsc.c  | 86 --
  3 files changed, 48 insertions(+), 42 deletions(-)

diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h
index 2701d221583a..c4368ff73652 100644
--- a/arch/x86/include/asm/tsc.h
+++ b/arch/x86/include/asm/tsc.h
@@ -33,7 +33,7 @@ static inline cycles_t get_cycles(void)
  extern struct system_counterval_t convert_art_to_tsc(u64 art);
  extern struct system_counterval_t convert_art_ns_to_tsc(u64 art_ns);
  
-extern void tsc_early_delay_calibrate(void);

+extern void tsc_early_init(void);
  extern void tsc_init(void);
  extern void mark_tsc_unstable(char *reason);
  extern int unsynchronized_tsc(void);
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 01fcc8bf7c8f..07445482bb57 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1014,7 +1014,7 @@ void __init setup_arch(char **cmdline_p)
 */
init_hypervisor_platform();
  
-	tsc_early_delay_calibrate();

+   tsc_early_init();
  
  	x86_init.resources.probe_roms();
  
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c

index 186395041725..bc8eb82050a3 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -33,6 +33,8 @@ EXPORT_SYMBOL(cpu_khz);
  unsigned int __read_mostly tsc_khz;
  EXPORT_SYMBOL(tsc_khz);
  
+#define KHZ	1000

+
  /*
   * TSC can be unstable due to cpufreq or due to unsynced TSCs
   */
@@ -1335,34 +1337,10 @@ static int __init init_tsc_clocksource(void)
   */
  device_initcall(init_tsc_clocksource);
  
-void __init tsc_early_delay_calibrate(void)

-{
-   unsigned long lpj;
-
-   if (!boot_cpu_has(X86_FEATURE_TSC))
-   return;
-
-   cpu_khz = x86_platform.calibrate_cpu();
-   tsc_khz = x86_platform.calibrate_tsc();
-
-   tsc_khz = tsc_khz ? : cpu_khz;
-   if (!tsc_khz)
-   return;
-
-   lpj = tsc_khz * 1000;
-   do_div(lpj, HZ);
-   loops_per_jiffy = lpj;
-}
-
-void __init tsc_init(void)
+static bool determine_cpu_tsc_frequncies(void)
  {
-   u64 lpj, cyc;
-   int cpu;
-
-   if (!boot_cpu_has(X86_FEATURE_TSC)) {
-   setup_clear_cpu_cap(X86_FEATURE_TSC_DEADLINE_TIMER);
-   return;
-   }
+   /* Make sure that cpu and tsc are not already calibrated */
+   WARN_ON(cpu_khz || tsc_khz);
  
  	cpu_khz = x86_platform.calibrate_cpu();

tsc_khz = x86_platform.calibrate_tsc();
@@ -1377,20 +1355,51 @@ void __init tsc_init(void)
else if (abs(cpu_khz - tsc_khz) * 10 > tsc_khz)
cpu_khz = tsc_khz;
  
-	if (!tsc_khz) {

-   mark_tsc_unstable("could not calculate TSC khz");
-   setup_clear_cpu_cap(X86_FEATURE_TSC_DEADLINE_TIMER);
-   return;
-   }
+   if (tsc_khz == 0)
+   return false;
  
  	pr_info("Detected %lu.%03lu MHz processor\n",

-   (unsigned long)cpu_khz / 1000,
-   (unsigned long)cpu_khz % 1000);
+   (unsigned long)cpu_khz / KHZ,
+   (unsigned long)cpu_khz % KHZ);
  
  	if (cpu_khz != tsc_khz) {

pr_info("Detected %lu.%03lu MHz TSC",
-   (unsigned long)tsc_khz / 1000,
-   (unsigned long)tsc_khz % 1000);
+   (unsigned long)tsc_khz / KHZ,
+   (unsigned long)tsc_khz % KHZ);
+   }
+   return true;
+}
+
+static unsigned long get_loops_per_jiffy(void)
+{
+   

Re: Bug report about KASLR and ZONE_MOVABLE

2018-07-11 Thread Dou Liyang

Hi Baoquan,

At 07/11/2018 08:40 PM, Baoquan He wrote:

Please try this v3 patch:
>>From 9850d3de9c02e570dc7572069a9749a8add4c4c7 Mon Sep 17 00:00:00 2001
From: Baoquan He 
Date: Wed, 11 Jul 2018 20:31:51 +0800
Subject: [PATCH v3] mm, page_alloc: find movable zone after kernel text

In find_zone_movable_pfns_for_nodes(), when try to find the starting
PFN movable zone begins in each node, kernel text position is not
considered. KASLR may put kernel after which movable zone begins.

Fix it by finding movable zone after kernel text on that node.

Signed-off-by: Baoquan He 



You fixed this on the _zone init side_. This may make 'kernelcore=' or
'movablecore=' fail if KASLR puts the kernel at the tail of the last
node, or beyond.

Since we have already fixed mirror memory on the KASLR side, and Chao is
trying to fix 'movable_node' on the KASLR side as well, have you had a
chance to fix this on the KASLR side?



---
  mm/page_alloc.c | 20 +++-
  1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1521100..390eb35 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6547,7 +6547,7 @@ static unsigned long __init early_calculate_totalpages(void)
  static void __init find_zone_movable_pfns_for_nodes(void)
  {
int i, nid;
-   unsigned long usable_startpfn;
+   unsigned long usable_startpfn, real_startpfn;
unsigned long kernelcore_node, kernelcore_remaining;
/* save the state before borrow the nodemask */
nodemask_t saved_node_state = node_states[N_MEMORY];
@@ -6681,10 +6681,20 @@ static void __init find_zone_movable_pfns_for_nodes(void)
if (start_pfn >= end_pfn)
continue;
  
+			/*

+* KASLR may put kernel near tail of node memory,
+* start after kernel on that node to find PFN
+* which zone begins.
+*/
+   if (pfn_to_nid(PFN_UP(_etext)) == i)


Here, did you mean to check the node ID? It seems this should be nid.

and

for_each_node_state(nid, N_MEMORY) {

... it seems a check here is more suitable:

for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {

}
}
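Roughly like this (a hypothetical sketch of the suggested shape, keeping
the patch's PFN_UP(_etext) expression as-is):

	for_each_node_state(nid, N_MEMORY) {
		unsigned long real_startpfn = usable_startpfn;

		/* do the kernel-text check once per node, against nid */
		if (pfn_to_nid(PFN_UP(_etext)) == nid)
			real_startpfn = max(usable_startpfn,
					    PFN_UP(_etext));

		for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
			/* ... use real_startpfn instead of usable_startpfn ... */
		}
	}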

Thanks,
dou


+   real_startpfn = max(usable_startpfn,
+   PFN_UP(_etext))
+   else
+   real_startpfn = usable_startpfn;
/* Account for what is only usable for kernelcore */
-   if (start_pfn < usable_startpfn) {
+   if (start_pfn < real_startpfn) {
unsigned long kernel_pages;
-   kernel_pages = min(end_pfn, usable_startpfn)
+   kernel_pages = min(end_pfn, real_startpfn)
- start_pfn;
  
  kernelcore_remaining -= min(kernel_pages,

@@ -6693,7 +6703,7 @@ static void __init find_zone_movable_pfns_for_nodes(void)
required_kernelcore);
  
  /* Continue if range is now fully accounted */

-   if (end_pfn <= usable_startpfn) {
+   if (end_pfn <= real_startpfn) {
  
  	/*

 * Push zone_movable_pfn to the end so
@@ -6704,7 +6714,7 @@ static void __init find_zone_movable_pfns_for_nodes(void)
zone_movable_pfn[nid] = end_pfn;
continue;
}
-   start_pfn = usable_startpfn;
+   start_pfn = real_startpfn;
}
  
  			/*







[tip:x86/urgent] x86/vector: Fix the args of vector_alloc tracepoint

2018-06-06 Thread tip-bot for Dou Liyang
Commit-ID:  838d76d63ec4eaeaa12bedfa50f261480f615200
Gitweb: https://git.kernel.org/tip/838d76d63ec4eaeaa12bedfa50f261480f615200
Author: Dou Liyang 
AuthorDate: Fri, 1 Jun 2018 14:50:31 +0800
Committer:  Thomas Gleixner 
CommitDate: Wed, 6 Jun 2018 13:38:02 +0200

x86/vector: Fix the args of vector_alloc tracepoint

The vector_alloc tracepoint reversed the reserved and ret args, which made
the trace print wrong. Exchange them.

Fixes: 8d1e3dca7de6 ("x86/vector: Add tracepoints for vector management")
Signed-off-by: Dou Liyang 
Signed-off-by: Thomas Gleixner 
Cc: h...@zytor.com
Cc: sta...@vger.kernel.org
Link: https://lkml.kernel.org/r/20180601065031.21872-1-douly.f...@cn.fujitsu.com

---
 arch/x86/include/asm/trace/irq_vectors.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/trace/irq_vectors.h b/arch/x86/include/asm/trace/irq_vectors.h
index 22647a642e98..0af81b590a0c 100644
--- a/arch/x86/include/asm/trace/irq_vectors.h
+++ b/arch/x86/include/asm/trace/irq_vectors.h
@@ -236,7 +236,7 @@ TRACE_EVENT(vector_alloc,
TP_PROTO(unsigned int irq, unsigned int vector, bool reserved,
 int ret),
 
-   TP_ARGS(irq, vector, ret, reserved),
+   TP_ARGS(irq, vector, reserved, ret),
 
TP_STRUCT__entry(
__field(unsigned int,   irq )


[tip:x86/urgent] x86/idt: Simplify the idt_setup_apic_and_irq_gates()

2018-06-06 Thread tip-bot for Dou Liyang
Commit-ID:  336628128826a9acb045571a960e32e4414ccb61
Gitweb: https://git.kernel.org/tip/336628128826a9acb045571a960e32e4414ccb61
Author: Dou Liyang 
AuthorDate: Wed, 23 May 2018 10:35:55 +0800
Committer:  Thomas Gleixner 
CommitDate: Wed, 6 Jun 2018 13:38:01 +0200

x86/idt: Simplify the idt_setup_apic_and_irq_gates()

idt_setup_apic_and_irq_gates() first sets the gates from
FIRST_EXTERNAL_VECTOR up to FIRST_SYSTEM_VECTOR. Then, from
FIRST_SYSTEM_VECTOR to NR_VECTORS, it takes both APIC=y and APIC=n into
account.

But for APIC=n, FIRST_SYSTEM_VECTOR is equal to NR_VECTORS, so all
vectors have already been set in the first step.

Simplify the second step and make it handle only APIC=y.

Signed-off-by: Dou Liyang 
Signed-off-by: Thomas Gleixner 
Link: https://lkml.kernel.org/r/20180523023555.2933-1-douly.f...@cn.fujitsu.com

---
 arch/x86/kernel/idt.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
index 2c3a1b4294eb..74383a3780dc 100644
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -317,15 +317,12 @@ void __init idt_setup_apic_and_irq_gates(void)
set_intr_gate(i, entry);
}
 
-   for_each_clear_bit_from(i, system_vectors, NR_VECTORS) {
 #ifdef CONFIG_X86_LOCAL_APIC
+   for_each_clear_bit_from(i, system_vectors, NR_VECTORS) {
set_bit(i, system_vectors);
set_intr_gate(i, spurious_interrupt);
-#else
-   entry = irq_entries_start + 8 * (i - FIRST_EXTERNAL_VECTOR);
-   set_intr_gate(i, entry);
-#endif
}
+#endif
 }
 
 /**


Re: [patch 3/8] x86/apic: Provide apic_ack_irq()

2018-06-06 Thread Dou Liyang

Hi Thomas,

At 06/06/2018 04:04 PM, Thomas Gleixner wrote:

On Wed, 6 Jun 2018, Dou Liyang wrote:


Hi Thomas,

At 06/05/2018 07:41 PM, Thomas Gleixner wrote:

On Tue, 5 Jun 2018, Dou Liyang wrote:

+{
+   if (unlikely(irqd_is_setaffinity_pending(irqd)))


Affinity pending is also judged in


+   irq_move_irq(irqd);


If we can remove the if(...) statement here


That requires to fix all call sites in ia64 and that's why I didn't.  But


I didn't express myself clearly. I meant removing the if(...) statement
from apic_ack_irq(); it doesn't require fixing the call sites in ia64.


I put the check there on purpose as I explained in the changelog:

Making the invocation of irq_move_irq() conditional avoids the out of
line call if the pending bit is not set.



I completely understand now, thank you so much. :-)

Thanks,
dou


Thanks,

tglx








Re: [patch 3/8] x86/apic: Provide apic_ack_irq()

2018-06-05 Thread Dou Liyang

Hi Thomas,

At 06/05/2018 07:41 PM, Thomas Gleixner wrote:

On Tue, 5 Jun 2018, Dou Liyang wrote:

+{
+   if (unlikely(irqd_is_setaffinity_pending(irqd)))


Affinity pending is also judged in


+   irq_move_irq(irqd);


If we can remove the if(...) statement here


That requires to fix all call sites in ia64 and that's why I didn't.  But


I didn't express myself clearly. I meant removing the if(...) statement
from apic_ack_irq(); it doesn't require fixing the call sites in ia64.

+void apic_ack_irq(struct irq_data *irqd)
+{
+   irq_move_irq(irqd);
+   ack_APIC_irq();
+}

BTW, can apic_ack_irq() accept _any_ irq_data when hierarchical
irqdomains are enabled[1]? If so, there is a situation in the original
code that we should avoid:

  If the top-level irq_data has the IRQD_SETAFFINITY_PENDING state but a
  non-top-level irq_data does not, then when a non-top-level irq_data is
  used in apic_ack_irq(), we may skip the irq_move_irq() call that we
  should make.

[1] commit 77ed42f18edd("genirq: Prevent crash in irq_move_irq()")
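In code terms, the hazard would be (a sketch, assuming hierarchical
irqdomains where the pending flag is set on the top-level irq_data only):

	/* testing a lower-level irq_data misses the top-level flag ... */
	if (unlikely(irqd_is_setaffinity_pending(lower_irqd)))	/* false */
		irq_move_irq(lower_irqd);	/* ... so the move is skipped */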


we can make irq_move_irq() an inline function and have the check in the
inline.



I don't know why we need to make irq_move_irq() an inline function.

And, yes, irq_move_irq() already has the check:

...
if (likely(!irqd_is_setaffinity_pending(idata)))
return;
...

Thanks,
dou




Re: [patch 3/8] x86/apic: Provide apic_ack_irq()

2018-06-05 Thread Dou Liyang

Hi Thomas,

At 06/04/2018 11:33 PM, Thomas Gleixner wrote:

apic_ack_edge() is explicitly for handling interrupt affinity cleanup when
interrupt remapping is not available or disabled.

Remapped interrupts and also some of the platform specific special
interrupts, e.g. UV, invoke ack_APIC_irq() directly.

To address the issue of failing an affinity update with -EBUSY the delayed
affinity mechanism can be reused, but ack_APIC_irq() does not handle
that. Adding this to ack_APIC_irq() is not possible, because that function
is also used for exceptions and directly handled interrupts like IPIs.

Create a new function, which just contains the conditional invocation of
irq_move_irq() and the final ack_APIC_irq(). Making the invocation of
irq_move_irq() conditional avoids the out of line call if the pending bit
is not set.

Reuse the new function in apic_ack_edge().

Preparatory change for the real fix

Fixes: dccfe3147b42 ("x86/vector: Simplify vector move cleanup")
Signed-off-by: Thomas Gleixner
Cc:sta...@vger.kernel.org
---
  arch/x86/include/asm/apic.h   |2 ++
  arch/x86/kernel/apic/vector.c |   10 --
  2 files changed, 10 insertions(+), 2 deletions(-)

--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -436,6 +436,8 @@ static inline void apic_set_eoi_write(vo
  
  #endif /* CONFIG_X86_LOCAL_APIC */
  
+extern void apic_ack_irq(struct irq_data *data);

+
  static inline void ack_APIC_irq(void)
  {
/*
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -809,11 +809,17 @@ static int apic_retrigger_irq(struct irq
return 1;
  }
  
+void apic_ack_irq(struct irq_data *irqd)

+{
+   if (unlikely(irqd_is_setaffinity_pending(irqd)))


The affinity-pending state is also checked in


+   irq_move_irq(irqd);


Can we remove the if(...) statement here?

Thanks,
dou


+   ack_APIC_irq();
+}
+
  void apic_ack_edge(struct irq_data *irqd)
  {
irq_complete_move(irqd_cfg(irqd));
-   irq_move_irq(irqd);
-   ack_APIC_irq();
+   apic_ack_irq(irqd);
  }
  
  static struct irq_chip lapic_controller = {





Re: [patch 2/8] genirq/generic_pending: Do not lose pending affinity update

2018-06-05 Thread Dou Liyang

Hi Thomas,

At 06/04/2018 11:33 PM, Thomas Gleixner wrote:

The generic pending interrupt mechanism moves interrupts from the interrupt
handler on the original target CPU to the new destination CPU. This is
required for x86 and ia64 due to the way the interrupt delivery and
acknowledge works if the interrupts are not remapped.

However that update can fail for various reasons. Some of them are valid
reasons to discard the pending update, but the case when the previous move
has not been fully cleaned up is not a legit reason to fail.

Check the return value of irq_do_set_affinity() for -EBUSY, which indicates
a pending cleanup, and rearm the pending move in the irq dexcriptor so it's


s/dexcriptor/descriptor


tried again when the next interrupt arrives.

Fixes: 996c591227d9 ("x86/irq: Plug vector cleanup race")
Signed-off-by: Thomas Gleixner
Cc:sta...@vger.kernel.org
---
  kernel/irq/migration.c |   24 ++--
  1 file changed, 18 insertions(+), 6 deletions(-)

--- a/kernel/irq/migration.c
+++ b/kernel/irq/migration.c
@@ -38,17 +38,18 @@ bool irq_fixup_move_pending(struct irq_d
  void irq_move_masked_irq(struct irq_data *idata)
  {
struct irq_desc *desc = irq_data_to_desc(idata);
-   struct irq_chip *chip = desc->irq_data.chip;
+   struct irq_data *data = &desc->irq_data;
+   struct irq_chip *chip = data->chip;
  
-	if (likely(!irqd_is_setaffinity_pending(&desc->irq_data)))

+   if (likely(!irqd_is_setaffinity_pending(data)))
return;
  
-	irqd_clr_move_pending(&desc->irq_data);

+   irqd_clr_move_pending(data);
  
  	/*

 * Paranoia: cpu-local interrupts shouldn't be calling in here anyway.
 */
-   if (irqd_is_per_cpu(&desc->irq_data)) {
+   if (irqd_is_per_cpu(data)) {
WARN_ON(1);
return;
}
@@ -73,9 +74,20 @@ void irq_move_masked_irq(struct irq_data
 * For correct operation this depends on the caller
 * masking the irqs.
 */
-   if (cpumask_any_and(desc->pending_mask, cpu_online_mask) < nr_cpu_ids)
-   irq_do_set_affinity(&desc->irq_data, desc->pending_mask, false);
+   if (cpumask_any_and(desc->pending_mask, cpu_online_mask) < nr_cpu_ids) {
+   int ret;
  
+		ret = irq_do_set_affinity(data, desc->pending_mask, false);

+   /*
+* If the there is a cleanup pending in the underlying


s/If the there is/If there is/

Thanks,
dou
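
The quoted hunk is cut off above right after the comment. As a runnable
userspace sketch of the rearm pattern the changelog describes - check the
return value for -EBUSY and re-set the pending flag so the move is retried
on the next interrupt - with do_set_affinity() and the flags as stand-ins,
not kernel APIs:

/*
 * Runnable userspace sketch of the rearm logic described in the
 * changelog. do_set_affinity() and the pending flag are stand-ins;
 * only the -EBUSY handling pattern is the point.
 */
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

static bool cleanup_pending = true;     /* a previous move awaits cleanup */
static bool move_pending;

static int do_set_affinity(void)
{
        if (cleanup_pending)
                return -EBUSY;          /* previous move not cleaned up yet */
        return 0;
}

static void move_masked_irq(void)
{
        move_pending = false;
        /*
         * A pending cleanup is not a valid reason to discard the
         * update: rearm it so it is retried on the next interrupt.
         */
        if (do_set_affinity() == -EBUSY) {
                move_pending = true;
                printf("rearmed pending move\n");
        }
}

int main(void)
{
        move_masked_irq();              /* first try hits -EBUSY, rearms */
        cleanup_pending = false;        /* cleanup completes meanwhile */
        if (move_pending)
                move_masked_irq();      /* retry succeeds */
        printf("move still pending: %d\n", move_pending);
        return 0;
}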




Re: WARNING and PANIC in irq_matrix_free

2018-06-04 Thread Dou Liyang

Hi Thomas,

At 06/04/2018 07:17 PM, Thomas Gleixner wrote:

On Mon, 4 Jun 2018, Dou Liyang wrote:

Here, why don't we avoid this cleanup by

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index a75de0792942..0cc59646755f 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -821,6 +821,9 @@ static void free_moved_vector(struct apic_chip_data
*apicd)
  */
 WARN_ON_ONCE(managed);

+   if (!vector)
+   return;
+
 trace_vector_free_moved(apicd->irq, cpu, vector, managed);
 irq_matrix_free(vector_matrix, cpu, vector, managed);
 per_cpu(vector_irq, cpu)[vector] = VECTOR_UNUSED;

Is there something I didn't consider? ;-)


Well, that just prevents the warning, but the hlist is already
corrupted. So you'd just cure the symptom ...



I see.


I'm about to send a patch series which addresses that. Just need to finish
writing changelogs.



Thank you for telling me that.

Thanks,
dou




Re: WARNING and PANIC in irq_matrix_free

2018-06-04 Thread Dou Liyang

Hi Thomas,

Sorry for asking questions in this series; my mailbox was kicked out of
the mailing list a few days ago and I didn't receive the e-mails.

Please see below.

At 05/29/2018 04:09 AM, Thomas Gleixner wrote:

On Mon, 28 May 2018, Song Liu wrote:

This doesn't fix the issue with bnxt. Here is a trace with this patch:


[...]


Thanks for providing the data!

 >  19d... 1610359248us : vector_deactivate: irq=31 is_managed=0 
can_reserve=1 reserve=0
 >  19d... 1610359248us : vector_clear: irq=31 vector=33 cpu=20 
prev_vector=0 prev_cpu=2
 >  19d... 1610359249us : irq_matrix_free: bit=33 cpu=20 online=1 
avl=201 alloc=0 managed=0 online_maps=56 global_avl=11241, 
global_rsvd=25, total_alloc=15


Here IRQ 31 is shutdown and the vector freed.

 >  19d... 1610359249us : irq_matrix_reserve: online_maps=56 
global_avl=11241, global_rsvd=26, total_alloc=15

 >  19d... 1610359249us : vector_reserve: irq=31 ret=0
 >  19d... 1610359249us : vector_config: irq=31 vector=239 cpu=0 
apicdest=0x


And set to the magic reservation vector 239 to catch spurious interrupts.

 >  20dN.. 1610366654us : vector_activate: irq=31 is_managed=0 
can_reserve=1 reserve=0
 >  20dN.. 1610366654us : vector_alloc: irq=31 vector=4294967268 
reserved=1 ret=0
 >  20dN.. 1610366655us : irq_matrix_alloc: bit=33 cpu=9 online=1 
avl=200 alloc=1 managed=0 online_maps=56 global_avl=11240, 
global_rsvd=28, total_alloc=16
 >  20dN.. 1610366655us : vector_update: irq=31 vector=33 cpu=9 
prev_vector=0 prev_cpu=20

  
   (prev_vector=0: this means there is no associated previous vector.)


 >  20dN.. 1610366656us : vector_alloc: irq=31 vector=33 reserved=1 ret=0
 >  20dN.. 1610366656us : vector_config: irq=31 vector=33 cpu=9 
apicdest=0x0014


So here it gets initialized again and targets CPU9 now.

 >  20dN.. 161032us : irq_matrix_alloc: bit=33 cpu=20 online=1 
avl=200 alloc=1 managed=0 online_maps=56 global_avl=11240, 
global_rsvd=28, total_alloc=16
 >  20dN.. 161032us : vector_update: irq=31 vector=33 cpu=20 
prev_vector=33 prev_cpu=9

 >  20dN.. 161032us : vector_alloc: irq=31 vector=33 reserved=1 ret=0
 >  20dN.. 161032us : vector_config: irq=31 vector=33 cpu=20 
apicdest=0x002c


Here it is reconfigured to CPU 20. Now that update schedules vector 33 on
CPU9 for cleanup.

 >  20dN.. 161036us : irq_matrix_alloc: bit=34 cpu=2 online=1 
avl=199 alloc=2 managed=0 online_maps=56 global_avl=11239, 
global_rsvd=28, total_alloc=17
 >  20dN.. 161036us : vector_update: irq=31 vector=34 cpu=2 
prev_vector=33 prev_cpu=20

 >  20dN.. 161036us : vector_alloc: irq=31 vector=34 reserved=1 ret=0
 >  20dN.. 161036us : vector_config: irq=31 vector=34 cpu=2 
apicdest=0x0004


So here the shit hits the fan because that update schedules vector 33 on
CPU20 for cleanup while the previous cleanup for CPU9 has not been done
yet. Cute. Or not so cute.

 >  20dNh. 161039us : vector_free_moved: irq=31 cpu=20 vector=33 
is_managed=0
 >  20dNh. 1610366670us : irq_matrix_free: bit=33 cpu=20 online=1 
avl=201 alloc=0 managed=0 online_maps=56 global_avl=11240, 
global_rsvd=28, total_alloc=16


And frees the CPU 20 vector

 >   9d.h. 1610366696us : vector_free_moved: irq=31 cpu=20 vector=0 
is_managed=0




Here, why don't we avoid this cleanup by

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index a75de0792942..0cc59646755f 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -821,6 +821,9 @@ static void free_moved_vector(struct apic_chip_data 
*apicd)

 */
WARN_ON_ONCE(managed);

+   if (!vector)
+   return;
+
trace_vector_free_moved(apicd->irq, cpu, vector, managed);
irq_matrix_free(vector_matrix, cpu, vector, managed);
per_cpu(vector_irq, cpu)[vector] = VECTOR_UNUSED;

Is there something I didn't consider? ;-)

Thanks,
dou.


And then CPU9 claims that it's queued for cleanup. Bah.

I'm still working on a fix as the elegant solution refused to work.

Thanks,

 tglx
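
A toy model of the scenario in the trace: each affinity update records
(prev_vector, prev_cpu) for a later cleanup, so a second update that lands
before the first cleanup has run overwrites - and thereby leaks - the
pending cleanup. The struct and counters below are simplifications, not the
kernel's actual vector management:

/*
 * Toy model of the lost-cleanup scenario from the trace above. It only
 * shows why two back-to-back updates without an intervening cleanup
 * leave one vector unaccounted for.
 */
#include <stdio.h>

struct toy_chip_data {
        int vector, cpu;
        int prev_vector, prev_cpu;      /* scheduled for cleanup */
};

static int allocated[3];                /* per-"cpu" allocation counters */

static void update(struct toy_chip_data *d, int vector, int cpu)
{
        /* A second update here overwrites any still-pending cleanup. */
        if (d->prev_vector)
                printf("LEAK: cleanup of vector %d on cpu %d overwritten\n",
                       d->prev_vector, d->prev_cpu);
        d->prev_vector = d->vector;
        d->prev_cpu = d->cpu;
        d->vector = vector;
        d->cpu = cpu;
        allocated[cpu]++;
}

static void cleanup(struct toy_chip_data *d)
{
        if (!d->prev_vector)
                return;
        allocated[d->prev_cpu]--;       /* frees only the latest pending move */
        d->prev_vector = 0;
}

int main(void)
{
        struct toy_chip_data d = { 0, 0, 0, 0 };

        update(&d, 33, 1);      /* like vector 33 on the first CPU */
        update(&d, 33, 2);      /* reconfigured before cleanup ran */
        update(&d, 34, 0);      /* second reconfigure: first cleanup is lost */
        cleanup(&d);
        printf("allocated: cpu0=%d cpu1=%d cpu2=%d (cpu1 leaked)\n",
               allocated[0], allocated[1], allocated[2]);
        return 0;
}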





[PATCH] x86/vector: Fix the args of vector_alloc tracepoint

2018-05-31 Thread Dou Liyang
The vector_alloc tracepoint reversed the reserved and ret args, which made
the trace output wrong. Exchange them.

Signed-off-by: Dou Liyang 
---
 arch/x86/include/asm/trace/irq_vectors.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/trace/irq_vectors.h 
b/arch/x86/include/asm/trace/irq_vectors.h
index 22647a642e98..0af81b590a0c 100644
--- a/arch/x86/include/asm/trace/irq_vectors.h
+++ b/arch/x86/include/asm/trace/irq_vectors.h
@@ -236,7 +236,7 @@ TRACE_EVENT(vector_alloc,
TP_PROTO(unsigned int irq, unsigned int vector, bool reserved,
 int ret),
 
-   TP_ARGS(irq, vector, ret, reserved),
+   TP_ARGS(irq, vector, reserved, ret),
 
TP_STRUCT__entry(
__field(unsigned int,   irq )
-- 
2.14.3
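
The bug class is easy to reproduce outside the tracing machinery: when a
call site passes same-typed positional arguments in a different order than
the receiver declares them, the code compiles cleanly and only the output
is wrong. A minimal plain-C illustration (not the TRACE_EVENT macros):

/*
 * Minimal illustration of the reversed-argument bug fixed above: both
 * parameters are int, so swapping them compiles cleanly and only the
 * printed values are wrong.
 */
#include <stdio.h>

static void trace_alloc(int reserved, int ret)
{
        printf("reserved=%d ret=%d\n", reserved, ret);
}

int main(void)
{
        int reserved = 1, ret = -11;    /* e.g. -EAGAIN */

        trace_alloc(ret, reserved);     /* bug: prints reserved=-11 ret=1 */
        trace_alloc(reserved, ret);     /* fix: prints reserved=1 ret=-11 */
        return 0;
}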





[RESEND RFC PATCH] acpi/processor: Remove the check of duplicate processors

2018-05-28 Thread Dou Liyang
We found that there are some processors which have the same processor ID
but are in different PXMs in the ACPI namespace, such as this:
proc_id   |pxm

0   <-> 0
1   <-> 0
2   <-> 1
3   <-> 1
..
89  <-> 0
89  <-> 1
89  <-> 2
89  <-> 3
..

So we created a mechanism to validate them and mark the processor (ID=89)
as invalid. And once a processor is physically hotplugged, we check its
processor ID.

Commit 8e089eaa1999 ("acpi: Provide mechanism to validate processors in the 
ACPI tables")
Commit a77d6cd96849 ("acpi/processor: Check for duplicate processor ids at 
hotplug time")

Recently, I found the check mechanism has a bug: it didn't use
acpi_processor_ids_walk() correctly and always gave us a wrong result.
First, I fixed it by replacing the return value with AE_OK, which is a
standard acpi_status value.

https://lkml.org/lkml/2018/3/20/273

But now I even think this check is useless. My reasons are as follows:

1). Based on practical tests, it works well and no bug has been reported
2). Based on the code, the duplicate cases can be dealt with by

   if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id))

That seems more reasonable; let's see the following case:

                               Before the patch    After the patch

the first processor (ID=89)
hot-add                        failed              success

the other processors (ID=89)
hot-add                        failed              failed

So, remove the check code.

Signed-off-by: Dou Liyang 

Signed-off-by: Dou Liyang 
---
 drivers/acpi/acpi_processor.c | 126 --
 include/linux/acpi.h  |   3 -
 2 files changed, 129 deletions(-)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 449d86d39965..8358708e0bbb 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -281,13 +281,6 @@ static int acpi_processor_get_info(struct acpi_device 
*device)
pr->acpi_id = value;
}
 
-   if (acpi_duplicate_processor_id(pr->acpi_id)) {
-   dev_err(&device->dev,
-   "Failed to get unique processor _UID (0x%x)\n",
-   pr->acpi_id);
-   return -ENODEV;
-   }
-
pr->phys_id = acpi_get_phys_id(pr->handle, device_declaration,
pr->acpi_id);
if (invalid_phys_cpuid(pr->phys_id))
@@ -579,127 +572,8 @@ static struct acpi_scan_handler 
processor_container_handler = {
.attach = acpi_processor_container_attach,
 };
 
-/* The number of the unique processor IDs */
-static int nr_unique_ids __initdata;
-
-/* The number of the duplicate processor IDs */
-static int nr_duplicate_ids;
-
-/* Used to store the unique processor IDs */
-static int unique_processor_ids[] __initdata = {
-   [0 ... NR_CPUS - 1] = -1,
-};
-
-/* Used to store the duplicate processor IDs */
-static int duplicate_processor_ids[] = {
-   [0 ... NR_CPUS - 1] = -1,
-};
-
-static void __init processor_validated_ids_update(int proc_id)
-{
-   int i;
-
-   if (nr_unique_ids == NR_CPUS||nr_duplicate_ids == NR_CPUS)
-   return;
-
-   /*
-* Firstly, compare the proc_id with duplicate IDs, if the proc_id is
-* already in the IDs, do nothing.
-*/
-   for (i = 0; i < nr_duplicate_ids; i++) {
-   if (duplicate_processor_ids[i] == proc_id)
-   return;
-   }
-
-   /*
-* Secondly, compare the proc_id with unique IDs, if the proc_id is in
-* the IDs, put it in the duplicate IDs.
-*/
-   for (i = 0; i < nr_unique_ids; i++) {
-   if (unique_processor_ids[i] == proc_id) {
-   duplicate_processor_ids[nr_duplicate_ids] = proc_id;
-   nr_duplicate_ids++;
-   return;
-   }
-   }
-
-   /*
-* Lastly, the proc_id is a unique ID, put it in the unique IDs.
-*/
-   unique_processor_ids[nr_unique_ids] = proc_id;
-   nr_unique_ids++;
-}
-
-static acpi_status __init acpi_processor_ids_walk(acpi_handle handle,
- u32 lvl,
- void *context,
- void **rv)
-{
-   acpi_status status;
-   acpi_object_type acpi_type;
-   unsigned long long uid;
-   union acpi_object object = { 0 };
-   struct acpi_buffer buffer = { sizeof(union acpi_object), &object };
-
-   status = acpi_get_type(handle, &acpi_type);
-   if (ACPI_FAILURE(status))
-   return false;
-
-   switch (acpi_type) {
-   case 

Re: [PATCH 0/1] Remove the check of duplicate processors

2018-05-28 Thread Dou Liyang

Hi Rafael,

At 05/28/2018 04:40 PM, Rafael J. Wysocki wrote:



Can you please resend the patch with the above information in the
changelog of it?


Yes, I resend the patch right now.



That would make it easier to review and discuss it.

Also I think that we need some sort of a check against duplicate IDs.



Thanks,
dou








[PATCH v2] x86/idt: Simplify the idt_setup_apic_and_irq_gates()

2018-05-22 Thread Dou Liyang
The idt_setup_apic_and_irq_gates() function sets the gates from
FIRST_EXTERNAL_VECTOR up to FIRST_SYSTEM_VECTOR first. Then, from
FIRST_SYSTEM_VECTOR to NR_VECTORS, it takes both APIC=y and APIC=n
into account.

But for APIC=n, FIRST_SYSTEM_VECTOR is equal to NR_VECTORS, so all
vectors have already been set in the first step.

Simplify the second step and make it handle only the APIC=y case.

Signed-off-by: Dou Liyang 
---
 arch/x86/kernel/idt.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
index 2c3a1b4294eb..74383a3780dc 100644
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -317,15 +317,12 @@ void __init idt_setup_apic_and_irq_gates(void)
set_intr_gate(i, entry);
}
 
-   for_each_clear_bit_from(i, system_vectors, NR_VECTORS) {
 #ifdef CONFIG_X86_LOCAL_APIC
+   for_each_clear_bit_from(i, system_vectors, NR_VECTORS) {
set_bit(i, system_vectors);
set_intr_gate(i, spurious_interrupt);
-#else
-   entry = irq_entries_start + 8 * (i - FIRST_EXTERNAL_VECTOR);
-   set_intr_gate(i, entry);
-#endif
}
+#endif
 }
 
 /**
-- 
2.14.3





Re: [PATCH 4/5] acpi/processor: Fix the return value of acpi_processor_ids_walk()

2018-05-22 Thread Dou Liyang

At 05/22/2018 09:47 AM, Dou Liyang wrote:



At 05/19/2018 11:06 PM, Thomas Gleixner wrote:

On Tue, 20 Mar 2018, Dou Liyang wrote:

ACPI driver should make sure all the processor IDs in their ACPI
Namespace are unique for CPU hotplug. The driver performs a depth-first
walk of the namespace tree and calls acpi_processor_ids_walk().

But acpi_processor_ids_walk() will return true once one processor has
been checked, which causes the walk to break after passing the first
processor.

Replace the value with AE_OK, which is the standard acpi_status value.

Fixes 8c8cb30f49b8 ("acpi/processor: Implement DEVICE operator for 
processor enumeration")


Signed-off-by: Dou Liyang 
---
  drivers/acpi/acpi_processor.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/acpi_processor.c 
b/drivers/acpi/acpi_processor.c

index 449d86d39965..db5bdb59639c 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -663,11 +663,11 @@ static acpi_status __init acpi_processor_ids_walk(acpi_handle handle,
  }
  processor_validated_ids_update(uid);
-    return true;
+    return AE_OK;
  err:
  acpi_handle_info(handle, "Invalid processor object\n");
-    return false;
+    return AE_OK;


I'm not sure whether this is the right return value here. Rafael?



+Cc Rafael's commonly used email address.

I am sorry, I created the Cc list using ./scripts/get_maintainer.pl ...
and didn't check it.

Thanks,
dou


Hi, Thomas, Rafael,

Yes, I used AE_OK to make sure the walk can skip the invalid objects and
continue with the remaining ones, but I'm also not sure.

For this bug, I recently sent another patch to remove this check code
entirely.

    https://lkml.org/lkml/2018/5/17/320

IMO, the duplicate IDs can be avoided by the other code

    if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id))  1)

As the mapping between the cpu_id (pr->id) and the processor_id is fixed,
when hot-plugging a physical CPU whose processor_id duplicates a present
one, the above condition 1) will trigger, and Linux will not add this CPU.

Also, this code is executed every time the system starts, and it wastes
more time as the number of CPUs increases.

So I prefer to remove this code.

Thanks,
 dou
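
Dou's cost argument is visible in the shape of the removed code: every
incoming ID is compared against both the duplicate list and the unique
list, so validating N processors is O(N^2) in the worst case. A userspace
rendering of the same classification algorithm, with illustrative
constants:

/*
 * Userspace rendering of the removed classification algorithm: each
 * incoming ID scans the duplicate list and then the unique list, so
 * boot-time validation grows quadratically with the CPU count.
 */
#include <stdio.h>

#define MAX_IDS 8

static int unique_ids[MAX_IDS], nr_unique;
static int duplicate_ids[MAX_IDS], nr_duplicate;

static void validate_id(int proc_id)
{
        int i;

        for (i = 0; i < nr_duplicate; i++)      /* already known duplicate? */
                if (duplicate_ids[i] == proc_id)
                        return;

        for (i = 0; i < nr_unique; i++) {       /* seen once before? */
                if (unique_ids[i] == proc_id) {
                        duplicate_ids[nr_duplicate++] = proc_id;
                        return;
                }
        }

        unique_ids[nr_unique++] = proc_id;      /* first sighting */
}

int main(void)
{
        int ids[] = { 0, 1, 2, 3, 89, 89, 89 }; /* like the PXM example */
        int i;

        for (i = 0; i < (int)(sizeof(ids) / sizeof(ids[0])); i++)
                validate_id(ids[i]);

        printf("unique=%d duplicate=%d\n", nr_unique, nr_duplicate);
        return 0;
}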








Re: [PATCH] x86/idt: Simplify the idt_setup_apic_and_irq_gates()

2018-05-21 Thread Dou Liyang

Hi Thomas,

At 05/19/2018 08:32 PM, Thomas Gleixner wrote:

On Thu, 26 Apr 2018, Dou Liyang wrote:


The vectors between FIRST_SYSTEM_VECTOR and NR_VECTORS are special IRQ
vectors used by the SMP architecture. But if X86_LOCAL_APIC=n, they will
not be used, and FIRST_SYSTEM_VECTOR is equal to NR_VECTORS.


Correct, but that function has nothing to do with FIRST_SYSTEM_VECTOR.



Oops, sorry; when I reread it, even my own changelog misled me, and
the patch hid the point.

void __init idt_setup_apic_and_irq_gates(void)
...
for_each_clear_bit_from(i, system_vectors, FIRST_SYSTEM_VECTOR)
   ^^^
...
for_each_clear_bit_from(i, system_vectors, NR_VECTORS)

What I want to say is:

  In an APIC=n system, FIRST_SYSTEM_VECTOR is equal to
  NR_VECTORS, so all entries have already been set by

for_each_clear_bit_from(i, system_vectors, FIRST_SYSTEM_VECTOR)
       ...

  the following setup code for APIC=n is redundant. Just set up gates for
  APIC=y.

If it is OK, I will send v2.

Thanks,

dou


diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
index 2c3a1b4294eb..8b4174890706 100644
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -317,15 +317,16 @@ void __init idt_setup_apic_and_irq_gates(void)
set_intr_gate(i, entry);
}
  
-	for_each_clear_bit_from(i, system_vectors, NR_VECTORS) {

+   /*
+* If X86_LOCAL_APIC=n, the FIRST_SYSTEM_VECTOR is equal to NR_VECTORS
+* Just consider the X86_LOCAL_APIC=y case
+*/
  #ifdef CONFIG_X86_LOCAL_APIC
+   for_each_clear_bit_from(i, system_vectors, NR_VECTORS) {
set_bit(i, system_vectors);
set_intr_gate(i, spurious_interrupt);
-#else
-   entry = irq_entries_start + 8 * (i - FIRST_EXTERNAL_VECTOR);
-   set_intr_gate(i, entry);
-#endif
}
+#endif


That change breaks the LOCAL_APIC=n case in a subtle way. What the function
does is to set _ALL_ entries starting from FIRST_EXTERNAL_VECTOR up to
NR_VECTORS in the IDT to a known target, except those which are already
occupied by a system vector.

So for APIC=y this sets them to the spurious vector, and for APIC=n it
sets them to the corresponding vector entry. You removed the latter...

Thanks,

tglx









Re: [PATCH 4/5] acpi/processor: Fix the return value of acpi_processor_ids_walk()

2018-05-21 Thread Dou Liyang



At 05/19/2018 11:06 PM, Thomas Gleixner wrote:

On Tue, 20 Mar 2018, Dou Liyang wrote:


ACPI driver should make sure all the processor IDs in their ACPI Namespace
are unique for CPU hotplug. The driver performs a depth-first walk of the
namespace tree and calls acpi_processor_ids_walk().

But acpi_processor_ids_walk() will return true once one processor has been
checked, which causes the walk to break after passing the first processor.

Replace the value with AE_OK, which is the standard acpi_status value.

Fixes 8c8cb30f49b8 ("acpi/processor: Implement DEVICE operator for processor 
enumeration")

Signed-off-by: Dou Liyang 
---
  drivers/acpi/acpi_processor.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 449d86d39965..db5bdb59639c 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -663,11 +663,11 @@ static acpi_status __init acpi_processor_ids_walk(acpi_handle handle,
}
  
  	processor_validated_ids_update(uid);

-   return true;
+   return AE_OK;
  
  err:

acpi_handle_info(handle, "Invalid processor object\n");
-   return false;
+   return AE_OK;


I'm not sure whether this is the right return value here. Rafael?


Hi, Thomas, Rafael,

Yes, I used AE_OK to make sure the walk can skip the invalid objects and
continue with the remaining ones, but I'm also not sure.

For this bug, I recently sent another patch to remove this check code
entirely.

   https://lkml.org/lkml/2018/5/17/320

IMO, the duplicate IDs can be avoided by the other code

   if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id))  1)

As the mapping between the cpu_id (pr->id) and the processor_id is fixed,
when hot-plugging a physical CPU whose processor_id duplicates a present
one, the above condition 1) will trigger, and Linux will not add this CPU.

Also, this code is executed every time the system starts, and it wastes
more time as the number of CPUs increases.

So I prefer to remove this code.

Thanks,
dou




[tip:x86/apic] x86/vector: Merge allocate_vector() into assign_vector_locked()

2018-05-19 Thread tip-bot for Dou Liyang
Commit-ID:  2773397171ac4b6e794ba0b3e34c06cbaf29897a
Gitweb: https://git.kernel.org/tip/2773397171ac4b6e794ba0b3e34c06cbaf29897a
Author: Dou Liyang 
AuthorDate: Fri, 11 May 2018 16:09:56 +0800
Committer:  Thomas Gleixner 
CommitDate: Sat, 19 May 2018 15:09:11 +0200

x86/vector: Merge allocate_vector() into assign_vector_locked()

assign_vector_locked() calls allocate_vector() to get a real vector for an
IRQ. If the current target CPU is online and in the new requested affinity
mask, allocate_vector() will return 0 and nothing should be done. But,
assign_vector_locked() calls apic_update_irq_cfg() even in that case which
is pointless.

allocate_vector() is not called from anything else, so the functions can be
merged and in case of no change the apic_update_irq_cfg() can be avoided.

[ tglx: Massaged changelog ]

Signed-off-by: Dou Liyang 
Signed-off-by: Thomas Gleixner 
Cc: h...@zytor.com
Link: https://lkml.kernel.org/r/20180511080956.6316-1-douly.f...@cn.fujitsu.com

---
 arch/x86/kernel/apic/vector.c | 17 -
 1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index bb6f7a2148d7..a75de0792942 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -218,7 +218,8 @@ static int reserve_irq_vector(struct irq_data *irqd)
return 0;
 }
 
-static int allocate_vector(struct irq_data *irqd, const struct cpumask *dest)
+static int
+assign_vector_locked(struct irq_data *irqd, const struct cpumask *dest)
 {
struct apic_chip_data *apicd = apic_chip_data(irqd);
bool resvd = apicd->has_reserved;
@@ -236,22 +237,12 @@ static int allocate_vector(struct irq_data *irqd, const 
struct cpumask *dest)
return 0;
 
vector = irq_matrix_alloc(vector_matrix, dest, resvd, &cpu);
-   if (vector > 0)
-   apic_update_vector(irqd, vector, cpu);
trace_vector_alloc(irqd->irq, vector, resvd, vector);
-   return vector;
-}
-
-static int assign_vector_locked(struct irq_data *irqd,
-   const struct cpumask *dest)
-{
-   struct apic_chip_data *apicd = apic_chip_data(irqd);
-   int vector = allocate_vector(irqd, dest);
-
if (vector < 0)
return vector;
+   apic_update_vector(irqd, vector, cpu);
+   apic_update_irq_cfg(irqd, vector, cpu);
 
-   apic_update_irq_cfg(irqd, apicd->vector, apicd->cpu);
return 0;
 }
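
The structural effect of the merge is that the "nothing to do" early return
and the costly configuration update now live in one function, so the update
cannot be reached on the no-change path. A generic runnable sketch of that
pattern, with stand-in names rather than the kernel's vector code:

/*
 * Generic sketch of the pattern the merge enforces: decide "no change
 * needed" and perform the costly update in one function, so the update
 * is provably skipped on the fast path.
 */
#include <stdio.h>

struct cfg {
        int vector, cpu;
};

static int updates;                     /* counts costly config writes */

static void update_cfg(struct cfg *c, int vector, int cpu)
{
        c->vector = vector;
        c->cpu = cpu;
        updates++;
}

static int assign_locked(struct cfg *c, int wanted_cpu)
{
        /* Current target already acceptable: return before any update. */
        if (c->vector && c->cpu == wanted_cpu)
                return 0;

        update_cfg(c, 33, wanted_cpu);  /* illustrative vector number */
        return 0;
}

int main(void)
{
        struct cfg c = { 0, 0 };

        assign_locked(&c, 1);   /* real change: one update */
        assign_locked(&c, 1);   /* no change: update skipped */
        printf("costly updates: %d (second call skipped)\n", updates);
        return 0;
}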
 


[RFC PATCH 1/1] acpi/processor: Remove the check of duplicate processors

2018-05-17 Thread Dou Liyang
Now, Linux checks for duplicate processor ids at boot time based on the ACPI
namespace, and uses the result at hotplug time.

But in fact it doesn't work, because of the wrong walk over all the ACPI
namespace entries; and even if it were right, it could miss a hotpluggable
CPU (we can hotplug one of the duplicate processors; the others can be
ignored).

So, remove the check code.

Signed-off-by: Dou Liyang 
---
 drivers/acpi/acpi_processor.c | 126 --
 include/linux/acpi.h  |   3 -
 2 files changed, 129 deletions(-)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 449d86d39965..8358708e0bbb 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -281,13 +281,6 @@ static int acpi_processor_get_info(struct acpi_device 
*device)
pr->acpi_id = value;
}
 
-   if (acpi_duplicate_processor_id(pr->acpi_id)) {
-   dev_err(&device->dev,
-   "Failed to get unique processor _UID (0x%x)\n",
-   pr->acpi_id);
-   return -ENODEV;
-   }
-
pr->phys_id = acpi_get_phys_id(pr->handle, device_declaration,
pr->acpi_id);
if (invalid_phys_cpuid(pr->phys_id))
@@ -579,127 +572,8 @@ static struct acpi_scan_handler 
processor_container_handler = {
.attach = acpi_processor_container_attach,
 };
 
-/* The number of the unique processor IDs */
-static int nr_unique_ids __initdata;
-
-/* The number of the duplicate processor IDs */
-static int nr_duplicate_ids;
-
-/* Used to store the unique processor IDs */
-static int unique_processor_ids[] __initdata = {
-   [0 ... NR_CPUS - 1] = -1,
-};
-
-/* Used to store the duplicate processor IDs */
-static int duplicate_processor_ids[] = {
-   [0 ... NR_CPUS - 1] = -1,
-};
-
-static void __init processor_validated_ids_update(int proc_id)
-{
-   int i;
-
-   if (nr_unique_ids == NR_CPUS||nr_duplicate_ids == NR_CPUS)
-   return;
-
-   /*
-* Firstly, compare the proc_id with duplicate IDs, if the proc_id is
-* already in the IDs, do nothing.
-*/
-   for (i = 0; i < nr_duplicate_ids; i++) {
-   if (duplicate_processor_ids[i] == proc_id)
-   return;
-   }
-
-   /*
-* Secondly, compare the proc_id with unique IDs, if the proc_id is in
-* the IDs, put it in the duplicate IDs.
-*/
-   for (i = 0; i < nr_unique_ids; i++) {
-   if (unique_processor_ids[i] == proc_id) {
-   duplicate_processor_ids[nr_duplicate_ids] = proc_id;
-   nr_duplicate_ids++;
-   return;
-   }
-   }
-
-   /*
-* Lastly, the proc_id is a unique ID, put it in the unique IDs.
-*/
-   unique_processor_ids[nr_unique_ids] = proc_id;
-   nr_unique_ids++;
-}
-
-static acpi_status __init acpi_processor_ids_walk(acpi_handle handle,
- u32 lvl,
- void *context,
- void **rv)
-{
-   acpi_status status;
-   acpi_object_type acpi_type;
-   unsigned long long uid;
-   union acpi_object object = { 0 };
-   struct acpi_buffer buffer = { sizeof(union acpi_object), &object };
-
-   status = acpi_get_type(handle, &acpi_type);
-   if (ACPI_FAILURE(status))
-   return false;
-
-   switch (acpi_type) {
-   case ACPI_TYPE_PROCESSOR:
-   status = acpi_evaluate_object(handle, NULL, NULL, &buffer);
-   if (ACPI_FAILURE(status))
-   goto err;
-   uid = object.processor.proc_id;
-   break;
-
-   case ACPI_TYPE_DEVICE:
-   status = acpi_evaluate_integer(handle, "_UID", NULL, &uid);
-   if (ACPI_FAILURE(status))
-   goto err;
-   break;
-   default:
-   goto err;
-   }
-
-   processor_validated_ids_update(uid);
-   return true;
-
-err:
-   acpi_handle_info(handle, "Invalid processor object\n");
-   return false;
-
-}
-
-static void __init acpi_processor_check_duplicates(void)
-{
-   /* check the correctness for all processors in ACPI namespace */
-   acpi_walk_namespace(ACPI_TYPE_PROCESSOR, ACPI_ROOT_OBJECT,
-   ACPI_UINT32_MAX,
-   acpi_processor_ids_walk,
-   NULL, NULL, NULL);
-   acpi_get_devices(ACPI_PROCESSOR_DEVICE_HID, acpi_processor_ids_walk,
-   NULL, NULL);
-}
-
-bool acpi_duplicate_processor_id(int proc_id)
-{
-   int i;
-
-   /*
-

[PATCH 0/1] Remove the check of duplicate processors

2018-05-17 Thread Dou Liyang
There is a bug in the CPU hotplug code. I have two simple fix methods, but I
can't decide which one is better, so I am sending this out to seek help.

We found some processors which have the same processor ID but are in
different PXMs in the ACPI namespace, such as this:

proc_id   |pxm

0   <-> 0
1   <-> 0
2   <-> 1
3   <-> 1
..
89  <-> 0
89  <-> 1
89  <-> 2
89  <-> 3
..

So we created a mechanism to validate them and mark the processor (ID=89)
as invalid. And once a processor is physically hotplugged, we check its
processor ID.

Commit 8e089eaa1999 ("acpi: Provide mechanism to validate processors in the 
ACPI tables")
Commit a77d6cd96849 ("acpi/processor: Check for duplicate processor ids at 
hotplug time")

Recently, I found the check mechanism has a bug: it didn't use
acpi_processor_ids_walk() correctly and always gave us a wrong result.
First, I fixed it by replacing the return value with AE_OK, which is a
standard acpi_status value.
 
https://lkml.org/lkml/2018/3/20/273

But now I even think this check is useless. My reasons are as follows:

1). Based on the practical effect, it works well and no bug has been reported
2). Based on the code, the duplicate cases can be dealt with by

   if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id))

That seems more reasonable; let's see the following case:

                               Before the patch    After the patch

the first processor (ID=89)
hot-add                        failed              success

the other processors (ID=89)
hot-add                        failed              failed


So, what's your idea about it?

Dou Liyang (1):
  acpi/processor: Remove the check of duplicates processor ids

 drivers/acpi/acpi_processor.c | 126 --
 include/linux/acpi.h  |   3 -
 2 files changed, 129 deletions(-)

-- 
2.14.3





[PATCH] x86/vector: Merge the allocate_vector() into assign_vector_locked()

2018-05-11 Thread Dou Liyang
assign_vector_locked() calls allocate_vector() to get a real vector for
an IRQ. If the current target CPU is online and in the new requested
affinity mask, allocate_vector() will return 0 and nothing should be done.

But assign_vector_locked() always calls apic_update_irq_cfg(), which is
pointless in that case.

Merge the two functions to avoid this situation and simplify the code.

Signed-off-by: Dou Liyang 
---
 arch/x86/kernel/apic/vector.c | 17 -
 1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index bb6f7a2148d7..a75de0792942 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -218,7 +218,8 @@ static int reserve_irq_vector(struct irq_data *irqd)
return 0;
 }
 
-static int allocate_vector(struct irq_data *irqd, const struct cpumask *dest)
+static int
+assign_vector_locked(struct irq_data *irqd, const struct cpumask *dest)
 {
struct apic_chip_data *apicd = apic_chip_data(irqd);
bool resvd = apicd->has_reserved;
@@ -236,22 +237,12 @@ static int allocate_vector(struct irq_data *irqd, const 
struct cpumask *dest)
return 0;
 
vector = irq_matrix_alloc(vector_matrix, dest, resvd, &cpu);
-   if (vector > 0)
-   apic_update_vector(irqd, vector, cpu);
trace_vector_alloc(irqd->irq, vector, resvd, vector);
-   return vector;
-}
-
-static int assign_vector_locked(struct irq_data *irqd,
-   const struct cpumask *dest)
-{
-   struct apic_chip_data *apicd = apic_chip_data(irqd);
-   int vector = allocate_vector(irqd, dest);
-
if (vector < 0)
return vector;
+   apic_update_vector(irqd, vector, cpu);
+   apic_update_irq_cfg(irqd, vector, cpu);
 
-   apic_update_irq_cfg(irqd, apicd->vector, apicd->cpu);
return 0;
 }
 
-- 
2.14.3





Re: recent patch "x86/acpi: Prevent X2APIC id 0xffffffff from being accounted"

2018-05-02 Thread Dou Liyang

Hi Jan,

At 05/02/2018 02:39 PM, Jan Beulich wrote:

On 02.05.18 at 03:56,  wrote:

At 04/27/2018 08:09 PM, Jan Beulich wrote:

I'm afraid I don't understand: Limiting the number of disabled CPUs is
certainly desirable when those can never be used, but why would you
want to do this when they might later get hotplugged? I'm not aware


Let's see the workflow of CPU hotplug:

1) get the CPU device info from the ACPI namespace
   - it contains the logical processor id

2) through the logical processor id, get the LAPIC entry in the MADT.

3) generate the CPU for the kernel (this creates a CPU id, visible via lscpu)

Normally, there are no valid CPU devices in 1) which are mapped to
the LAPIC entries 0xff or 0xffffffff.

The actual number of hotpluggable CPUs depends on the CPU devices/processors
in the ACPI namespace. The number calculated from the MADT is the maximum,
which can be trimmed and doesn't affect CPU hotplug. Don't worry
about it.

Now, on ACPI-based systems, Linux gets the number of possible CPUs from
the MADT. We are going to use the ACPI namespace to make the number
accurate, but that is hard because the ACPI namespace is initialized very
late.


So are you envisioning a model when the number of disabled CPUs can be
increased once the ACPI interpreter has been enabled? Otherwise the


Yes, good idea, but, As Thomas said:

  It needs to run _before_ setup_percpu() is invoked to
  scale everything correctly.


maximum recorded during early boot may be too low with the changes in
question. (And FTR, I agree this number may also be way too large without
them, but it being too large is a resource efficiency problem, while it being
too low is a functionality one.)


Too large number will waste vectors, and even cause bugs.

IMO, the number will just be more accurate by excluding 0x, it
only equal to, even be larger than the real number of possible CPUs.



Also, for background, besides wanting to clarify the correctness of these
two changes, I'm also trying to understand whether we want to mirror
them into the Xen hypervisor, which relies on the Dom0 kernel's ACPI
interpreter (i.e. it can and does parse MADT, but can't and doesn't parse
the ACPI name space). Hence late adjustment of the number of
hotpluggable CPUs would be even more problematic in that environment.



I am not familiar with Xen, I will consider that.

Thanks,

dou


Jan










Re: recent patch "x86/acpi: Prevent X2APIC id 0xffffffff from being accounted"

2018-05-01 Thread Dou Liyang



At 04/27/2018 08:09 PM, Jan Beulich wrote:

On 27.04.18 at 10:32,  wrote:

At 04/27/2018 03:21 PM, Jan Beulich wrote:

I've just stumbled across this commit, and I'm wondering if that's actually
correct (I too have at least one system where such IDs are reported in
MADT): For offline/absent CPUs, the firmware may not know the APIC IDs


The MADT table is not reliable; the Local APIC entries (xAPIC/x2APIC)
in the MADT are not always as correct as the OS wants. So we should do this
sanity check.


Of course, sanity checking is necessary.


at the point MADT is built, so I think it is quite reasonable to put ~0 in
there. The ACPI spec specifically calls out that the IDs must not change
across sleep states, which implies to me that they may change across an
offline period of a CPU. IOW I think such entries still need to contribute to
the count of disabled CPUs.


Aha, there are no CPU devices which will map their physical ID to the
illegal number 0xffffffff, so it will never be used.


The ID will never be used, yes, but the CPU may be (after firmware has
assigned it a valid ID).


BTW, limiting the number of disabled CPUs is also a goal.


I'm afraid I don't understand: Limiting the number of disabled CPUs is
certainly desirable when those can never be used, but why would you
want to do this when they might later get hotplugged? I'm not aware


Let's see the workflow of CPU hotplug:

  1) get the CPU device info from the ACPI namespace
     - it contains the logical processor id

  2) through the logical processor id, get the LAPIC entry in the MADT.

  3) generate the CPU for the kernel (this creates a CPU id, visible via lscpu)

Normally, there are no valid CPU devices in 1) which are mapped to
the LAPIC entries 0xff or 0xffffffff.

The actual number of hotpluggable CPUs depends on the CPU devices/processors
in the ACPI namespace. The number calculated from the MADT is the maximum,
which can be trimmed and doesn't affect CPU hotplug. Don't worry
about it.

Now, on ACPI-based systems, Linux gets the number of possible CPUs from
the MADT. We are going to use the ACPI namespace to make the number
accurate, but that is hard because the ACPI namespace is initialized very
late.

Thanks,
dou


of a way to tell, by just looking at the MADT, which of the two cases
apply.







Jan
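
To make the counting concrete: the number of possible CPUs is derived by
walking the MADT local APIC entries and counting disabled entries as well
as enabled ones, which is why 0xffffffff placeholder entries inflate the
count unless they are filtered out. A toy walk over a made-up table (not
real ACPI parsing):

/*
 * Toy walk over made-up MADT x2APIC entries showing how 0xffffffff
 * placeholders inflate the possible-CPU count unless filtered.
 */
#include <stdint.h>
#include <stdio.h>

struct x2apic_entry {
        uint32_t apic_id;
        int enabled;
};

int main(void)
{
        struct x2apic_entry madt[] = {
                { 0, 1 }, { 2, 1 },                     /* present CPUs */
                { 4, 0 },                               /* hotpluggable slot */
                { 0xffffffff, 0 }, { 0xffffffff, 0 },   /* firmware filler */
        };
        int i, possible = 0;

        for (i = 0; i < (int)(sizeof(madt) / sizeof(madt[0])); i++) {
                if (madt[i].apic_id == 0xffffffff)
                        continue;       /* the sanity check discussed above */
                possible++;             /* disabled entries still count */
        }
        printf("possible CPUs: %d (would be 5 without the filter)\n",
               possible);
        return 0;
}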










Re: recent patch "x86/acpi: Prevent X2APIC id 0xffffffff from being accounted"

2018-04-27 Thread Dou Liyang

Hi Jan,

At 04/27/2018 03:21 PM, Jan Beulich wrote:

Hello,

I've just stumbled across this commit, and I'm wondering if that's actually
correct (I too have at least one system where such IDs are reported in
MADT): For offline/absent CPUs, the firmware may not know the APIC IDs


The MADT table is not reliable; the Local APIC entries (xAPIC/x2APIC)
in the MADT are not always as correct as the OS wants. So we should do this
sanity check.


at the point MADT is built, so I think it is quite reasonable to put ~0 in
there. The ACPI spec specifically calls out that the IDs must not change
across sleep states, which implies to me that they may change across an
offline period of a CPU. IOW I think such entries still need to contribute to
the count of disabled CPUs.



Aha, there are no CPU devices which will map their physical ID to the
illegal number 0xffffffff, so it will never be used.

BTW, limiting the number of disabled CPUs is also a goal.

Thanks,
dou


I notice a similar change has been done for the xAPIC case a while ago
by you, Thomas.

Jan










[tip:x86/urgent] x86/vector: Remove the unused macro FPU_IRQ

2018-04-26 Thread tip-bot for Dou Liyang
Commit-ID:  7d878817db22f64c2b2c241335ec03e4c3fd5476
Gitweb: https://git.kernel.org/tip/7d878817db22f64c2b2c241335ec03e4c3fd5476
Author: Dou Liyang 
AuthorDate: Thu, 26 Apr 2018 14:08:32 +0800
Committer:  Thomas Gleixner 
CommitDate: Thu, 26 Apr 2018 11:57:57 +0200

x86/vector: Remove the unused macro FPU_IRQ

The macro FPU_IRQ has never been used since v3.10, so remove it.

Signed-off-by: Dou Liyang 
Signed-off-by: Thomas Gleixner 
Cc: h...@zytor.com
Link: https://lkml.kernel.org/r/20180426060832.27312-1-douly.f...@cn.fujitsu.com

---
 arch/x86/include/asm/irq_vectors.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/include/asm/irq_vectors.h 
b/arch/x86/include/asm/irq_vectors.h
index 57003074bd7a..548d90bbf919 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -114,8 +114,6 @@
 #define FIRST_SYSTEM_VECTOR            NR_VECTORS
 #endif
 
-#define FPU_IRQ  13
-
 /*
  * Size the maximum number of interrupts.
  *


[PATCH] x86/vector: Remove the unused macro FPU_IRQ

2018-04-25 Thread Dou Liyang
The macro FPU_IRQ has never been used since v3.10, so remove it.

Signed-off-by: Dou Liyang 
---
 arch/x86/include/asm/irq_vectors.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/include/asm/irq_vectors.h 
b/arch/x86/include/asm/irq_vectors.h
index 57003074bd7a..548d90bbf919 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -114,8 +114,6 @@
 #define FIRST_SYSTEM_VECTOR            NR_VECTORS
 #endif
 
-#define FPU_IRQ  13
-
 /*
  * Size the maximum number of interrupts.
  *
-- 
2.14.3





[PATCH] x86/idt: Simplify the idt_setup_apic_and_irq_gates()

2018-04-25 Thread Dou Liyang
The vectors between FIRST_SYSTEM_VECTOR and NR_VECTORS are special IRQ
vectors used by the SMP architecture. But if X86_LOCAL_APIC=n, they will
not be used, and FIRST_SYSTEM_VECTOR is equal to NR_VECTORS.

idt_setup_apic_and_irq_gates() didn't notice that, which makes the code
a little complex.

Remove the code for the X86_LOCAL_APIC=n case to simplify it.

Signed-off-by: Dou Liyang 
---
 arch/x86/kernel/idt.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
index 2c3a1b4294eb..8b4174890706 100644
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -317,15 +317,16 @@ void __init idt_setup_apic_and_irq_gates(void)
set_intr_gate(i, entry);
}
 
-   for_each_clear_bit_from(i, system_vectors, NR_VECTORS) {
+   /*
+* If X86_LOCAL_APIC=n, the FIRST_SYSTEM_VECTOR is equal to NR_VECTORS
+* Just consider the X86_LOCAL_APIC=y case
+*/
 #ifdef CONFIG_X86_LOCAL_APIC
+   for_each_clear_bit_from(i, system_vectors, NR_VECTORS) {
set_bit(i, system_vectors);
set_intr_gate(i, spurious_interrupt);
-#else
-   entry = irq_entries_start + 8 * (i - FIRST_EXTERNAL_VECTOR);
-   set_intr_gate(i, entry);
-#endif
}
+#endif
 }
 
 /**
-- 
2.14.3





[tip:x86/urgent] x86/vector: Remove the macro VECTOR_OFFSET_START

2018-04-25 Thread tip-bot for Dou Liyang
Commit-ID:  e3072805c61167b85a30ceeef606620704db31f7
Gitweb: https://git.kernel.org/tip/e3072805c61167b85a30ceeef606620704db31f7
Author: Dou Liyang 
AuthorDate: Wed, 25 Apr 2018 10:05:53 +0800
Committer:  Ingo Molnar 
CommitDate: Thu, 26 Apr 2018 07:31:17 +0200

x86/vector: Remove the macro VECTOR_OFFSET_START

Now that Linux uses the matrix allocator for vector assignment, the original
assignment code which used VECTOR_OFFSET_START has been removed.

So remove the stale macro as well.

Fixes: commit 69cde0004a4b ("x86/vector: Use matrix allocator for vector 
assignment")
Signed-off-by: Dou Liyang 
Signed-off-by: Thomas Gleixner 
Acked-by: David Rientjes 
Cc: h...@zytor.com
Link: https://lkml.kernel.org/r/20180425020553.17210-1-douly.f...@cn.fujitsu.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/include/asm/irq_vectors.h | 5 -
 1 file changed, 5 deletions(-)

diff --git a/arch/x86/include/asm/irq_vectors.h 
b/arch/x86/include/asm/irq_vectors.h
index 404c5fdff859..57003074bd7a 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -34,11 +34,6 @@
  * (0x80 is the syscall vector, 0x30-0x3f are for ISA)
  */
 #define FIRST_EXTERNAL_VECTOR  0x20
-/*
- * We start allocating at 0x21 to spread out vectors evenly between
- * priority levels. (0x80 is the syscall vector)
- */
-#define VECTOR_OFFSET_START            1
 
 /*
  * Reserve the lowest usable vector (and hence lowest priority)  0x20 for


[tip:x86/urgent] x86/vector: Remove the macro VECTOR_OFFSET_START

2018-04-25 Thread tip-bot for Dou Liyang
Commit-ID:  5a626a8dfb58a64a39f4351e3962e7320191f189
Gitweb: https://git.kernel.org/tip/5a626a8dfb58a64a39f4351e3962e7320191f189
Author: Dou Liyang 
AuthorDate: Wed, 25 Apr 2018 10:05:53 +0800
Committer:  Thomas Gleixner 
CommitDate: Wed, 25 Apr 2018 10:56:24 +0200

x86/vector: Remove the macro VECTOR_OFFSET_START

Now that Linux uses the matrix allocator for vector assignment, the original
assignment code which used VECTOR_OFFSET_START has been removed.

So remove the stale macro as well.

Fixes: commit 69cde0004a4b ("x86/vector: Use matrix allocator for vector 
assignment")
Signed-off-by: Dou Liyang 
Signed-off-by: Thomas Gleixner 
Acked-by: David Rientjes 
Cc: h...@zytor.com
Link: https://lkml.kernel.org/r/20180425020553.17210-1-douly.f...@cn.fujitsu.com

---
 arch/x86/include/asm/irq_vectors.h | 5 -
 1 file changed, 5 deletions(-)

diff --git a/arch/x86/include/asm/irq_vectors.h 
b/arch/x86/include/asm/irq_vectors.h
index 404c5fdff859..57003074bd7a 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -34,11 +34,6 @@
  * (0x80 is the syscall vector, 0x30-0x3f are for ISA)
  */
 #define FIRST_EXTERNAL_VECTOR  0x20
-/*
- * We start allocating at 0x21 to spread out vectors evenly between
- * priority levels. (0x80 is the syscall vector)
- */
-#define VECTOR_OFFSET_START            1
 
 /*
  * Reserve the lowest usable vector (and hence lowest priority)  0x20 for


[PATCH] x86/vector: Remove the macro VECTOR_OFFSET_START

2018-04-24 Thread Dou Liyang
Now that Linux uses the matrix allocator for vector assignment, the original
assignment code which used VECTOR_OFFSET_START has been removed.

So remove the stale macro as well.

Signed-off-by: Dou Liyang 
---
 arch/x86/include/asm/irq_vectors.h | 5 -
 1 file changed, 5 deletions(-)

diff --git a/arch/x86/include/asm/irq_vectors.h 
b/arch/x86/include/asm/irq_vectors.h
index 404c5fdff859..57003074bd7a 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -34,11 +34,6 @@
  * (0x80 is the syscall vector, 0x30-0x3f are for ISA)
  */
 #define FIRST_EXTERNAL_VECTOR  0x20
-/*
- * We start allocating at 0x21 to spread out vectors evenly between
- * priority levels. (0x80 is the syscall vector)
- */
-#define VECTOR_OFFSET_START            1
 
 /*
  * Reserve the lowest usable vector (and hence lowest priority)  0x20 for
-- 
2.14.3





Re: [RESEND PATCH] x86/boot/KASLR: Extend movable_node option for KASLR

2018-04-18 Thread Dou Liyang

Hi Ingo,

Any comments about that?

Now, when users want to support node hotplug with KASLR, they use
'mem=' to restrict the boot-up memory to the size of the first node's
memory. If we then boot with some hotpluggable nodes, their memory
can't be seen.

IMO, only a few machines can support physical NUMA node hotplug, and
we can't get the memory hotplug info from ACPI SRAT early enough now (if
we could, we could even remove the 'movable_node' option).

So, IMO, extend movable_node to replace the misuse of the 'mem' option.

Thoughts?

Thanks,

    dou

At 04/03/2018 11:36 AM, Dou Liyang wrote:

The movable_node option is a boot-time switch to make sure that physical
NUMA nodes can be hot-added/removed when the ACPI tables can't be parsed to
provide the memory hotplug information.

As we all know, there is always one node, called the "home node", which
can't be moved and in which the kernel image resides. With the movable_node
option, Linux allocates new early memory near the kernel image to avoid
using the other, movable nodes.

But because KASLR also can't get the memory hotplug information, it may
randomize the kernel image into a movable node, which breaks the rule of
the movable_node option and makes physical hot-add/remove operations fail.

The perfect solution is to provide the memory hotplug information to KASLR,
but that needs effort from both hardware and software engineers.

Here is an alternative method: extend the movable_node option with a
parameter that restricts the kernel to being randomized within the home
node. The parameter sets the boundary between the home node and other nodes.

Reported-by: Chao Fan 
Signed-off-by: Dou Liyang 
Reviewed-by: Kees Cook 
---
Changelog:
   -Rewrite the commit log and document.

  Documentation/admin-guide/kernel-parameters.txt | 12 ++--
  arch/x86/boot/compressed/kaslr.c| 19 ---
  2 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 1d1d53f85ddd..0cfc0b10a117 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2353,7 +2353,8 @@
mousedev.yres=  [MOUSE] Vertical screen resolution, used for devices
reporting absolute coordinates, such as tablets
  
-	movablecore=nn[KMG]	[KNL,X86,IA-64,PPC] This parameter

+   movablecore=nn[KMG]
+   [KNL,X86,IA-64,PPC] This parameter
is similar to kernelcore except it specifies the
amount of memory used for migratable allocations.
If both kernelcore and movablecore is specified,
@@ -2363,12 +2364,19 @@
that the amount of memory usable for all allocations
is not too small.
  
-	movable_node	[KNL] Boot-time switch to make hotplugable memory

+   movable_node[KNL] Boot-time switch to make hot-pluggable memory
NUMA nodes to be movable. This means that the memory
of such nodes will be usable only for movable
allocations which rules out almost all kernel
allocations. Use with caution!
  
+	movable_node=nn[KMG]

+   [KNL] Extend movable_node to make it work well with 
KASLR.
+   This parameter is the boundaries between the "home 
node" and
+   the other nodes. The "home node" is an immovable node 
and is
+   defined by BIOS. Set the 'nn' to the memory size of 
"home
+   node", the kernel image will be extracted in immovable 
nodes.
+
MTD_Partition=  [MTD]
Format: ,,,
  
diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c

index 8199a6187251..f906d7890e69 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -92,7 +92,10 @@ struct mem_vector {
  static bool memmap_too_large;
  
  
-/* Store memory limit specified by "mem=nn[KMG]" or "memmap=nn[KMG]" */

+/*
+ * Store memory limit specified by the following situations:
+ * "mem=nn[KMG]" or "memmap=nn[KMG]" or "movable_node=nn[KMG]"
+ */
  unsigned long long mem_limit = ULLONG_MAX;
  
  
@@ -214,7 +217,8 @@ static int handle_mem_memmap(void)

char *param, *val;
u64 mem_size;
  
-	if (!strstr(args, "memmap=") && !strstr(args, "mem="))

+   if (!strstr(args, "memmap=") && !strstr(args, "mem=") &&
+   !strstr(args, "movable_node="))
return 0;
  
  	tmp_cmdline = malloc(len + 1);

@@ -249,7 +253,16 @@ static int handle_mem_memmap(void)
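
The diff above is truncated mid-hunk. For background, the option plumbing
it extends is plain string scanning over the boot command line; a userspace
sketch of the kind of parse handle_mem_memmap() performs for
"movable_node=nn[KMG]", where parse_size() is a simplified stand-in rather
than the kernel helper:

/*
 * Userspace sketch of the command-line scan the patch extends: look for
 * "movable_node=" and parse an nn[KMG] size. parse_size() is a
 * simplified stand-in, not a kernel function.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static unsigned long long parse_size(const char *val)
{
        char *end;
        unsigned long long size = strtoull(val, &end, 10);

        switch (*end) {                 /* optional K/M/G suffix */
        case 'G': size <<= 10;          /* fall through */
        case 'M': size <<= 10;          /* fall through */
        case 'K': size <<= 10;
        }
        return size;
}

int main(void)
{
        const char *cmdline = "quiet movable_node=16G console=ttyS0";
        const char *opt = strstr(cmdline, "movable_node=");
        unsigned long long mem_limit = ~0ULL;   /* ULLONG_MAX default */

        if (opt)
                mem_limit = parse_size(opt + strlen("movable_node="));

        printf("mem_limit=0x%llx\n", mem_limit);
        return 0;
}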
 

[tip:x86/urgent] x86/processor: Remove two unused function declarations

2018-04-17 Thread tip-bot for Dou Liyang
Commit-ID:  451cf3ca7d4615631443014ee769c25e267c25ff
Gitweb: https://git.kernel.org/tip/451cf3ca7d4615631443014ee769c25e267c25ff
Author: Dou Liyang 
AuthorDate: Wed, 4 Apr 2018 14:45:27 +0800
Committer:  Thomas Gleixner 
CommitDate: Tue, 17 Apr 2018 11:56:32 +0200

x86/processor: Remove two unused function declarations

early_trap_init() and cpu_set_gdt() have been removed, so remove the stale
declarations as well.

Signed-off-by: Dou Liyang 
Signed-off-by: Thomas Gleixner 
Cc: keesc...@chromium.org
Cc: l...@kernel.org
Cc: h...@zytor.com
Cc: b...@suse.de
Cc: kirill.shute...@linux.intel.com
Link: https://lkml.kernel.org/r/20180404064527.10562-1-douly.f...@cn.fujitsu.com

---
 arch/x86/include/asm/processor.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 4fa4206029e3..21a114914ba4 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -749,13 +749,11 @@ enum idle_boot_override {IDLE_NO_OVERRIDE=0, IDLE_HALT, 
IDLE_NOMWAIT,
 extern void enable_sep_cpu(void);
 extern int sysenter_setup(void);
 
-extern void early_trap_init(void);
 void early_trap_pf_init(void);
 
 /* Defined in head.S */
 extern struct desc_ptr early_gdt_descr;
 
-extern void cpu_set_gdt(int);
 extern void switch_to_new_gdt(int);
 extern void load_direct_gdt(int);
 extern void load_fixmap_gdt(int);


[tip:x86/urgent] x86/acpi: Prevent X2APIC id 0xffffffff from being accounted

2018-04-17 Thread tip-bot for Dou Liyang
Commit-ID:  10daf10ab154e31237a8c07242be3063fb6a9bf4
Gitweb: https://git.kernel.org/tip/10daf10ab154e31237a8c07242be3063fb6a9bf4
Author: Dou Liyang 
AuthorDate: Thu, 12 Apr 2018 09:40:52 +0800
Committer:  Thomas Gleixner 
CommitDate: Tue, 17 Apr 2018 11:56:31 +0200

x86/acpi: Prevent X2APIC id 0xffffffff from being accounted

RongQing reported that there are some X2APIC id 0xffffffff entries in his
machine's ACPI MADT table, which makes the number of possible CPUs
inaccurate.

The reason is that the ACPI X2APIC parser has no sanity check for APIC ID
0xffffffff, which is an invalid id in all APIC types. See "Intel® 64
Architecture x2APIC Specification", Chapter 2.4.1.

Add a sanity check to acpi_parse_x2apic() which ignores the invalid id.

Reported-by: Li RongQing 
Signed-off-by: Dou Liyang 
Signed-off-by: Thomas Gleixner 
Cc: sta...@vger.kernel.org
Cc: len.br...@intel.com
Cc: r...@rjwysocki.net
Cc: h...@zytor.com
Link: https://lkml.kernel.org/r/20180412014052.25186-1-douly.f...@cn.fujitsu.com

---
 arch/x86/kernel/acpi/boot.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index dde444f932c1..3b20607d581b 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -215,6 +215,10 @@ acpi_parse_x2apic(struct acpi_subtable_header *header, 
const unsigned long end)
apic_id = processor->local_apic_id;
enabled = processor->lapic_flags & ACPI_MADT_ENABLED;
 
+   /* Ignore invalid ID */
+   if (apic_id == 0xffffffff)
+   return 0;
+
/*
 * We need to register disabled CPU as well to permit
 * counting disabled CPUs. This allows us to size


[PATCH] x86/acpi: Prevent X2APIC id 0xffffffff from being accounted

2018-04-11 Thread Dou Liyang
RongQing reported that there are some X2APIC id 0xffffffff entries in his
machine's ACPI MADT table, which made the number of possible CPUs
inaccurate.

The reason is that the ACPI X2APIC parser has no sanity check for
apicid 0xffffffff, which is an invalid id in all APIC types. See
"Intel® 64 Architecture x2APIC Specification", Chapter 2.4.1.

Add a sanity check to acpi_parse_x2apic() which ignores the invalid id.

Reported-by: Li RongQing 
Signed-off-by: Dou Liyang 
---
 arch/x86/kernel/acpi/boot.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index dde444f932c1..3b20607d581b 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -215,6 +215,10 @@ acpi_parse_x2apic(struct acpi_subtable_header *header, 
const unsigned long end)
apic_id = processor->local_apic_id;
enabled = processor->lapic_flags & ACPI_MADT_ENABLED;
 
+   /* Ignore invalid ID */
+   if (apic_id == 0xffffffff)
+   return 0;
+
/*
 * We need to register disabled CPU as well to permit
 * counting disabled CPUs. This allows us to size
-- 
2.14.3





Re: [tip:x86/urgent] x86/apic: Fix signedness bug in APIC ID validity checks

2018-04-11 Thread Dou Liyang

Hi Thomas,

At 04/10/2018 10:51 PM, tip-bot for Li RongQing wrote:
[...]


x86/apic: Fix signedness bug in APIC ID validity checks

The APIC ID as parsed from ACPI MADT is validity checked with the
apic->apic_id_valid() callback, which depends on the selected APIC type.

For non X2APIC types APIC IDs >= 0xFF are invalid, but values > 0x7FFFFFFF


Today, while reading the "Intel® 64 Architecture x2APIC Specification", I
found the following in chapter 2.4.1:

  The APIC ID value of FFFF_FFFFH and the highest value corresponding to
  the implemented bit-width of the local APIC ID register in the system
  are reserved and cannot be assigned to any logical processor.

It seems 0xFFFFFFFF is also invalid for X2APIC types; shall we also do the
validity check for the X2APIC ID?

acpi_parse_x2apic()
...
/* Ignore invalid ID */
if (apic_id == 0xffffffff)
return 0;
...

Thanks,
dou
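
The signedness trap behind this thread is easy to demonstrate: a u32
holding 0xffffffff becomes -1 when narrowed to int on the usual
two's-complement targets, and -1 < 255 holds, so an int-typed validity
check wrongly accepts an ID that can never be assigned. A minimal
demonstration:

/*
 * Minimal demonstration of the signedness trap: the uint32_t-to-int
 * conversion of 0xffffffff yields -1 on typical two's-complement
 * systems, and -1 < 255 is true.
 */
#include <stdint.h>
#include <stdio.h>

static int id_valid_int(int apicid)             /* old callback signature */
{
        return apicid < 255;
}

static int id_valid_u32(uint32_t apicid)        /* fixed signature */
{
        return apicid < 255;
}

int main(void)
{
        uint32_t apic_id = 0xffffffff;

        printf("int check: %d (wrongly 'valid')\n", id_valid_int(apic_id));
        printf("u32 check: %d (correctly invalid)\n", id_valid_u32(apic_id));
        return 0;
}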





Re: RE: RE: [RFC PATCH] x86/acpi: Prevent x2apic id -1 from being accounted

2018-04-09 Thread Dou Liyang

Hi Rongqing,

At 04/09/2018 05:33 PM, Li,Rongqing wrote:
[...]

Thanks. How about the below modification:



It is good to me. You need to start a new v2 thread for this, so
that maintainers can get it easily.


commit 96ba42cf87ce0e62d54c01bfa9a9479b2e87
Author: Li RongQing 
Date:   Sun Apr 8 18:54:10 2018 +0800

 x86/acpi: Prevent all the X2APIC Id from being parsed in non-x2apic mode
 
 If the value of an x2APIC ID in the ACPI MADT is greater than 0xff and
 the APIC driver is apic_flat, default_apic_id_valid() is called to check
 the id. The id is converted from u32 to int, so the check wrongly
 returns true if the id is larger than 0x7fffffff.

 And if the local_apic_id is invalid, we should prevent it from being
 accounted.

 This fixes a bug where the Purley platform displays too many possible CPUs.
 
 Signed-off-by: Li RongQing 

 Suggested-by:: Dou Liyang 

       ^---^ no need to add this, it is my pleasure


Thanks,

dou


 Cc: Peter Zijlstra 
 Cc: Thomas Gleixner 

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 40a3d3642f3a..08acd954f00e 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -313,7 +313,7 @@ struct apic {
 /* Probe, setup and smpboot functions */
 int (*probe)(void);
 int (*acpi_madt_oem_check)(char *oem_id, char *oem_table_id);
-   int (*apic_id_valid)(int apicid);
+   int (*apic_id_valid)(u32 apicid);
 int (*apic_id_registered)(void);
  
 bool(*check_apicid_used)(physid_mask_t *map, int apicid);

@@ -486,7 +486,7 @@ static inline unsigned int read_apic_id(void)
 return apic->get_apic_id(reg);
  }
  
-extern int default_apic_id_valid(int apicid);

+extern int default_apic_id_valid(u32 apicid);
  extern int default_acpi_madt_oem_check(char *, char *);
  extern void default_setup_apic_routing(void);
  
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c

index 7a37d9357bc4..4ba949de1ca9 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -200,7 +200,7 @@ acpi_parse_x2apic(struct acpi_subtable_header *header, 
const unsigned long end)
  {
 struct acpi_madt_local_x2apic *processor = NULL;
  #ifdef CONFIG_X86_X2APIC
-   int apic_id;
+   u32 apic_id;
 u8 enabled;
  #endif
  
@@ -222,10 +222,13 @@ acpi_parse_x2apic(struct acpi_subtable_header *header, const unsigned long end)

  * to not preallocating memory for all NR_CPUS
  * when we use CPU hotplug.
  */
-   if (!apic->apic_id_valid(apic_id) && enabled)
-   printk(KERN_WARNING PREFIX "x2apic entry ignored\n");
-   else
-   acpi_register_lapic(apic_id, processor->uid, enabled);
+   if (!apic->apic_id_valid(apic_id)) {
+   if (enabled)
+   printk(KERN_WARNING PREFIX "x2apic entry ignored\n");
+   return 0;
+   }
+
+   acpi_register_lapic(apic_id, processor->uid, enabled);
  #else
 printk(KERN_WARNING PREFIX "x2apic entry ignored\n");
  #endif
diff --git a/arch/x86/kernel/apic/apic_common.c 
b/arch/x86/kernel/apic/apic_common.c
index a360801779ae..02b4839478b1 100644
--- a/arch/x86/kernel/apic/apic_common.c
+++ b/arch/x86/kernel/apic/apic_common.c
@@ -40,7 +40,7 @@ int default_check_phys_apicid_present(int phys_apicid)
 return physid_isset(phys_apicid, phys_cpu_present_map);
  }
  
-int default_apic_id_valid(int apicid)

+int default_apic_id_valid(u32 apicid)
  {
 return (apicid < 255);
  }
diff --git a/arch/x86/kernel/apic/apic_numachip.c 
b/arch/x86/kernel/apic/apic_numachip.c
index 134e04506ab4..78778b54f904 100644
--- a/arch/x86/kernel/apic/apic_numachip.c
+++ b/arch/x86/kernel/apic/apic_numachip.c
@@ -56,7 +56,7 @@ static u32 numachip2_set_apic_id(unsigned int id)
 return id << 24;
  }
  
-static int numachip_apic_id_valid(int apicid)

+static int numachip_apic_id_valid(u32 apicid)
  {
 /* Trust what bootloader passes in MADT */
 return 1;
diff --git a/arch/x86/kernel/apic/x2apic.h b/arch/x86/kernel/apic/x2apic.h
index b107de381cb5..a49b3604027f 100644
--- a/arch/x86/kernel/apic/x2apic.h
+++ b/arch/x86/kernel/apic/x2apic.h
@@ -1,6 +1,6 @@
  /* Common bits for X2APIC cluster/physical modes. */
  
-int x2apic_apic_id_valid(int apicid);

+int x2apic_apic_id_valid(u32 apicid);
  int x2apic_apic_id_registered(void);
  void __x2apic_send_IPI_dest(unsigned int apicid, int vector, unsigned int 
dest);
  unsigned int x2apic_get_apic_id(unsigned long id);
diff --git a/arch/x86/kernel/apic/x2apic_phys.c 
b/arch/x86/kernel/apic/x2apic_phys.c
index e2829bf40e4a..b5cf9e7b3830 100644
--- a/arch/x86/kernel/apic/x2apic_phys.c
+++ b/arch/x86/kernel/apic/x2apic_phys.c
@@ -101,7 +101,7 @@ static int x2apic_phys_probe(void)
  }
  
  /* Common x2apic functions, also used by

Re: Reply: [RFC PATCH] x86/acpi: Prevent x2apic id -1 from being accounted

2018-04-09 Thread Dou Liyang

RongQing,

At 04/09/2018 02:38 PM, Li,Rongqing wrote:




-----Original Message-----
From: Dou Liyang [mailto:douly.f...@cn.fujitsu.com]
Sent: April 9, 2018 13:38
To: Li,Rongqing ; linux-kernel@vger.kernel.org;
t...@linutronix.de; mi...@redhat.com; h...@zytor.com; jgr...@suse.com;
x...@kernel.org; pet...@infradead.org
Subject: Re: [RFC PATCH] x86/acpi: Prevent x2apic id -1 from being accounted

Hi RongQing,

Is there a local x2apic whose ID is 0xffffffff in your machine?



I think no


[...]


[0.00] ACPI: X2APIC (apic_id[0xffffffff] uid[0x00] disabled)
[0.00] ACPI: X2APIC (apic_id[0xffffffff] uid[0x01] disabled)
[0.00] ACPI: X2APIC (apic_id[0xffffffff] uid[0x02] disabled)


Ah, sure enough!

[...]




At 04/08/2018 07:38 PM, Li RongQing wrote:

local_apic_id of acpi_madt_local_x2apic is u32; it is converted to int
when checked by default_apic_id_valid(), which returns true if it is
larger than 0x7fffffff. This is wrong.



For x2apic enabled systems,

- the byte length of the X2APIC ID is 4, so it can be larger than
  0x7fffffff in theory



Yes


- the ->apic_id_valid callback points to x2apic_apic_id_valid(), which
  always returns _true_, not default_apic_id_valid().



On this machine, the apic is apic_flat


I see. I am sorry, the title and changelog made me misunderstand.

  Here, actually, we prevent all X2APIC IDs from being parsed in
  non-x2apic mode, not just 0xffffffff, because the values of X2APIC IDs
  in the ACPI MADT must be 255 or greater.
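
To make the conversion trap concrete, here is a minimal userspace sketch
(the default_apic_id_valid() body is taken from the patch below; main() is
purely illustrative):

	#include <stdio.h>
	typedef unsigned int u32;

	/* pre-patch signature: the u32 MADT value is converted to int */
	static int default_apic_id_valid(int apicid)
	{
		return (apicid < 255);
	}

	int main(void)
	{
		u32 apic_id = 0xffffffff;	/* disabled entry in the MADT */

		/* 0xffffffff becomes -1 as an int, and -1 < 255 holds */
		printf("%d\n", default_apic_id_valid(apic_id));	/* prints 1 */
		return 0;
	}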



-RongQing


Thanks,
dou


and if the local_apic_id is invalid, we should prevent it from being
accounted.

This fixes a bug where the Purley platform displays too many possible CPUs.

Signed-off-by: Li RongQing 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Dou Liyang 
---
   arch/x86/include/asm/apic.h  |  4 ++--
   arch/x86/kernel/acpi/boot.c  | 10 ++
   arch/x86/kernel/apic/apic_common.c   |  2 +-
   arch/x86/kernel/apic/apic_numachip.c |  2 +-
   arch/x86/kernel/apic/x2apic.h|  2 +-
   arch/x86/kernel/apic/x2apic_phys.c   |  2 +-
   arch/x86/kernel/apic/x2apic_uv_x.c   |  2 +-
   arch/x86/xen/apic.c  |  2 +-
   8 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 40a3d3642f3a..08acd954f00e 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -313,7 +313,7 @@ struct apic {
/* Probe, setup and smpboot functions */
int (*probe)(void);
int (*acpi_madt_oem_check)(char *oem_id, char

*oem_table_id);

-   int (*apic_id_valid)(int apicid);
+   int (*apic_id_valid)(u32 apicid);
int (*apic_id_registered)(void);

bool(*check_apicid_used)(physid_mask_t *map, int apicid);
@@ -486,7 +486,7 @@ static inline unsigned int read_apic_id(void)
return apic->get_apic_id(reg);
   }

-extern int default_apic_id_valid(int apicid);
+extern int default_apic_id_valid(u32 apicid);
   extern int default_acpi_madt_oem_check(char *, char *);
   extern void default_setup_apic_routing(void);

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 7a37d9357bc4..7412564dc2a7 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -200,7 +200,7 @@ acpi_parse_x2apic(struct acpi_subtable_header

*header, const unsigned long end)

   {
struct acpi_madt_local_x2apic *processor = NULL;
   #ifdef CONFIG_X86_X2APIC
-   int apic_id;
+   u32 apic_id;
u8 enabled;
   #endif

@@ -222,10 +222,12 @@ acpi_parse_x2apic(struct acpi_subtable_header

*header, const unsigned long end)

 * to not preallocating memory for all NR_CPUS
 * when we use CPU hotplug.
 */
-   if (!apic->apic_id_valid(apic_id) && enabled)
+   if (!apic->apic_id_valid(apic_id)) {
printk(KERN_WARNING PREFIX "x2apic entry ignored\n");


Since the disabled APIC entries may not exist physically, please printk
the warning only if the APIC entry is enabled.

if (!apic->apic_id_valid(apic_id)) {
	if (enabled)
		printk...

	return 0;
}



Thanks,
dou




Re: [PATCH] genirq: only scan the present CPUs

2018-04-08 Thread Dou Liyang

Hi Peter,

At 04/06/2018 05:05 PM, Peter Zijlstra wrote:

On Fri, Apr 06, 2018 at 11:02:28AM +0200, Peter Zijlstra wrote:

On Fri, Apr 06, 2018 at 04:42:14PM +0800, Dou Liyang wrote:

Hi Thomas, Peter,

At 04/03/2018 07:23 PM, Peter Zijlstra wrote:

On Tue, Apr 03, 2018 at 12:25:56PM +0200, Thomas Gleixner wrote:

On Mon, 2 Apr 2018, Li RongQing wrote:


Lots of applications read /proc/stat, like ps and vmstat, but we find
the reading takes a long time on the Purley platform, which has lots of
possible CPUs and interrupts.

To reduce the reading time, only scan the present CPUs, not all possible
CPUs, which speeds up the reading of /proc/stat 20 times on a Purley
platform which has 56 present CPUs and 224 possible CPUs.


Why is BIOS/ACPI telling the kernel that there are 224 possible CPUs unless
it supports physical CPU hotplug.


BIOS is crap, news at 11. I've got boxes like that too. Use
possible_cpu=$nr if you're bothered by it -- it's what I do.



Yes, I think so. It is a manual way to reset the number.

For this situation, I am investigating restricting the number of
possible CPUs automatically. But, due to the limitation of the ACPI
subsystem, I can do it _before_ setup_percpu_area where the number will
be used.


Ah, did you mean to say "I can _NOT_ do it" ? Still I don't see the

^--- Oops, yes.


point of frobbing random users if the whole thing is buggered.



If the ACPI subsystem could be initialized earlier, we could get the
accurate number of possible CPUs from the ACPI namespace. Then, we could
reset the _cpu_possible_mask_ as prefill_possible_map() does, which
would forbid random users.

But it needs the memory to be initialized first, so it can't be called
earlier than setup_percpu_area(), which is invoked earlier than mem_init().
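
Roughly, the ordering constraint in start_kernel() looks like this (a
hedged sketch abridged from init/main.c of that era; only the relative
positions matter here):

	start_kernel()
		setup_arch()
		...
		setup_per_cpu_areas()	/* consumes the possible CPU count */
		...
		mm_init()		/* memory becomes usable here */
		...
		acpi_early_init()	/* ACPI namespace usable only here */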


and you are right:

"So if you see it enumerates a gazillion empty spots but the system does
 not in fact support physical hotplug, we should discard those."

I will think it more carefully.

Thanks,

dou








Re: [RFC PATCH] x86/acpi: Prevent x2apic id -1 from being accounted

2018-04-08 Thread Dou Liyang

Hi RongQing,

Is there a local x2apic whose ID is 0xffffffff in your machine?

At 04/08/2018 07:38 PM, Li RongQing wrote:

local_apic_id of acpi_madt_local_x2apic is u32; it is converted to int
when checked by default_apic_id_valid(), which returns true if it is
larger than 0x7fffffff. This is wrong.



For x2apic enabled systems,

  - the byte length of the X2APIC ID is 4, so it can be larger than
    0x7fffffff in theory

  - the ->apic_id_valid callback points to x2apic_apic_id_valid(), which
    always returns _true_, not default_apic_id_valid().

Thanks,
dou


and if the local_apic_id is invalid, we should prevent it from being
accounted.

This fixes a bug where the Purley platform displays too many possible CPUs.

Signed-off-by: Li RongQing 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Dou Liyang 
---
  arch/x86/include/asm/apic.h  |  4 ++--
  arch/x86/kernel/acpi/boot.c  | 10 ++
  arch/x86/kernel/apic/apic_common.c   |  2 +-
  arch/x86/kernel/apic/apic_numachip.c |  2 +-
  arch/x86/kernel/apic/x2apic.h|  2 +-
  arch/x86/kernel/apic/x2apic_phys.c   |  2 +-
  arch/x86/kernel/apic/x2apic_uv_x.c   |  2 +-
  arch/x86/xen/apic.c  |  2 +-
  8 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 40a3d3642f3a..08acd954f00e 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -313,7 +313,7 @@ struct apic {
/* Probe, setup and smpboot functions */
int (*probe)(void);
int (*acpi_madt_oem_check)(char *oem_id, char *oem_table_id);
-   int (*apic_id_valid)(int apicid);
+   int (*apic_id_valid)(u32 apicid);
int (*apic_id_registered)(void);
  
  	bool	(*check_apicid_used)(physid_mask_t *map, int apicid);

@@ -486,7 +486,7 @@ static inline unsigned int read_apic_id(void)
return apic->get_apic_id(reg);
  }
  
-extern int default_apic_id_valid(int apicid);

+extern int default_apic_id_valid(u32 apicid);
  extern int default_acpi_madt_oem_check(char *, char *);
  extern void default_setup_apic_routing(void);
  
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c

index 7a37d9357bc4..7412564dc2a7 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -200,7 +200,7 @@ acpi_parse_x2apic(struct acpi_subtable_header *header, 
const unsigned long end)
  {
struct acpi_madt_local_x2apic *processor = NULL;
  #ifdef CONFIG_X86_X2APIC
-   int apic_id;
+   u32 apic_id;
u8 enabled;
  #endif
  
@@ -222,10 +222,12 @@ acpi_parse_x2apic(struct acpi_subtable_header *header, const unsigned long end)

 * to not preallocating memory for all NR_CPUS
 * when we use CPU hotplug.
 */
-   if (!apic->apic_id_valid(apic_id) && enabled)
+   if (!apic->apic_id_valid(apic_id)) {
printk(KERN_WARNING PREFIX "x2apic entry ignored\n");
-   else
-   acpi_register_lapic(apic_id, processor->uid, enabled);
+   return 0;
+   }
+
+   acpi_register_lapic(apic_id, processor->uid, enabled);
  #else
printk(KERN_WARNING PREFIX "x2apic entry ignored\n");
  #endif
diff --git a/arch/x86/kernel/apic/apic_common.c 
b/arch/x86/kernel/apic/apic_common.c
index a360801779ae..02b4839478b1 100644
--- a/arch/x86/kernel/apic/apic_common.c
+++ b/arch/x86/kernel/apic/apic_common.c
@@ -40,7 +40,7 @@ int default_check_phys_apicid_present(int phys_apicid)
return physid_isset(phys_apicid, phys_cpu_present_map);
  }
  
-int default_apic_id_valid(int apicid)

+int default_apic_id_valid(u32 apicid)
  {
return (apicid < 255);
  }
diff --git a/arch/x86/kernel/apic/apic_numachip.c 
b/arch/x86/kernel/apic/apic_numachip.c
index 134e04506ab4..78778b54f904 100644
--- a/arch/x86/kernel/apic/apic_numachip.c
+++ b/arch/x86/kernel/apic/apic_numachip.c
@@ -56,7 +56,7 @@ static u32 numachip2_set_apic_id(unsigned int id)
return id << 24;
  }
  
-static int numachip_apic_id_valid(int apicid)

+static int numachip_apic_id_valid(u32 apicid)
  {
/* Trust what bootloader passes in MADT */
return 1;
diff --git a/arch/x86/kernel/apic/x2apic.h b/arch/x86/kernel/apic/x2apic.h
index b107de381cb5..a49b3604027f 100644
--- a/arch/x86/kernel/apic/x2apic.h
+++ b/arch/x86/kernel/apic/x2apic.h
@@ -1,6 +1,6 @@
  /* Common bits for X2APIC cluster/physical modes. */
  
-int x2apic_apic_id_valid(int apicid);

+int x2apic_apic_id_valid(u32 apicid);
  int x2apic_apic_id_registered(void);
  void __x2apic_send_IPI_dest(unsigned int apicid, int vector, unsigned int 
dest);
  unsigned int x2apic_get_apic_id(unsigned long id);
diff --git a/arch/x86/kernel/apic/x2apic_phys.c 
b/arch/x86/kernel/apic/x2apic_phys.c
index e2829bf40e4a..b5cf9e7b3830 100644
--- a/arch/x86/kernel/apic/x2apic_phys.c
+++ b/arch/x86/kernel/apic/x2apic_phys.c
@@ -101,7 +101,7 @@ static int x2apic_phys_

Re: [PATCH] genirq: only scan the present CPUs

2018-04-06 Thread Dou Liyang

Hi Thomas, Peter,

At 04/03/2018 07:23 PM, Peter Zijlstra wrote:

On Tue, Apr 03, 2018 at 12:25:56PM +0200, Thomas Gleixner wrote:

On Mon, 2 Apr 2018, Li RongQing wrote:


Lots of applications read /proc/stat, like ps and vmstat, but we find
the reading takes a long time on the Purley platform, which has lots of
possible CPUs and interrupts.

To reduce the reading time, only scan the present CPUs, not all possible
CPUs, which speeds up the reading of /proc/stat 20 times on a Purley
platform which has 56 present CPUs and 224 possible CPUs.


Why is BIOS/ACPI telling the kernel that there are 224 possible CPUs unless
it supports physical CPU hotplug.


BIOS is crap, news at 11. I've got boxes like that too. Use
possible_cpu=$nr if you're bothered by it -- it's what I do.



Yes, I think so. It is a manual way to reset the number.

For this situation, I am investigating restricting the number of
possible CPUs automatically. But, due to the limitation of the ACPI
subsystem, I can do it _before_ setup_percpu_area where the number will
be used.

But, I can provide an indicator later to tell the system whether
physical CPU hotplug is supported or not. Can we use this indicator
like this in this situation:

   if true
	use for_each_possible_cpu(cpu)
   else
	use for_each_present_cpu(cpu)
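
Roughly, a hedged sketch of that pattern applied to the /proc/stat IRQ
summing this thread is about (hotplug_supported() is a hypothetical
indicator, not an existing kernel API):

	/* Sum one interrupt's per-CPU counts; sketch only */
	static u64 sum_irq_counts(unsigned int irq)
	{
		u64 sum = 0;
		int cpu;

		if (hotplug_supported()) {
			/* physical hotplug works: future CPUs matter */
			for_each_possible_cpu(cpu)
				sum += kstat_irqs_cpu(irq, cpu);
		} else {
			/* BIOS over-reports possible CPUs: present is enough */
			for_each_present_cpu(cpu)
				sum += kstat_irqs_cpu(irq, cpu);
		}
		return sum;
	}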



Thanks,

dou









[PATCH] x86/processor: Remove two legacy function declarations

2018-04-03 Thread Dou Liyang
The early_trap_init() and cpu_set_gdt() functions have been abandoned, so
remove their declarations.

Signed-off-by: Dou Liyang 
---
 arch/x86/include/asm/processor.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index b0ccd4847a58..3ef3221107c3 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -739,13 +739,11 @@ enum idle_boot_override {IDLE_NO_OVERRIDE=0, IDLE_HALT, 
IDLE_NOMWAIT,
 extern void enable_sep_cpu(void);
 extern int sysenter_setup(void);
 
-extern void early_trap_init(void);
 void early_trap_pf_init(void);
 
 /* Defined in head.S */
 extern struct desc_ptr early_gdt_descr;
 
-extern void cpu_set_gdt(int);
 extern void switch_to_new_gdt(int);
 extern void load_direct_gdt(int);
 extern void load_fixmap_gdt(int);
-- 
2.14.3





[RESEND PATCH] x86/boot/KASLR: Extend movable_node option for KASLR

2018-04-02 Thread Dou Liyang
The movable_node option is a boot-time switch to make sure the physical
NUMA nodes can be hot-added/removed when the ACPI tables can't be parsed
to provide the memory hotplug information.

As we all know, there is always one node, called the "home node", which
can't be movable and in which the kernel image resides. With the
movable_node option, Linux allocates early memory near the kernel image
to avoid using the other, movable nodes.

But, since KASLR also can't get the memory hotplug information, it may
randomize the kernel image into a movable node, which breaks the rule of
the movable_node option and makes physical hot-add/remove operations fail.

The perfect solution is providing the memory hotplug information to KASLR.
But that needs efforts from both hardware engineers and software engineers.

Here is an alternative method: extend the movable_node option with a
parameter that restricts the kernel to be randomized within the home node.
The parameter sets the boundary between the home node and the other nodes.
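
For example (hypothetical values, assuming the BIOS-defined home node
covers the first 16G of memory), booting with:

	movable_node movable_node=16G

makes the decompression code treat 16G as a memory limit, so the kernel
is extracted only below that boundary, as the mem_limit handling in the
diff below shows.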

Reported-by: Chao Fan 
Signed-off-by: Dou Liyang 
Reviewed-by: Kees Cook 
---
Changelog:
  -Rewrite the commit log and document.

 Documentation/admin-guide/kernel-parameters.txt | 12 ++--
 arch/x86/boot/compressed/kaslr.c| 19 ---
 2 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 1d1d53f85ddd..0cfc0b10a117 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2353,7 +2353,8 @@
mousedev.yres=  [MOUSE] Vertical screen resolution, used for devices
reporting absolute coordinates, such as tablets
 
-   movablecore=nn[KMG] [KNL,X86,IA-64,PPC] This parameter
+   movablecore=nn[KMG]
+   [KNL,X86,IA-64,PPC] This parameter
is similar to kernelcore except it specifies the
amount of memory used for migratable allocations.
If both kernelcore and movablecore is specified,
@@ -2363,12 +2364,19 @@
that the amount of memory usable for all allocations
is not too small.
 
-   movable_node[KNL] Boot-time switch to make hotplugable memory
+   movable_node[KNL] Boot-time switch to make hot-pluggable memory
NUMA nodes to be movable. This means that the memory
of such nodes will be usable only for movable
allocations which rules out almost all kernel
allocations. Use with caution!
 
+   movable_node=nn[KMG]
+   [KNL] Extend movable_node to make it work well with KASLR.
+   This parameter is the boundary between the "home node" and
+   the other nodes. The "home node" is an immovable node and is
+   defined by the BIOS. Set 'nn' to the memory size of the "home
+   node"; the kernel image will be extracted in immovable nodes.
+
MTD_Partition=  [MTD]
Format: ,,,
 
diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index 8199a6187251..f906d7890e69 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -92,7 +92,10 @@ struct mem_vector {
 static bool memmap_too_large;
 
 
-/* Store memory limit specified by "mem=nn[KMG]" or "memmap=nn[KMG]" */
+/*
+ * Store memory limit specified by the following situations:
+ * "mem=nn[KMG]" or "memmap=nn[KMG]" or "movable_node=nn[KMG]"
+ */
 unsigned long long mem_limit = ULLONG_MAX;
 
 
@@ -214,7 +217,8 @@ static int handle_mem_memmap(void)
char *param, *val;
u64 mem_size;
 
-   if (!strstr(args, "memmap=") && !strstr(args, "mem="))
+   if (!strstr(args, "memmap=") && !strstr(args, "mem=") &&
+   !strstr(args, "movable_node="))
return 0;
 
tmp_cmdline = malloc(len + 1);
@@ -249,7 +253,16 @@ static int handle_mem_memmap(void)
free(tmp_cmdline);
return -EINVAL;
}
-   mem_limit = mem_size;
+   mem_limit = mem_limit > mem_size ? mem_size : mem_limit;
+   } else if (!strcmp(param, "movable_node")) {
+   char *p = val;
+
+   mem_size = memparse(p, &p);
+   if (mem_size == 0) {
+   free(tmp_cmdline);
+   return -EINVAL;
+   }
+   mem_limit = mem_limit > mem_size ? mem_size : mem_limit;
}
}
 
-- 
2.14.3





Re: [PATCH v9 0/5] x86/KASLR: Add parameter kaslr_boot_mem=nn[KMG]@ss[KMG]

2018-03-28 Thread Dou Liyang

Hi Ingo, Kees, Baoquan and Chao

At 03/12/2018 06:57 PM, Ingo Molnar wrote:
[...]

So there's apparently a mis-design here:

  - KASLR needs to be done very early on during bootup: - it's not realistic to
expect KASLR to be done with a booted up kernel, because pointers to various
KASLR-ed objects are already widely spread out in memory.

  - But for some unfathomable reason the memory hotplug attribute of memory
    regions is not part of the regular memory map but part of late-init ACPI
    data structures.

The right solution would be _not_ to fudge the KASLR location, but to provide
the memory hotplug information to early code, preferably via the primary memory map.
KASLR can then make use of it and avoid those regions, just like it avoids other
memory regions already.

In addition to that hardware makers (including virtualized hardware) should also
fix their systems to provide memory hotplug information to early code.



Yes, but before this, can we fix this problem with the following patch,
which was sent and reviewed by Kees before [1]? Its solution is:

  Extend the movable_node option with a parameter that restricts the
  kernel to be randomized in immovable nodes. The parameter sets the
  boundary between the home node and the other nodes.

My reason is here:

  - What we really want to solve is that KASLR breaks *physical node
    hotplug*; keeping the decompressed kernel in an immovable node is
    enough.

  - AFAICS, there are not many systems where physical node hotplug
    actually works in practice, and there must be one node, called the
    *home node*, which is immovable and stores basic information.

  - A node in modern systems can have double-digit gigabytes of memory,
    which completely satisfies the needs of KASLR.

So, just restrict the kernel to be randomized in the home node, and
ignore the other nodes, when the *movable_node* option is on the command
line.

Thoughts? May I rebase and resend the patch?

[1] https://lkml.org/lkml/2017/8/3/401

Thanks,

dou




Re: [PATCH 1/5] x86/smpboot: Add the missing description of possible_cpus

2018-03-21 Thread Dou Liyang



At 03/21/2018 05:34 PM, Dou Liyang wrote:

Hi Peter,

At 03/21/2018 05:08 PM, Peter Zijlstra wrote:

On Wed, Mar 21, 2018 at 01:33:24PM +0800, Dou Liyang wrote:

How about:

possible_cpus=    [s390,x86_64] Set the number of possible CPUs, which
    is determined from the ACPI tables (MADT) or mptables by
    default. possible_cpus=n : n >= 1 enforces the possible
    number to be 'n'.
    When nr_cpus=m is also set, the minimum of the two is
    chosen as the number of possible CPUs.


So what is the exact difference between possible_cpus and nr_cpus ? I


possible_cpus= can reset the number of possible CPUs, even to a value
bigger than 'num_processors + disabled_cpus', but nr_cpus= can't.

   ^^^
  That is the maximum number the kernel gets from ACPI/mptables; no matter
what nr_cpus= is set to, the number of possible CPUs will not be bigger
than it.
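
A concrete illustration (hypothetical numbers, matching the Purley box
discussed elsewhere in these threads: 56 enabled plus 168 disabled CPUs
in the MADT, i.e. a maximum of 224):

	possible_cpus=300              -> 300 possible CPUs (may exceed 224)
	nr_cpus=300                    -> 224 possible CPUs (capped by the MADT count)
	possible_cpus=300 nr_cpus=100  -> 100 possible CPUs (the minimum wins)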




know maxcpus= limits the number of CPUs we bring up, and possible_cpus
limits the possible_map, but I'm not entirely sure what nr_cpus does
here.



nr_cpus can limit the maximum number of CPUs that the kernel could support.

Here is a double check in case they are used at the same time, even if I
think just using possible_cpus= is enough. :-)

Thanks,
 dou.








Re: [PATCH 1/5] x86/smpboot: Add the missing description of possible_cpus

2018-03-21 Thread Dou Liyang

Hi Peter,

At 03/21/2018 05:08 PM, Peter Zijlstra wrote:

On Wed, Mar 21, 2018 at 01:33:24PM +0800, Dou Liyang wrote:

How about:

possible_cpus=  [s390,x86_64] Set the number of possible CPUs, which
	is determined from the ACPI tables (MADT) or mptables by
	default. possible_cpus=n : n >= 1 enforces the possible
	number to be 'n'.
	When nr_cpus=m is also set, the minimum of the two is
	chosen as the number of possible CPUs.


So what is the exact difference between possible_cpus and nr_cpus ? I


possible_cpus= can reset the number of possible CPUs, even to a value
bigger than 'num_processors + disabled_cpus', but nr_cpus= can't.


know maxcpus= limits the number of CPUs we bring up, and possible_cpus
limits the possible_map, but I'm not entirely sure what nr_cpus does
here.



nr_cpus can limit the maximum number of CPUs that the kernel could support.

Here is a double check in case they are used at the same time, even if I
think just using possible_cpus= is enough. :-)

Thanks,
dou.








Re: [PATCH 3/5] x86/smpboot: Make the check code more clear in prefill_possible_map()

2018-03-20 Thread Dou Liyang

Hi Peter,

At 03/20/2018 08:39 PM, Peter Zijlstra wrote:

On Tue, Mar 20, 2018 at 07:04:30PM +0800, Dou Liyang wrote:

case 1: no  | no  | no | -->  min (setup_possible_cpus, nr_cpu_ids, setup_max_cpus)
case 2: no  | no  | yes| -->  min (setup_possible_cpus, nr_cpu_ids)
case 3: no  | yes | no | -->  1
case 4: no  | yes | yes| -->  1
case 5: yes | no  | no | -->  min (num_processors, nr_cpu_ids, setup_max_cpus)
case 6: yes | no  | yes| -->  min (num_processors + disabled_cpus, nr_cpu_ids)
case 7: yes | yes | no | -->  1
case 8: yes | yes | yes| -->  1


The case number is off by one ;-)



I got it! ;-)

Thanks
dou








Re: [PATCH 1/5] x86/smpboot: Add the missing description of possible_cpus

2018-03-20 Thread Dou Liyang

Hi Peter,

At 03/20/2018 08:37 PM, Peter Zijlstra wrote:

On Tue, Mar 20, 2018 at 07:04:28PM +0800, Dou Liyang wrote:


+   possible_cpus=  [s390,x86_64] Use this to set hotpluggable cpus.
+   This option sets possible_cpus bits in cpu_possible_map.
+   Thus keeping the numbers of bits set constant even if
+   the machine gets rebooted.


That description, esp. the last sentence, doesn't make any kind of sense
to me. What?



Ah, sure enough, I can't be lazy. :-) I stole that from the commit
  3b11ce7f542e ("x86: use possible_cpus=NUM to extend the possible cpus allowed")


How about:

possible_cpus=  [s390,x86_64] Set the number of possible CPUs, which
	is determined from the ACPI tables (MADT) or mptables by
	default. possible_cpus=n : n >= 1 enforces the possible
	number to be 'n'.
	When nr_cpus=m is also set, the minimum of the two is
	chosen as the number of possible CPUs.

Thank,
dou




[PATCH 3/5] x86/smpboot: Make the check code more clear in prefill_possible_map()

2018-03-20 Thread Dou Liyang
In prefill_possible_map(), the kernel needs to get the number of possible
CPUs to reset cpu_possible_map. The number is related to the options:

  -nosmp, maxcpus, possible_cpus, nr_cpus

... which need to be checked.

Currently, the check code mixes these options together confusingly and is
hard to follow.

Isolate them, and check them linearly.

No functional change. This prepares for cutting the number of possible CPUs.

Signed-off-by: Dou Liyang 
---
 
Situations:

setup_possible_cpus == -1 | setup_max_cpus == 0 | CONFIG_HOTPLUG_CPU == y |
yes                       | yes                 | yes                     |
no                        | no                  | no                      |

Test cases:
   Cases  |  the number of possible cpus
  |   (the same with or w/o this patch)
case 1: no  | no  | no | -->  min (setup_possible_cpus, nr_cpu_ids, setup_max_cpus)
case 2: no  | no  | yes| -->  min (setup_possible_cpus, nr_cpu_ids)
case 3: no  | yes | no | -->  1
case 4: no  | yes | yes| -->  1
case 5: yes | no  | no | -->  min (num_processors, nr_cpu_ids, setup_max_cpus)
case 6: yes | no  | yes| -->  min (num_processors + disabled_cpus, nr_cpu_ids)
case 7: yes | yes | no | -->  1 
case 8: yes | yes | yes| -->  1

 arch/x86/kernel/smpboot.c | 31 ++-
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index ff99e2b6fc54..2fdda8567bf9 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1334,7 +1334,7 @@ early_param("possible_cpus", _setup_possible_cpus);
  * We do this because additional CPUs waste a lot of memory.
  * -AK
  */
-__init void prefill_possible_map(void)
+void __init prefill_possible_map(void)
 {
int i, possible;
 
@@ -1356,18 +1356,22 @@ __init void prefill_possible_map(void)
num_processors = 1;
}
 
-   i = setup_max_cpus ?: 1;
+   /* The SMP kernel could act as a UP kernel via nosmp/maxcpus=0 */
+   if (!setup_max_cpus) {
+   possible = 1;
+   total_cpus = num_processors + disabled_cpus;
+   goto set_cpu_possible_mask;
+   }
+
+   /* Possible CPUs could be overwritten via possible_cpus= */
if (setup_possible_cpus == -1) {
possible = num_processors;
 #ifdef CONFIG_HOTPLUG_CPU
-   if (setup_max_cpus)
-   possible += disabled_cpus;
-#else
-   if (possible > i)
-   possible = i;
+   possible += disabled_cpus;
 #endif
-   } else
+   } else {
possible = setup_possible_cpus;
+   }
 
total_cpus = max_t(int, possible, num_processors + disabled_cpus);
 
@@ -1378,15 +1382,16 @@ __init void prefill_possible_map(void)
possible = nr_cpu_ids;
}
 
-#ifdef CONFIG_HOTPLUG_CPU
-   if (!setup_max_cpus)
-#endif
-   if (possible > i) {
+#ifndef CONFIG_HOTPLUG_CPU
+   /* Possible CPUs could be reduced via max_cpus= */
+   if (possible > setup_max_cpus) {
pr_warn("%d Processors exceeds max_cpus limit of %u\n",
possible, setup_max_cpus);
-   possible = i;
+   possible = setup_max_cpus;
}
+#endif
 
+set_cpu_possible_mask:
nr_cpu_ids = possible;
 
pr_info("Allowing %d CPUs, %d hotplug CPUs\n",
-- 
2.14.3





[PATCH 2/5] x86/cpu_hotplug: Update the link of cpu_hotplug.rst

2018-03-20 Thread Dou Liyang
The original cpu_hotplug.txt document describes CPU hotplug support in
the Linux kernel. It was moved into core-api/ and renamed cpu_hotplug.rst.

Update its links in the other documents.

Fixes: ff58fa7f556c ("Documentation: Update CPU hotplug and move it to 
core-api")
Signed-off-by: Dou Liyang 
---
 Documentation/00-INDEX|  2 --
 Documentation/cputopology.txt | 10 +-
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/Documentation/00-INDEX b/Documentation/00-INDEX
index 7f3a0728ccf2..3773c67ea9e5 100644
--- a/Documentation/00-INDEX
+++ b/Documentation/00-INDEX
@@ -104,8 +104,6 @@ core-api/
- documentation on kernel core components.
 cpu-freq/
- info on CPU frequency and voltage scaling.
-cpu-hotplug.txt
-   - document describing CPU hotplug support in the Linux kernel.
 cpu-load.txt
- document describing how CPU load statistics are collected.
 cpuidle/
diff --git a/Documentation/cputopology.txt b/Documentation/cputopology.txt
index c6e7e9196a8b..e05b0879fe91 100644
--- a/Documentation/cputopology.txt
+++ b/Documentation/cputopology.txt
@@ -117,9 +117,9 @@ source for the output is in brackets ("[]").
[NR_CPUS-1]
 
 offline:   CPUs that are not online because they have been
-   HOTPLUGGED off (see cpu-hotplug.txt) or exceed the limit
-   of CPUs allowed by the kernel configuration (kernel_max
-   above). [~cpu_online_mask + cpus >= NR_CPUS]
+   HOTPLUGGED off (see core-api/cpu_hotplug.rst) or exceed
+   the limit of CPUs allowed by the kernel configuration
+   (kernel_max above). [~cpu_online_mask + cpus >= NR_CPUS]
 
 online:CPUs that are online and being scheduled [cpu_online_mask]
 
@@ -155,5 +155,5 @@ online.)::
possible: 0-127
 present: 0-3
 
-See cpu-hotplug.txt for the possible_cpus=NUM kernel start parameter
-as well as more information on the various cpumasks.
+See core-api/cpu_hotplug.rst for the possible_cpus=NUM kernel start
+parameter as well as more information on the various cpumasks.
-- 
2.14.3





[PATCH 5/5] acpi/processor: Make the acpi_duplicate_processor_id() static

2018-03-20 Thread Dou Liyang
The acpi_duplicate_processor_id() is only called in acpi_processor_get_info(),
so move it in front of acpi_processor_get_info() and make it static.

Signed-off-by: Dou Liyang 
---
 drivers/acpi/acpi_processor.c | 62 +--
 include/linux/acpi.h  |  3 ---
 2 files changed, 31 insertions(+), 34 deletions(-)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index db5bdb59639c..03ec7690710c 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -229,6 +229,37 @@ static inline int acpi_processor_hotadd_init(struct 
acpi_processor *pr)
 }
 #endif /* CONFIG_ACPI_HOTPLUG_CPU */
 
+/* The number of the unique processor IDs */
+static int nr_unique_ids __initdata;
+
+/* The number of the duplicate processor IDs */
+static int nr_duplicate_ids;
+
+/* Used to store the unique processor IDs */
+static int unique_processor_ids[] __initdata = {
+   [0 ... NR_CPUS - 1] = -1,
+};
+
+/* Used to store the duplicate processor IDs */
+static int duplicate_processor_ids[] = {
+   [0 ... NR_CPUS - 1] = -1,
+};
+
+static bool acpi_duplicate_processor_id(int proc_id)
+{
+   int i;
+
+   /*
+* Compare the proc_id with duplicate IDs, if the proc_id is already
+* in the duplicate IDs, return true, otherwise, return false.
+*/
+   for (i = 0; i < nr_duplicate_ids; i++) {
+   if (duplicate_processor_ids[i] == proc_id)
+   return true;
+   }
+   return false;
+}
+
 static int acpi_processor_get_info(struct acpi_device *device)
 {
union acpi_object object = { 0 };
@@ -579,22 +610,6 @@ static struct acpi_scan_handler 
processor_container_handler = {
.attach = acpi_processor_container_attach,
 };
 
-/* The number of the unique processor IDs */
-static int nr_unique_ids __initdata;
-
-/* The number of the duplicate processor IDs */
-static int nr_duplicate_ids;
-
-/* Used to store the unique processor IDs */
-static int unique_processor_ids[] __initdata = {
-   [0 ... NR_CPUS - 1] = -1,
-};
-
-/* Used to store the duplicate processor IDs */
-static int duplicate_processor_ids[] = {
-   [0 ... NR_CPUS - 1] = -1,
-};
-
 static void __init processor_validated_ids_update(int proc_id)
 {
int i;
@@ -682,21 +697,6 @@ static void __init acpi_processor_check_duplicates(void)
NULL, NULL);
 }
 
-bool acpi_duplicate_processor_id(int proc_id)
-{
-   int i;
-
-   /*
-* compare the proc_id with duplicate IDs, if the proc_id is already
-* in the duplicate IDs, return true, otherwise, return false.
-*/
-   for (i = 0; i < nr_duplicate_ids; i++) {
-   if (duplicate_processor_ids[i] == proc_id)
-   return true;
-   }
-   return false;
-}
-
 void __init acpi_processor_init(void)
 {
acpi_processor_check_duplicates();
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 968173ec2726..dd4591dc1eb3 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -285,9 +285,6 @@ static inline bool invalid_phys_cpuid(phys_cpuid_t phys_id)
return phys_id == PHYS_CPUID_INVALID;
 }
 
-/* Validate the processor object's proc_id */
-bool acpi_duplicate_processor_id(int proc_id);
-
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
 /* Arch dependent functions for cpu hotplug support */
 int acpi_map_cpu(acpi_handle handle, phys_cpuid_t physid, u32 acpi_id,
-- 
2.14.3





[PATCH 0/5] x86/cpu_hotplug: one bug fix and four cleanup

2018-03-20 Thread Dou Liyang
Recently, we hoped to make the possible CPU count more accurate for the
kernel. I am stuck on the issue of how to run acpi_early_init() _before_
setup_percpu() is invoked, so I am sending these small patches first.

This patchset does these things:

  - two documentation updates (the 1st and 2nd patches),
  - two cleanups (the 3rd and 5th patches),
  - a bug fix for CPU hotplug (the 4th patch).

Dou Liyang (5):
  x86/smpboot: Add the missing description of possible_cpus
  x86/cpu_hotplug: Update the link of cpu_hotplug.rst
  x86/smpboot: Make the check code more clear in prefill_possible_map()
  acpi/processor: Fix the return value of acpi_processor_ids_walk()
  acpi/processor: Make the acpi_duplicate_processor_id() static

 Documentation/00-INDEX  |  2 -
 Documentation/admin-guide/kernel-parameters.txt |  5 ++
 Documentation/cputopology.txt   | 10 ++--
 Documentation/x86/x86_64/cpu-hotplug-spec   |  2 +-
 arch/x86/kernel/smpboot.c   | 31 +++-
 drivers/acpi/acpi_processor.c   | 66 -
 include/linux/acpi.h|  3 --
 7 files changed, 62 insertions(+), 57 deletions(-)

-- 
2.14.3





[PATCH 4/5] acpi/processor: Fix the return value of acpi_processor_ids_walk()

2018-03-20 Thread Dou Liyang
The ACPI driver should make sure all the processor IDs in the ACPI
namespace are unique for CPU hotplug. The driver performs a depth-first
walk of the namespace tree and calls acpi_processor_ids_walk() for each
processor object.

But acpi_processor_ids_walk() returns true once one processor has been
checked, and a non-AE_OK status aborts acpi_walk_namespace(), so the walk
breaks after the first processor.

Replace the return values with AE_OK, which is the standard acpi_status
success value.

Fixes: 8c8cb30f49b8 ("acpi/processor: Implement DEVICE operator for processor
enumeration")

Signed-off-by: Dou Liyang 
---
 drivers/acpi/acpi_processor.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 449d86d39965..db5bdb59639c 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -663,11 +663,11 @@ static acpi_status __init 
acpi_processor_ids_walk(acpi_handle handle,
}
 
processor_validated_ids_update(uid);
-   return true;
+   return AE_OK;
 
 err:
acpi_handle_info(handle, "Invalid processor object\n");
-   return false;
+   return AE_OK;
 
 }
 
-- 
2.14.3





[PATCH 1/5] x86/smpboot: Add the missing description of possible_cpus

2018-03-20 Thread Dou Liyang
The kernel uses possible_cpus= on the command line to reset the
possible_cpus bits in cpu_possible_map, but it is not recorded in
kernel-parameters.txt.

Add its description to this document, and also replace the wrong option
additional_cpus with possible_cpus in cpu-hotplug-spec.

Signed-off-by: Dou Liyang 
---
 Documentation/admin-guide/kernel-parameters.txt | 5 +
 Documentation/x86/x86_64/cpu-hotplug-spec   | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 1d1d53f85ddd..34f8a5f04e63 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2832,6 +2832,11 @@
variables need be pre-allocated for later physical cpu
hot plugging.
 
+   possible_cpus=  [s390,x86_64] Use this to set hotpluggable cpus.
+   This option sets possible_cpus bits in cpu_possible_map.
+   Thus keeping the numbers of bits set constant even if
+   the machine gets rebooted.
+
nr_uarts=   [SERIAL] maximum number of UARTs to be registered.
 
numa_balancing= [KNL,X86] Enable or disable automatic NUMA balancing.
diff --git a/Documentation/x86/x86_64/cpu-hotplug-spec 
b/Documentation/x86/x86_64/cpu-hotplug-spec
index 3c23e0587db3..d218bc0bdaaa 100644
--- a/Documentation/x86/x86_64/cpu-hotplug-spec
+++ b/Documentation/x86/x86_64/cpu-hotplug-spec
@@ -16,6 +16,6 @@ it should have its LAPIC Enabled bit set to 0. Linux will use 
the number
 of disabled LAPICs to compute the maximum number of future CPUs.
 
 In the worst case the user can overwrite this choice using a command line
-option (additional_cpus=...), but it is recommended to supply the correct
+option (possible_cpus=...), but it is recommended to supply the correct
 number (or a reasonable approximation of it, with erring towards more not less)
 in the MADT to avoid manual configuration.
-- 
2.14.3





Re: [ACPI / processor] d619c81e24: WARNING:at_include/linux/cpumask.h:#cpumask_test_cpu

2018-03-18 Thread Dou Liyang

Hi lkp team

Thank you for testing.

At 03/16/2018 07:17 PM, kernel test robot wrote:

FYI, we noticed the following commit (built with gcc-7):

commit: d619c81e246424e322f7a902bed6e60b90668d56 ("ACPI / processor: Get accurate 
possible CPU count")
url: 
https://github.com/0day-ci/linux/commits/Dou-Liyang/ACPI-processor-Get-accurate-possible-CPU-count/20180316-140349


in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -cpu host -smp 2 -m 1G

caused below changes (please refer to attached dmesg/kmsg for entire 
log/backtrace):


+------------------------------------------------------+------------+------------+
|                                                      | 3266b5bd97 | d619c81e24 |
+------------------------------------------------------+------------+------------+
| boot_successes                                       | 16         | 2          |
| boot_failures                                        | 0          | 18         |
| WARNING:at_include/linux/cpumask.h:#cpumask_test_cpu | 0          | 18         |
| WARNING:at_include/linux/cpumask.h:#cpumask_check    | 0          | 18         |
| RIP:cpumask_check                                    | 0          | 18         |
| RIP:cpumask_test_cpu                                 | 0          | 18         |
| general_protection_fault:#[##]                       | 0          | 13         |
| RIP:__lock_acquire                                   | 0          | 13         |
| Kernel_panic-not_syncing:Fatal_exception             | 0          | 13         |
| BUG:kernel_hang_in_boot_stage                        | 0          | 5          |
+------------------------------------------------------+------------+------------+



[0.830741] WARNING: CPU: 1 PID: 1 at include/linux/cpumask.h:122 cpumask_test_cpu+0x32/0x57
[0.830785] WARNING: CPU: 1 PID: 1 at include/linux/cpumask.h:122 cpumask_check+0x2d/0x48


Yes, this patch broke the number of CPUs; we will find a new way.
Please drop the test work for this patch.

Thanks,
dou




Re: [PATCH] kernel/cpu: Move cpuhp_is_ap_state() into #ifdef CONFIG_SMP

2018-03-16 Thread Dou Liyang



At 03/16/2018 03:59 PM, Thomas Gleixner wrote:

On Fri, 16 Mar 2018, Dou Liyang wrote:


Commit 17a2f1ced028 ("cpu/hotplug: Merge cpuhp_bp_states and cpuhp_ap_states")
removed the last common-case use of cpuhp_is_ap_state(), which caused the
kernel to produce a warning:

   'cpuhp_is_ap_state' defined but not used

So, move it into #ifdef CONFIG_SMP.


Thanks for the patch, but a fix is already queued in the smp/hotplug
branch. It just did not make it into the for-next branch yet,



Oops, yes, let's ignore this patch.

Thanks,
dou


Thanks,

tglx








[PATCH] kernel/cpu: Move cpuhp_is_ap_state() into #ifdef CONFIG_SMP

2018-03-15 Thread Dou Liyang
Commit 17a2f1ced028 ("cpu/hotplug: Merge cpuhp_bp_states and cpuhp_ap_states")
removed the last common-case use of cpuhp_is_ap_state(), which caused the
kernel to produce a warning:

  'cpuhp_is_ap_state' defined but not used

So, move it into #ifdef CONFIG_SMP.

Reported-by: Stephen Rothwell 
Signed-off-by: Dou Liyang 
---
 kernel/cpu.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index a1860d42aacf..db7ec7572348 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -126,15 +126,6 @@ struct cpuhp_step {
 static DEFINE_MUTEX(cpuhp_state_mutex);
 static struct cpuhp_step cpuhp_hp_states[];
 
-static bool cpuhp_is_ap_state(enum cpuhp_state state)
-{
-   /*
-* The extra check for CPUHP_TEARDOWN_CPU is only for documentation
-* purposes as that state is handled explicitly in cpu_down.
-*/
-   return state > CPUHP_BRINGUP_CPU && state != CPUHP_TEARDOWN_CPU;
-}
-
 static struct cpuhp_step *cpuhp_get_step(enum cpuhp_state state)
 {
return cpuhp_hp_states + state;
@@ -247,6 +238,15 @@ static inline void complete_ap_thread(struct 
cpuhp_cpu_state *st, bool bringup)
complete(done);
 }
 
+static bool cpuhp_is_ap_state(enum cpuhp_state state)
+{
+   /*
+* The extra check for CPUHP_TEARDOWN_CPU is only for documentation
+* purposes as that state is handled explicitly in cpu_down.
+*/
+   return state > CPUHP_BRINGUP_CPU && state != CPUHP_TEARDOWN_CPU;
+}
+
 /*
  * The former STARTING/DYING states, ran with IRQs disabled and must not fail.
  */
-- 
2.14.3





  1   2   3   4   5   6   >