Re: [PATCH] eata: Convert eata driver as normal PCI and platform device drivers

2016-03-01 Thread Jiang Liu
On 2016/3/2 5:36, Arthur Marsh wrote:
> 
> 
> Arthur Marsh wrote on 02/03/16 03:57:
>>
>>
>> Christoph Hellwig wrote on 01/03/16 17:22:
>>> Hi Jiang.
>>>
>>> I'd love to see this patch in and abuse of the old PCI API gone.
>>>
>>> Did you resolve the problems Arthur saw with the previous iterations
>>> of the patch?
>>>
>>
>> I applied Jiang Liu's patch of 1st March 2016 to a clean kernel
>> 4.5.0-rc6 source, removed my workaround of removing and re-adding the
>> eata module before mounting file-systems that are on disks attached to
>> the DPT SCSI card using the eata driver, and was able to kexec from the
>> new kernel successfully.
>>
>> Arthur.
> 
> I spoke too soon. Without removing and re-inserting the eata module
> before any filesystems on disks attached to the DPT controller were
> mounted, I'd get the following messages, similar to ones previously
> reported:
> 
> sd 0:0:6:0: tag#0 abort, mbox 1.
> EATA0: abort, mbox 1 is in use.
> sd 0:0:6:0: tag#0 reset, enter.
> EATA0: reset, mbox 1 in reset.
> EATA0: reset, board reset done, enabling interrupts.
> EATA0: reset, interrupts disabled, loops 100415.
> EATA0, reset, mbox 1 locked, DID_RESET, done.
> EATA0: reset, exit, done.
> 
> 
> and so on, finally hanging after printing "kexec_core: Starting new
> kernel" (I have a photo of the messages if they're needed).
> 
> So I'm still using the new patch but have to continue to remove and
> reinsert eata at start-up before any attempts to mount disks attached
> to the DPT SCSI controller.
Hi Arthur,
Thanks for testing. So the current situation is that we have
a working driver for the normal case, but still have issues during kexec.
As I understand it, we need to implement a PCI device driver shutdown
callback to reset the RAID controller. I once tried to implement
the shutdown callback, but it didn't work. I have no deep
understanding of the RAID controller and no hardware to
experiment with, so I have no idea about the next step.
Maybe one acceptable way is to merge this patch first, so
we get a basic working driver, and then ask an expert for help
with the kexec issue.
Thanks!
Gerry

> 
> Arthur.


[PATCH] eata: Convert eata driver as normal PCI and platform device drivers

2016-02-29 Thread Jiang Liu
Previously the eata driver just grabbed and accessed eata PCI devices
without implementing a PCI device driver, which causes trouble with the
latest IRQ-related changes.

Commit 991de2e59090 ("PCI, x86: Implement pcibios_alloc_irq() and
pcibios_free_irq()") changes the way to allocate PCI legacy IRQ
for PCI devices on x86 platforms. Instead of allocating PCI legacy
IRQs when pcibios_enable_device() gets called, now pcibios_alloc_irq()
will be called by pci_device_probe() to allocate PCI legacy IRQs
when binding PCI drivers to PCI devices.

But the eata driver directly accesses PCI devices without implementing
a corresponding PCI driver, so pcibios_alloc_irq() won't be called for
those PCI devices and a wrong IRQ number may be used to manage the PCI
device.

This patch implements a PCI device driver to manage eata PCI devices,
so the eata driver can properly cooperate with the PCI core. It also
lays the groundwork for PCI hotplug support in the eata driver.

It also represents non-PCI eata devices as platform devices, so they can
be managed as normal devices.

Signed-off-by: Jiang Liu <jiang@linux.intel.com>
Cc: Hannes Reinecke <h...@suse.de>
Cc: Ballabio, Dario <dario.balla...@emc.com>
Cc: Christoph Hellwig <h...@infradead.org>
---
 drivers/scsi/eata.c |  624 ---
 1 file changed, 342 insertions(+), 282 deletions(-)

diff --git a/drivers/scsi/eata.c b/drivers/scsi/eata.c
index 227dd2c2ec2f..a27a7201866d 100644
--- a/drivers/scsi/eata.c
+++ b/drivers/scsi/eata.c
@@ -486,6 +486,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -503,8 +504,6 @@
 #include 
 #include 
 
-static int eata2x_detect(struct scsi_host_template *);
-static int eata2x_release(struct Scsi_Host *);
 static int eata2x_queuecommand(struct Scsi_Host *, struct scsi_cmnd *);
 static int eata2x_eh_abort(struct scsi_cmnd *);
 static int eata2x_eh_host_reset(struct scsi_cmnd *);
@@ -513,9 +512,9 @@ static int eata2x_bios_param(struct scsi_device *, struct 
block_device *,
 static int eata2x_slave_configure(struct scsi_device *);
 
 static struct scsi_host_template driver_template = {
+   .module = THIS_MODULE,
+   .proc_name = "eata2x",
.name = "EATA/DMA 2.0x rev. 8.10.00 ",
-   .detect = eata2x_detect,
-   .release = eata2x_release,
.queuecommand = eata2x_queuecommand,
.eh_abort_handler = eata2x_eh_abort,
.eh_host_reset_handler = eata2x_eh_host_reset,
@@ -818,7 +817,6 @@ struct hostdata {
unsigned int cp_stat[MAX_MAILBOXES];/* FREE, IN_USE, LOCKED, 
IN_RESET */
unsigned int last_cp_used;  /* Index of last mailbox used */
unsigned int iocount;   /* Total i/o done for this board */
-   int board_number;   /* Number of this board */
char board_name[16];/* Name of this board */
int in_reset;   /* True if board is doing a reset */
int target_to[MAX_TARGET][MAX_CHANNEL]; /* N. of timeout errors on 
target */
@@ -834,12 +832,9 @@ struct hostdata {
struct mssp sp; /* Local copy of sp buffer */
 };
 
-static struct Scsi_Host *sh[MAX_BOARDS];
 static const char *driver_name = "EATA";
-static char sha[MAX_BOARDS];
-
-/* Initialize num_boards so that ihdlr can work while detect is in progress */
-static unsigned int num_boards = MAX_BOARDS;
+static struct platform_device *eata2x_platform_devs[MAX_BOARDS];
+static bool eata2x_platform_driver_registered;
 
 static unsigned long io_port[] = {
 
@@ -850,10 +845,6 @@ static unsigned long io_port[] = {
/* First ISA */
0x1f0,
 
-   /* Space for MAX_PCI ports possibly reported by PCI_BIOS */
-   SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP,
-   SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP, SKIP,
-
/* MAX_EISA ports */
0x1c88, 0x2c88, 0x3c88, 0x4c88, 0x5c88, 0x6c88, 0x7c88, 0x8c88,
0x9c88, 0xac88, 0xbc88, 0xcc88, 0xdc88, 0xec88, 0xfc88,
@@ -871,6 +862,18 @@ static unsigned long io_port[] = {
 #define H2DEV16(x) cpu_to_be16(x)
 #define DEV2H16(x) be16_to_cpu(x)
 
+#define dev_warn_on(dev, cond, fmt, ...)   \
+do {   \
+   if (cond)   \
+   dev_warn(dev, fmt, ##__VA_ARGS__);  \
+} while(0)
+
+#define dev_info_on(dev, cond, fmt, ...)   \
+do {   \
+   if (cond)   \
+   dev_info(dev, fmt, ##__VA_ARGS__);  \
+} while(0)
+
 /* But transfer orientation from the 16 bit data register is Little Endian */
 #define REG2H(x)   le16_to_cpu(x)
 
@@ -1024,90 +1027,43 @@ static int read_pio(unsigned long iobase, ushort * 
start, ushort * end)
return 0;

 }
 
-static struct pci_dev *get_pci_dev(unsigned long port_base)
-{
-#if defined(CONFIG_PCI)
-   unsigned int 

Re: [Bugfix v2 1/5] x86/irq: Do not reuse struct apic_chip_data.old_domain as temporary buffer

2015-12-28 Thread Jiang Liu
On 2015/12/24 13:15, Jeremiah Mahler wrote:
> Jiang,
> 
> On Wed, Dec 23, 2015 at 10:13:26PM +0800, Jiang Liu wrote:
>> Function __assign_irq_vector() makes use of apic_chip_data.old_domain
>> as a temporary buffer, which causes trouble to rollback logic in case of
>> failure. So use a dedicated temporary buffer for __assign_irq_vector().
>>
>> Signed-off-by: Jiang Liu 
>> ---
>>  arch/x86/kernel/apic/vector.c |9 +
>>  1 file changed, 5 insertions(+), 4 deletions(-)
> [...]
> 
> I tried this patch and the rest in the series but unfortunately
> the bug is still present.
> 
>   [   10.184649] wlan0: authenticated
>   [   10.187883] wlan0: associate with 02:1a:11:fb:90:1c (try 1/3)
>   [   10.191574] do_IRQ: 0.35 No irq handler for vector
>   [   10.191589] do_IRQ: 0.35 No irq handler for vector
>   [   10.198159] do_IRQ: 0.35 No irq handler for vector
>   [   10.198165] do_IRQ: 0.35 No irq handler for vector
>   [   10.200534] wlan0: RX AssocResp from 02:1a:11:fb:90:1c (capab=0x431
>   status=0 aid=1)
>   [   10.204611] wlan0: associated
>   [   10.238883] do_IRQ: 0.35 No irq handler for vector
>   [   10.238892] do_IRQ: 0.35 No irq handler for vector
>   [   10.280716] do_IRQ: 0.35 No irq handler for vector
>   [   10.281083] do_IRQ: 0.35 No irq handler for vector
>   [   10.286484] do_IRQ: 0.35 No irq handler for vector
>   ...
> 
Hi Jeremiah,
Could you please help to confirm which commit caused the
regression?
1) x86/irq: Do not reuse struct apic_chip_data.old_domain as temporary
buffer
2) x86/irq: Fix a race condition between vector assigning and cleanup

Thanks,
Gerry
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[Bugfix v2 2/5] x86/irq: Enhance __assign_irq_vector() to rollback in case of failure

2015-12-23 Thread Jiang Liu
Enhance __assign_irq_vector() to roll back in case of failure so the
caller doesn't need to roll back explicitly.
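
The pattern described here can be illustrated with a small user-space
model. It is only a sketch: CPU masks are plain unsigned int bit sets, and
every model_* name is hypothetical rather than taken from the patch. The
function computes the new state in local variables and writes it back only
once nothing can fail any more, so the caller has nothing to undo on error.

```c
#include <assert.h>

/* Hypothetical model of the per-IRQ state; not the kernel structure. */
struct model_irq {
	unsigned int domain;	/* CPUs the interrupt currently targets */
	unsigned int dest_id;	/* cached destination derived from domain */
};

/* Stand-in for a destination lookup that can fail: empty mask is an error. */
int model_mask_to_dest(unsigned int mask, unsigned int *dest)
{
	if (!mask)
		return -1;
	*dest = mask & (~mask + 1u);	/* lowest set bit */
	return 0;
}

/*
 * Compute everything into locals first and commit only on success.
 * On failure *irq is untouched, so no caller-side rollback is needed.
 */
int model_assign_affinity(struct model_irq *irq, unsigned int requested,
			  unsigned int online)
{
	unsigned int new_domain = requested & online;
	unsigned int dest;
	int err;

	assert(irq);
	err = model_mask_to_dest(new_domain, &dest);
	if (err)
		return err;	/* nothing modified yet */

	irq->domain = new_domain;
	irq->dest_id = dest;
	return 0;
}
```

A caller in this model can simply propagate the return value, which is the
shape the last hunk of the patch gives apic_set_affinity().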

Signed-off-by: Jiang Liu 
---
 arch/x86/kernel/apic/vector.c |   26 --
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index d6ec36b4461e..b32c6ef7b4b0 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -117,6 +117,7 @@ static int __assign_irq_vector(int irq, struct 
apic_chip_data *d,
static int current_vector = FIRST_EXTERNAL_VECTOR + VECTOR_OFFSET_START;
static int current_offset = VECTOR_OFFSET_START % 16;
int cpu, err;
+   unsigned int dest;
 
if (d->move_in_progress)
return -EBUSY;
@@ -132,19 +133,21 @@ static int __assign_irq_vector(int irq, struct 
apic_chip_data *d,
apic->vector_allocation_domain(cpu, vector_cpumask, mask);
 
if (cpumask_subset(vector_cpumask, d->domain)) {
-   err = 0;
-   if (cpumask_equal(vector_cpumask, d->domain))
-   break;
/*
 * New cpumask using the vector is a proper subset of
 * the current in use mask. So cleanup the vector
 * allocation for the members that are not used anymore.
 */
+   cpumask_and(used_cpumask, d->domain, vector_cpumask);
+   err = apic->cpu_mask_to_apicid_and(mask, used_cpumask,
+  &dest);
+   if (err || cpumask_equal(vector_cpumask, d->domain))
+   break;
cpumask_andnot(d->old_domain, d->domain,
   vector_cpumask);
d->move_in_progress =
   cpumask_intersects(d->old_domain, cpu_online_mask);
-   cpumask_and(d->domain, d->domain, vector_cpumask);
+   cpumask_copy(d->domain, used_cpumask);
break;
}
 
@@ -167,11 +170,13 @@ next:
 
if (test_bit(vector, used_vectors))
goto next;
-
for_each_cpu_and(new_cpu, vector_cpumask, cpu_online_mask) {
if (!IS_ERR_OR_NULL(per_cpu(vector_irq, 
new_cpu)[vector]))
goto next;
}
+   if (apic->cpu_mask_to_apicid_and(mask, vector_cpumask, &dest))
+   goto next;
+
/* Found one! */
current_vector = vector;
current_offset = offset;
@@ -190,8 +195,7 @@ next:
 
if (!err) {
/* cache destination APIC IDs into cfg->dest_apicid */
-   err = apic->cpu_mask_to_apicid_and(mask, d->domain,
-  &d->cfg.dest_apicid);
+   d->cfg.dest_apicid = dest;
}
 
return err;
@@ -493,14 +497,8 @@ static int apic_set_affinity(struct irq_data *irq_data,
return -EINVAL;
 
err = assign_irq_vector(irq, data, dest);
-   if (err) {
-   if (assign_irq_vector(irq, data,
- irq_data_get_affinity_mask(irq_data)))
-   pr_err("Failed to recover vector for irq %d\n", irq);
-   return err;
-   }
 
-   return IRQ_SET_MASK_OK;
+   return err ? err : IRQ_SET_MASK_OK;
 }
 
 static struct irq_chip lapic_controller = {
-- 
1.7.10.4



[Bugfix v2 4/5] x86/irq: Fix a race condition between vector assigning and cleanup

2015-12-23 Thread Jiang Liu
Joe Lawrence reported a use-after-release
issue related to the x86 IRQ management code. Please refer to the following
link for more information:
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1026840.html

Thomas pointed out that it's caused by a race condition between
__assign_irq_vector() and __send_cleanup_vector(). Based on Thomas'
draft patch, we solve this race condition by:
1) Use move_in_progress to signal that an IRQ cleanup IPI is needed
2) Use old_domain to save old CPU mask for IRQ cleanup
3) Use vector to protect move_in_progress and old_domain

This bugfix also gets rid of the atomic allocation in
__send_cleanup_vector().
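
Points 1) and 2) of this scheme can be modeled in user space. The sketch
assumes CPU masks are unsigned int bit sets; struct model_move_data,
model_move() and model_move_cleanup() are hypothetical names, not kernel
functions. The busy check looks at old_domain directly, and
move_in_progress is derived from old_domain once the move has succeeded.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the fields the patch touches. */
struct model_move_data {
	unsigned int domain;		/* CPUs currently targeted */
	unsigned int old_domain;	/* CPUs that still need cleanup */
	bool move_in_progress;		/* true if a cleanup IPI is needed */
};

/* Move the interrupt to new_domain; 'online' is the online CPU mask. */
int model_move(struct model_move_data *d, unsigned int new_domain,
	       unsigned int online)
{
	assert(d);
	/* A previous move still has online CPUs awaiting cleanup. */
	if (d->old_domain & online)
		return -EBUSY;

	if (new_domain != d->domain) {
		d->old_domain = d->domain;	/* save the old mask */
		d->domain = new_domain;
	}

	/* Offline CPUs need no cleanup; derive the flag from what is left. */
	d->old_domain &= online;
	d->move_in_progress = d->old_domain != 0;
	return 0;
}

/* Called once the given CPU has finished its cleanup work. */
void model_move_cleanup(struct model_move_data *d, unsigned int cpu)
{
	d->old_domain &= ~(1u << cpu);
}
```

Because the busy check and the flag both depend on old_domain, a second
move is refused until every online CPU in the old mask has been cleaned up.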

Signed-off-by: Jiang Liu 
---
 arch/x86/kernel/apic/vector.c |   76 ++---
 1 file changed, 34 insertions(+), 42 deletions(-)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index f648fce39d5e..ab54b296a7d0 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -119,7 +119,7 @@ static int __assign_irq_vector(int irq, struct 
apic_chip_data *d,
int cpu, err;
unsigned int dest;
 
-   if (d->move_in_progress)
+   if (cpumask_intersects(d->old_domain, cpu_online_mask))
return -EBUSY;
 
/* Only try and allocate irqs on cpus that are present */
@@ -141,13 +141,14 @@ static int __assign_irq_vector(int irq, struct 
apic_chip_data *d,
cpumask_and(used_cpumask, d->domain, vector_cpumask);
err = apic->cpu_mask_to_apicid_and(mask, used_cpumask,
   &dest);
-   if (err || cpumask_equal(vector_cpumask, d->domain))
+   if (err)
break;
-   cpumask_andnot(d->old_domain, d->domain,
-  vector_cpumask);
-   d->move_in_progress =
-  cpumask_intersects(d->old_domain, cpu_online_mask);
-   cpumask_copy(d->domain, used_cpumask);
+   d->cfg.dest_apicid = dest;
+   if (!cpumask_equal(vector_cpumask, d->domain)) {
+   cpumask_andnot(d->old_domain, d->domain,
+  vector_cpumask);
+   cpumask_copy(d->domain, used_cpumask);
+   }
break;
}
 
@@ -180,22 +181,20 @@ next:
/* Found one! */
current_vector = vector;
current_offset = offset;
-   if (d->cfg.vector) {
+   if (d->cfg.vector)
cpumask_copy(d->old_domain, d->domain);
-   d->move_in_progress =
-  cpumask_intersects(d->old_domain, cpu_online_mask);
-   }
+   d->cfg.vector = vector;
+   d->cfg.dest_apicid = dest;
for_each_cpu_and(new_cpu, vector_cpumask, cpu_online_mask)
per_cpu(vector_irq, new_cpu)[vector] = irq_to_desc(irq);
-   d->cfg.vector = vector;
cpumask_copy(d->domain, vector_cpumask);
err = 0;
break;
}
 
if (!err) {
-   /* cache destination APIC IDs into cfg->dest_apicid */
-   d->cfg.dest_apicid = dest;
+   cpumask_and(d->old_domain, d->old_domain, cpu_online_mask);
+   d->move_in_progress = !cpumask_empty(d->old_domain);
}
 
return err;
@@ -227,7 +226,7 @@ static int assign_irq_vector_policy(int irq, int node,
 
 static void clear_irq_vector(int irq, struct apic_chip_data *data)
 {
-   struct irq_desc *desc;
+   struct irq_desc *desc = irq_to_desc(irq);
int cpu, vector = data->cfg.vector;
 
BUG_ON(!vector);
@@ -236,10 +235,6 @@ static void clear_irq_vector(int irq, struct 
apic_chip_data *data)
data->cfg.vector = 0;
cpumask_clear(data->domain);
 
-   if (likely(!data->move_in_progress))
-   return;
-
-   desc = irq_to_desc(irq);
for_each_cpu_and(cpu, data->old_domain, cpu_online_mask) {
for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS;
 vector++) {
@@ -421,10 +416,13 @@ static void __setup_vector_irq(int cpu)
struct irq_data *idata = irq_desc_get_irq_data(desc);
 
data = apic_chip_data(idata);
-   if (!data || !cpumask_test_cpu(cpu, data->domain))
-   continue;
-   vector = data->cfg.vector;
-   per_cpu(vector_irq, cpu)[vector] = desc;
+   if (data) {
+   cpumask_clear_cpu(cpu, data->old_domain);
+   

[Bugfix v2 3/5] x86/irq: Fix a race window in x86_vector_free_irqs()

2015-12-23 Thread Jiang Liu
There's a race condition between x86_vector_free_irqs()
{
free_apic_chip_data(irq_data->chip_data);
x   //irq_data->chip_data has been freed, but the pointer
//hasn't been reset yet
irq_domain_reset_irq_data(irq_data);
}
and smp_irq_move_cleanup_interrupt()
{
raw_spin_lock(&vector_lock);
data = apic_chip_data(irq_desc_get_irq_data(desc));
access data->   // may access freed memory
raw_spin_unlock(&desc->lock);
}
, which may cause smp_irq_move_cleanup_interrupt() to access freed memory.
So use vector_lock to guard all memory free code in x86_vector_free_irqs().
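
A user-space analogy of this fix, assuming a pthread mutex in place of
vector_lock and a single global pointer in place of irq_data->chip_data.
All model_* names are hypothetical and not part of the patch. The point of
the sketch is that freeing the object and clearing the pointer happen in
one critical section, under the same lock the reader takes.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct apic_chip_data. */
struct model_chip_data {
	int vector;
};

static pthread_mutex_t model_vector_lock = PTHREAD_MUTEX_INITIALIZER;
static struct model_chip_data *model_data;	/* models irq_data->chip_data */

int model_alloc_irq(int vector)
{
	struct model_chip_data *data = malloc(sizeof(*data));

	if (!data)
		return -1;
	data->vector = vector;

	pthread_mutex_lock(&model_vector_lock);
	assert(!model_data);	/* this model tracks a single allocation */
	model_data = data;
	pthread_mutex_unlock(&model_vector_lock);
	return 0;
}

/* Free the data and reset the pointer without dropping the lock between. */
void model_free_irq(void)
{
	pthread_mutex_lock(&model_vector_lock);
	free(model_data);
	model_data = NULL;
	pthread_mutex_unlock(&model_vector_lock);
}

/* Reader side: returns the vector, or -1 if the data is already gone. */
int model_cleanup_read(void)
{
	int vector = -1;

	pthread_mutex_lock(&model_vector_lock);
	if (model_data)
		vector = model_data->vector;
	pthread_mutex_unlock(&model_vector_lock);
	return vector;
}
```

With the free and the reset in the same critical section, a reader holding
the lock sees either a valid object or NULL, never a dangling pointer.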

Signed-off-by: Jiang Liu 
---
 arch/x86/kernel/apic/vector.c |   20 
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index b32c6ef7b4b0..f648fce39d5e 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -228,23 +228,16 @@ static int assign_irq_vector_policy(int irq, int node,
 static void clear_irq_vector(int irq, struct apic_chip_data *data)
 {
struct irq_desc *desc;
-   unsigned long flags;
-   int cpu, vector;
-
-   raw_spin_lock_irqsave(&vector_lock, flags);
-   BUG_ON(!data->cfg.vector);
+   int cpu, vector = data->cfg.vector;
 
-   vector = data->cfg.vector;
+   BUG_ON(!vector);
for_each_cpu_and(cpu, data->domain, cpu_online_mask)
per_cpu(vector_irq, cpu)[vector] = VECTOR_UNUSED;
-
data->cfg.vector = 0;
cpumask_clear(data->domain);
 
-   if (likely(!data->move_in_progress)) {
-   raw_spin_unlock_irqrestore(&vector_lock, flags);
+   if (likely(!data->move_in_progress))
return;
-   }
 
desc = irq_to_desc(irq);
for_each_cpu_and(cpu, data->old_domain, cpu_online_mask) {
@@ -257,7 +250,7 @@ static void clear_irq_vector(int irq, struct apic_chip_data 
*data)
}
}
data->move_in_progress = 0;
-   raw_spin_unlock_irqrestore(&vector_lock, flags);
+   cpumask_clear(data->old_domain);
 }
 
 void init_irq_alloc_info(struct irq_alloc_info *info,
@@ -279,18 +272,21 @@ static void x86_vector_free_irqs(struct irq_domain 
*domain,
 unsigned int virq, unsigned int nr_irqs)
 {
struct irq_data *irq_data;
+   unsigned long flags;
int i;
 
for (i = 0; i < nr_irqs; i++) {
irq_data = irq_domain_get_irq_data(x86_vector_domain, virq + i);
if (irq_data && irq_data->chip_data) {
+   raw_spin_lock_irqsave(&vector_lock, flags);
clear_irq_vector(virq + i, irq_data->chip_data);
free_apic_chip_data(irq_data->chip_data);
+   irq_domain_reset_irq_data(irq_data);
+   raw_spin_unlock_irqrestore(&vector_lock, flags);
 #ifdef CONFIG_X86_IO_APIC
if (virq + i < nr_legacy_irqs())
legacy_irq_data[virq + i] = NULL;
 #endif
-   irq_domain_reset_irq_data(irq_data);
}
}
 }
-- 
1.7.10.4



[Bugfix v2 1/5] x86/irq: Do not reuse struct apic_chip_data.old_domain as temporary buffer

2015-12-23 Thread Jiang Liu
__assign_irq_vector() uses apic_chip_data.old_domain as a temporary
buffer, which complicates the rollback logic in case of failure. So use
a dedicated temporary buffer for __assign_irq_vector().
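
A small user-space model of the change, assuming CPU masks fit in an
unsigned int and that a caller-supplied 'usable' mask says which CPUs have
a free vector. The model_search_* names are hypothetical and not from the
patch. CPUs that were already tried are collected in a local scratch mask,
so a failed search leaves old_domain exactly as it was.

```c
#include <assert.h>

/* Hypothetical model of the per-IRQ state. */
struct model_search_data {
	unsigned int domain;		/* CPUs currently targeted */
	unsigned int old_domain;	/* must survive a failed search */
};

/* Lowest CPU number set in mask, or -1 if the mask is empty. */
int model_search_first_cpu(unsigned int mask)
{
	int cpu;

	for (cpu = 0; cpu < 32; cpu++)
		if (mask & (1u << cpu))
			return cpu;
	return -1;
}

/*
 * Walk the CPUs in 'mask' until one in 'usable' is found.
 * Returns 0 and updates d->domain on success, -1 if none is usable.
 */
int model_search_assign(struct model_search_data *d, unsigned int mask,
			unsigned int usable)
{
	unsigned int tried = 0;		/* dedicated temporary buffer */
	int cpu;

	assert(d);
	cpu = model_search_first_cpu(mask);
	while (cpu >= 0) {
		unsigned int bit = 1u << cpu;

		if (usable & bit) {
			d->domain = bit;
			return 0;
		}
		tried |= bit;		/* recorded locally, not in *d */
		cpu = model_search_first_cpu(mask & ~tried);
	}
	return -1;
}
```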

Signed-off-by: Jiang Liu 
---
 arch/x86/kernel/apic/vector.c |9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 861bc59c8f25..d6ec36b4461e 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -30,7 +30,7 @@ struct apic_chip_data {
 
 struct irq_domain *x86_vector_domain;
 static DEFINE_RAW_SPINLOCK(vector_lock);
-static cpumask_var_t vector_cpumask;
+static cpumask_var_t vector_cpumask, used_cpumask;
 static struct irq_chip lapic_controller;
 #ifdef CONFIG_X86_IO_APIC
 static struct apic_chip_data *legacy_irq_data[NR_IRQS_LEGACY];
@@ -124,6 +124,7 @@ static int __assign_irq_vector(int irq, struct 
apic_chip_data *d,
/* Only try and allocate irqs on cpus that are present */
err = -ENOSPC;
cpumask_clear(d->old_domain);
+   cpumask_clear(used_cpumask);
cpu = cpumask_first_and(mask, cpu_online_mask);
while (cpu < nr_cpu_ids) {
int new_cpu, vector, offset;
@@ -157,9 +158,8 @@ next:
}
 
if (unlikely(current_vector == vector)) {
-   cpumask_or(d->old_domain, d->old_domain,
-  vector_cpumask);
-   cpumask_andnot(vector_cpumask, mask, d->old_domain);
+   cpumask_or(used_cpumask, used_cpumask, vector_cpumask);
+   cpumask_andnot(vector_cpumask, mask, used_cpumask);
cpu = cpumask_first_and(vector_cpumask,
cpu_online_mask);
continue;
@@ -404,6 +404,7 @@ int __init arch_early_irq_init(void)
arch_init_htirq_domain(x86_vector_domain);
 
BUG_ON(!alloc_cpumask_var(&vector_cpumask, GFP_KERNEL));
+   BUG_ON(!alloc_cpumask_var(&used_cpumask, GFP_KERNEL));
 
return arch_early_ioapic_init();
 }
-- 
1.7.10.4



[Bugfix v2 5/5] x86/irq: Trivial cleanups for x86 vector allocation code

2015-12-23 Thread Jiang Liu
Trivial cleanups for x86 vector allocation code:
1) reorganize apic_chip_data to optimize for size and cache efficiency
2) avoid redundant calling of irq_to_desc()
3) refine code comments

Signed-off-by: Jiang Liu 
---
 arch/x86/kernel/apic/vector.c |   54 ++---
 1 file changed, 23 insertions(+), 31 deletions(-)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index ab54b296a7d0..008114d0d2bd 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -23,9 +23,9 @@
 
 struct apic_chip_data {
struct irq_cfg  cfg;
+   u8  move_in_progress : 1;
cpumask_var_t   domain;
cpumask_var_t   old_domain;
-   u8  move_in_progress : 1;
 };
 
 struct irq_domain *x86_vector_domain;
@@ -38,7 +38,7 @@ static struct apic_chip_data *legacy_irq_data[NR_IRQS_LEGACY];
 
 void lock_vector_lock(void)
 {
-   /* Used to the online set of cpus does not change
+   /* Used to ensure that the online set of cpus does not change
 * during assign_irq_vector.
 */
raw_spin_lock(&vector_lock);
@@ -100,8 +100,7 @@ static void free_apic_chip_data(struct apic_chip_data *data)
}
 }
 
-static int __assign_irq_vector(int irq, struct apic_chip_data *d,
-  const struct cpumask *mask)
+static int assign_irq_vector(struct irq_data *data, const struct cpumask *mask)
 {
/*
 * NOTE! The local APIC isn't very good at handling
@@ -116,11 +115,15 @@ static int __assign_irq_vector(int irq, struct 
apic_chip_data *d,
 */
static int current_vector = FIRST_EXTERNAL_VECTOR + VECTOR_OFFSET_START;
static int current_offset = VECTOR_OFFSET_START % 16;
-   int cpu, err;
+   int cpu, err = -EBUSY;
+   struct irq_desc *desc = irq_data_to_desc(data);
+   struct apic_chip_data *d = data->chip_data;
unsigned int dest;
+   unsigned long flags;
 
+   raw_spin_lock_irqsave(&vector_lock, flags);
if (cpumask_intersects(d->old_domain, cpu_online_mask))
-   return -EBUSY;
+   goto out;
 
/* Only try and allocate irqs on cpus that are present */
err = -ENOSPC;
@@ -186,7 +189,7 @@ next:
d->cfg.vector = vector;
d->cfg.dest_apicid = dest;
for_each_cpu_and(new_cpu, vector_cpumask, cpu_online_mask)
-   per_cpu(vector_irq, new_cpu)[vector] = irq_to_desc(irq);
+   per_cpu(vector_irq, new_cpu)[vector] = desc;
cpumask_copy(d->domain, vector_cpumask);
err = 0;
break;
@@ -196,37 +199,27 @@ next:
cpumask_and(d->old_domain, d->old_domain, cpu_online_mask);
d->move_in_progress = !cpumask_empty(d->old_domain);
}
-
-   return err;
-}
-
-static int assign_irq_vector(int irq, struct apic_chip_data *data,
-const struct cpumask *mask)
-{
-   int err;
-   unsigned long flags;
-
-   raw_spin_lock_irqsave(&vector_lock, flags);
-   err = __assign_irq_vector(irq, data, mask);
+out:
raw_spin_unlock_irqrestore(_lock, flags);
+
return err;
 }
 
-static int assign_irq_vector_policy(int irq, int node,
-   struct apic_chip_data *data,
+static int assign_irq_vector_policy(struct irq_data *data, int node,
struct irq_alloc_info *info)
 {
if (info && info->mask)
-   return assign_irq_vector(irq, data, info->mask);
+   return assign_irq_vector(data, info->mask);
if (node != NUMA_NO_NODE &&
-   assign_irq_vector(irq, data, cpumask_of_node(node)) == 0)
+   assign_irq_vector(data, cpumask_of_node(node)) == 0)
return 0;
-   return assign_irq_vector(irq, data, apic->target_cpus());
+   return assign_irq_vector(data, apic->target_cpus());
 }
 
-static void clear_irq_vector(int irq, struct apic_chip_data *data)
+static void clear_irq_vector(struct irq_data *irq_data)
 {
-   struct irq_desc *desc = irq_to_desc(irq);
+   struct irq_desc *desc = irq_data_to_desc(irq_data);
+   struct apic_chip_data *data = irq_data->chip_data;
int cpu, vector = data->cfg.vector;
 
BUG_ON(!vector);
@@ -274,7 +267,7 @@ static void x86_vector_free_irqs(struct irq_domain *domain,
irq_data = irq_domain_get_irq_data(x86_vector_domain, virq + i);
if (irq_data && irq_data->chip_data) {
raw_spin_lock_irqsave(&vector_lock, flags);
-   clear_irq_vector(virq + i, irq_data->chip_data);
+   clear_irq_vector(irq_data);
free_apic_chip_data(irq_data->chip_data);
irq_domain_reset_irq_data(irq_d

[Bugfix v2 3/5] x86/irq: Fix a race window in x86_vector_free_irqs()

2015-12-23 Thread Jiang Liu
There's a race condition between x86_vector_free_irqs()
{
    free_apic_chip_data(irq_data->chip_data);
    ...    // irq_data->chip_data has been freed, but the pointer
           // hasn't been reset yet
    irq_domain_reset_irq_data(irq_data);
}
and smp_irq_move_cleanup_interrupt()
{
    raw_spin_lock(&vector_lock);
    data = apic_chip_data(irq_desc_get_irq_data(desc));
    access data->...;    // may access freed memory
    raw_spin_unlock(&desc->lock);
}
which may cause smp_irq_move_cleanup_interrupt() to access freed memory.
So use vector_lock to guard all of the memory-freeing code in
x86_vector_free_irqs().

Signed-off-by: Jiang Liu <jiang@linux.intel.com>
---
 arch/x86/kernel/apic/vector.c |   20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index b32c6ef7b4b0..f648fce39d5e 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -228,23 +228,16 @@ static int assign_irq_vector_policy(int irq, int node,
 static void clear_irq_vector(int irq, struct apic_chip_data *data)
 {
struct irq_desc *desc;
-   unsigned long flags;
-   int cpu, vector;
-
-   raw_spin_lock_irqsave(&vector_lock, flags);
-   BUG_ON(!data->cfg.vector);
+   int cpu, vector = data->cfg.vector;
 
-   vector = data->cfg.vector;
+   BUG_ON(!vector);
for_each_cpu_and(cpu, data->domain, cpu_online_mask)
per_cpu(vector_irq, cpu)[vector] = VECTOR_UNUSED;
-
data->cfg.vector = 0;
cpumask_clear(data->domain);
 
-   if (likely(!data->move_in_progress)) {
-   raw_spin_unlock_irqrestore(&vector_lock, flags);
+   if (likely(!data->move_in_progress))
return;
-   }
 
desc = irq_to_desc(irq);
for_each_cpu_and(cpu, data->old_domain, cpu_online_mask) {
@@ -257,7 +250,7 @@ static void clear_irq_vector(int irq, struct apic_chip_data *data)
}
}
data->move_in_progress = 0;
-   raw_spin_unlock_irqrestore(&vector_lock, flags);
+   cpumask_clear(data->old_domain);
 }
 
 void init_irq_alloc_info(struct irq_alloc_info *info,
@@ -279,18 +272,21 @@ static void x86_vector_free_irqs(struct irq_domain *domain,
 unsigned int virq, unsigned int nr_irqs)
 {
struct irq_data *irq_data;
+   unsigned long flags;
int i;
 
for (i = 0; i < nr_irqs; i++) {
irq_data = irq_domain_get_irq_data(x86_vector_domain, virq + i);
if (irq_data && irq_data->chip_data) {
+   raw_spin_lock_irqsave(&vector_lock, flags);
clear_irq_vector(virq + i, irq_data->chip_data);
free_apic_chip_data(irq_data->chip_data);
+   irq_domain_reset_irq_data(irq_data);
+   raw_spin_unlock_irqrestore(&vector_lock, flags);
 #ifdef CONFIG_X86_IO_APIC
if (virq + i < nr_legacy_irqs())
legacy_irq_data[virq + i] = NULL;
 #endif
-   irq_domain_reset_irq_data(irq_data);
}
}
 }
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
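
Editorial sketch for the race described in patch 3/5 above. This is a minimal
user-space model, not the kernel code: it assumes a hypothetical struct slot
in place of struct irq_data and a small C11 spinlock in place of vector_lock,
and every name in it is illustrative. The point it tries to show is that the
free and the pointer reset happen inside one critical section, so a reader
taking the same lock sees either valid data or NULL, never a stale pointer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct apic_chip_data and struct irq_data. */
struct chip_data { int vector; };
struct slot { struct chip_data *chip_data; };

/* Tiny spinlock standing in for vector_lock (an assumption of this model). */
static atomic_flag lock_flag = ATOMIC_FLAG_INIT;

static void model_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&lock_flag, memory_order_acquire))
		;	/* spin until the holder releases the lock */
}

static void model_unlock(void)
{
	atomic_flag_clear_explicit(&lock_flag, memory_order_release);
}

/* Attach freshly allocated chip data; returns 0 on success, -1 on failure. */
static int slot_alloc(struct slot *s, int vector)
{
	struct chip_data *d = malloc(sizeof(*d));

	if (!d)
		return -1;
	d->vector = vector;
	model_lock();
	s->chip_data = d;
	model_unlock();
	return 0;
}

/* Free and reset in one critical section, so no reader sees a stale pointer. */
static void slot_free(struct slot *s)
{
	model_lock();
	free(s->chip_data);
	s->chip_data = NULL;
	model_unlock();
}

/* Reader side: returns the vector, or -1 if the slot has been torn down. */
static int slot_read_vector(struct slot *s)
{
	int vector = -1;

	model_lock();
	if (s->chip_data)
		vector = s->chip_data->vector;
	model_unlock();
	return vector;
}
```

In this model the broken variant would release the lock (or never take it)
between free() and the NULL store, which is the window the patch closes.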


[Bugfix v2 4/5] x86/irq: Fix a race condition between vector assigning and cleanup

2015-12-23 Thread Jiang Liu
Joe Lawrence <joe.lawre...@stratus.com> reported a use-after-release
issue related to the x86 IRQ management code. Please refer to the following
link for more information:
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1026840.html

Thomas pointed out that it's caused by a race condition between
__assign_irq_vector() and __send_cleanup_vector(). Based on Thomas'
draft patch, we solve this race condition by:
1) Use move_in_progress to signal that an IRQ cleanup IPI is needed
2) Use old_domain to save old CPU mask for IRQ cleanup
3) Use vector to protect move_in_progress and old_domain

This bugfix patch also helps to get rid of that atomic allocation in
__send_cleanup_vector().

Signed-off-by: Jiang Liu <jiang@linux.intel.com>
---
 arch/x86/kernel/apic/vector.c |   76 ++---
 1 file changed, 34 insertions(+), 42 deletions(-)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index f648fce39d5e..ab54b296a7d0 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -119,7 +119,7 @@ static int __assign_irq_vector(int irq, struct apic_chip_data *d,
int cpu, err;
unsigned int dest;
 
-   if (d->move_in_progress)
+   if (cpumask_intersects(d->old_domain, cpu_online_mask))
return -EBUSY;
 
/* Only try and allocate irqs on cpus that are present */
@@ -141,13 +141,14 @@ static int __assign_irq_vector(int irq, struct apic_chip_data *d,
cpumask_and(used_cpumask, d->domain, vector_cpumask);
err = apic->cpu_mask_to_apicid_and(mask, used_cpumask,
   &dest);
-   if (err || cpumask_equal(vector_cpumask, d->domain))
+   if (err)
break;
-   cpumask_andnot(d->old_domain, d->domain,
-  vector_cpumask);
-   d->move_in_progress =
-  cpumask_intersects(d->old_domain, cpu_online_mask);
-   cpumask_copy(d->domain, used_cpumask);
+   d->cfg.dest_apicid = dest;
+   if (!cpumask_equal(vector_cpumask, d->domain)) {
+   cpumask_andnot(d->old_domain, d->domain,
+  vector_cpumask);
+   cpumask_copy(d->domain, used_cpumask);
+   }
break;
}
 
@@ -180,22 +181,20 @@ next:
/* Found one! */
current_vector = vector;
current_offset = offset;
-   if (d->cfg.vector) {
+   if (d->cfg.vector)
cpumask_copy(d->old_domain, d->domain);
-   d->move_in_progress =
-  cpumask_intersects(d->old_domain, cpu_online_mask);
-   }
+   d->cfg.vector = vector;
+   d->cfg.dest_apicid = dest;
for_each_cpu_and(new_cpu, vector_cpumask, cpu_online_mask)
per_cpu(vector_irq, new_cpu)[vector] = irq_to_desc(irq);
-   d->cfg.vector = vector;
cpumask_copy(d->domain, vector_cpumask);
err = 0;
break;
}
 
if (!err) {
-   /* cache destination APIC IDs into cfg->dest_apicid */
-   d->cfg.dest_apicid = dest;
+   cpumask_and(d->old_domain, d->old_domain, cpu_online_mask);
+   d->move_in_progress = !cpumask_empty(d->old_domain);
}
 
return err;
@@ -227,7 +226,7 @@ static int assign_irq_vector_policy(int irq, int node,
 
 static void clear_irq_vector(int irq, struct apic_chip_data *data)
 {
-   struct irq_desc *desc;
+   struct irq_desc *desc = irq_to_desc(irq);
int cpu, vector = data->cfg.vector;
 
BUG_ON(!vector);
@@ -236,10 +235,6 @@ static void clear_irq_vector(int irq, struct apic_chip_data *data)
data->cfg.vector = 0;
cpumask_clear(data->domain);
 
-   if (likely(!data->move_in_progress))
-   return;
-
-   desc = irq_to_desc(irq);
for_each_cpu_and(cpu, data->old_domain, cpu_online_mask) {
for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS;
 vector++) {
@@ -421,10 +416,13 @@ static void __setup_vector_irq(int cpu)
struct irq_data *idata = irq_desc_get_irq_data(desc);
 
data = apic_chip_data(idata);
-   if (!data || !cpumask_test_cpu(cpu, data->domain))
-   continue;
-   vector = data->cfg.vector;
-   per_cpu(vector_irq, cpu)[vector] = desc;
+   if (data) {

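Editorial sketch for the bookkeeping described in patch 4/5 above. It is a
simplified user-space model under stated assumptions: CPU masks are plain
64-bit words (a hypothetical cpumask_t typedef), -1 stands in for -EBUSY, and
only the "new vector allocated" path is modelled. It illustrates how
move_in_progress can be derived once, at the end, from the old CPU mask
restricted to online CPUs, and how a pending cleanup blocks a second move.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t cpumask_t;	/* hypothetical: one bit per CPU, up to 64 CPUs */

struct chip_state {
	cpumask_t domain;	/* CPUs currently targeted by the vector */
	cpumask_t old_domain;	/* CPUs that still need a cleanup IPI */
	bool move_in_progress;
};

/* A move is still pending while any CPU in old_domain is online. */
static bool move_pending(const struct chip_state *s, cpumask_t online)
{
	return (s->old_domain & online) != 0;
}

/* Retarget to new_domain; returns 0 on success, -1 (think -EBUSY) if busy. */
static int retarget(struct chip_state *s, cpumask_t new_domain, cpumask_t online)
{
	if (move_pending(s, online))
		return -1;

	s->old_domain = s->domain;	/* remember where the old vector lives */
	s->domain = new_domain;

	/* Only online CPUs can receive the cleanup IPI. */
	s->old_domain &= online;
	s->move_in_progress = s->old_domain != 0;
	return 0;
}

/* Called once the cleanup has run on every CPU in old_domain. */
static void cleanup_done(struct chip_state *s)
{
	s->old_domain = 0;
	s->move_in_progress = false;
}
```

If every CPU in the old mask is offline, the model reports no move in
progress, matching the idea that no cleanup IPI is needed in that case.
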
[Bugfix v2 1/5] x86/irq: Do not reuse struct apic_chip_data.old_domain as temporary buffer

2015-12-23 Thread Jiang Liu
Function __assign_irq_vector() makes use of apic_chip_data.old_domain
as a temporary buffer, which causes trouble for the rollback logic in case
of failure. So use a dedicated temporary buffer for __assign_irq_vector().

Signed-off-by: Jiang Liu <jiang@linux.intel.com>
---
 arch/x86/kernel/apic/vector.c |9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 861bc59c8f25..d6ec36b4461e 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -30,7 +30,7 @@ struct apic_chip_data {
 
 struct irq_domain *x86_vector_domain;
 static DEFINE_RAW_SPINLOCK(vector_lock);
-static cpumask_var_t vector_cpumask;
+static cpumask_var_t vector_cpumask, used_cpumask;
 static struct irq_chip lapic_controller;
 #ifdef CONFIG_X86_IO_APIC
 static struct apic_chip_data *legacy_irq_data[NR_IRQS_LEGACY];
@@ -124,6 +124,7 @@ static int __assign_irq_vector(int irq, struct apic_chip_data *d,
/* Only try and allocate irqs on cpus that are present */
err = -ENOSPC;
cpumask_clear(d->old_domain);
+   cpumask_clear(used_cpumask);
cpu = cpumask_first_and(mask, cpu_online_mask);
while (cpu < nr_cpu_ids) {
int new_cpu, vector, offset;
@@ -157,9 +158,8 @@ next:
}
 
if (unlikely(current_vector == vector)) {
-   cpumask_or(d->old_domain, d->old_domain,
-  vector_cpumask);
-   cpumask_andnot(vector_cpumask, mask, d->old_domain);
+   cpumask_or(used_cpumask, used_cpumask, vector_cpumask);
+   cpumask_andnot(vector_cpumask, mask, used_cpumask);
cpu = cpumask_first_and(vector_cpumask,
cpu_online_mask);
continue;
@@ -404,6 +404,7 @@ int __init arch_early_irq_init(void)
arch_init_htirq_domain(x86_vector_domain);
 
BUG_ON(!alloc_cpumask_var(&vector_cpumask, GFP_KERNEL));
+   BUG_ON(!alloc_cpumask_var(&used_cpumask, GFP_KERNEL));
 
return arch_early_ioapic_init();
 }
-- 
1.7.10.4


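Editorial sketch for the idea in patch 1/5 above: keep the "already tried"
set in a dedicated scratch variable so that a failed search leaves persistent
state alone. This is a simplified model, not the kernel loop. The 64-bit
cpumask_t typedef, the busy mask and the find_cpu() helper are all
hypothetical, and -1 stands in for -ENOSPC.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t cpumask_t;	/* hypothetical: one bit per CPU */

/* Isolate the lowest set bit of m (0 if m is empty). */
static cpumask_t lowest_bit(cpumask_t m)
{
	return m & (~m + 1);
}

/*
 * Walk the CPUs in mask, skipping those marked busy. The set of CPUs
 * already tried lives in the local 'used' scratch mask, so nothing
 * outside this function is modified when the search fails.
 * Returns 0 and stores the chosen CPU bit in *found, or -1 if none fits.
 */
static int find_cpu(cpumask_t mask, cpumask_t busy, cpumask_t *found)
{
	cpumask_t used = 0;
	cpumask_t candidate = lowest_bit(mask);

	while (candidate) {
		if (!(candidate & busy)) {
			*found = candidate;
			return 0;
		}
		used |= candidate;			/* remember the failed try */
		candidate = lowest_bit(mask & ~used);	/* next untried CPU */
	}
	return -1;
}
```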

[Bugfix v2 2/5] x86/irq: Enhance __assign_irq_vector() to rollback in case of failure

2015-12-23 Thread Jiang Liu
Enhance __assign_irq_vector() to roll back in case of failure so the
caller doesn't need to roll back explicitly.

Signed-off-by: Jiang Liu <jiang@linux.intel.com>
---
 arch/x86/kernel/apic/vector.c |   26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index d6ec36b4461e..b32c6ef7b4b0 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -117,6 +117,7 @@ static int __assign_irq_vector(int irq, struct apic_chip_data *d,
static int current_vector = FIRST_EXTERNAL_VECTOR + VECTOR_OFFSET_START;
static int current_offset = VECTOR_OFFSET_START % 16;
int cpu, err;
+   unsigned int dest;
 
if (d->move_in_progress)
return -EBUSY;
@@ -132,19 +133,21 @@ static int __assign_irq_vector(int irq, struct apic_chip_data *d,
apic->vector_allocation_domain(cpu, vector_cpumask, mask);
 
if (cpumask_subset(vector_cpumask, d->domain)) {
-   err = 0;
-   if (cpumask_equal(vector_cpumask, d->domain))
-   break;
/*
 * New cpumask using the vector is a proper subset of
 * the current in use mask. So cleanup the vector
 * allocation for the members that are not used anymore.
 */
+   cpumask_and(used_cpumask, d->domain, vector_cpumask);
+   err = apic->cpu_mask_to_apicid_and(mask, used_cpumask,
+  &dest);
+   if (err || cpumask_equal(vector_cpumask, d->domain))
+   break;
cpumask_andnot(d->old_domain, d->domain,
   vector_cpumask);
d->move_in_progress =
   cpumask_intersects(d->old_domain, cpu_online_mask);
-   cpumask_and(d->domain, d->domain, vector_cpumask);
+   cpumask_copy(d->domain, used_cpumask);
break;
}
 
@@ -167,11 +170,13 @@ next:
 
if (test_bit(vector, used_vectors))
goto next;
-
for_each_cpu_and(new_cpu, vector_cpumask, cpu_online_mask) {
if (!IS_ERR_OR_NULL(per_cpu(vector_irq, new_cpu)[vector]))
goto next;
}
+   if (apic->cpu_mask_to_apicid_and(mask, vector_cpumask, &dest))
+   goto next;
+
/* Found one! */
current_vector = vector;
current_offset = offset;
@@ -190,8 +195,7 @@ next:
 
if (!err) {
/* cache destination APIC IDs into cfg->dest_apicid */
-   err = apic->cpu_mask_to_apicid_and(mask, d->domain,
-  &d->cfg.dest_apicid);
+   d->cfg.dest_apicid = dest;
}
 
return err;
@@ -493,14 +497,8 @@ static int apic_set_affinity(struct irq_data *irq_data,
return -EINVAL;
 
err = assign_irq_vector(irq, data, dest);
-   if (err) {
-   if (assign_irq_vector(irq, data,
- irq_data_get_affinity_mask(irq_data)))
-   pr_err("Failed to recover vector for irq %d\n", irq);
-   return err;
-   }
 
-   return IRQ_SET_MASK_OK;
+   return err ? err : IRQ_SET_MASK_OK;
 }
 
 static struct irq_chip lapic_controller = {
-- 
1.7.10.4


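Editorial sketch for the rollback approach in patch 2/5 above: compute
everything that can fail into locals first, and only write to the persistent
configuration once nothing can fail any more. This is a minimal model with
hypothetical names (struct cfg_model, mask_to_dest(), assign()); the
mask_to_dest() helper merely stands in for the fallible APIC ID lookup and is
not the kernel's implementation.

```c
#include <assert.h>

/* Hypothetical stand-in for the vector/destination pair in struct irq_cfg. */
struct cfg_model {
	int vector;
	unsigned int dest_apicid;
};

/* Fallible lookup: picks the lowest CPU in mask, fails on an empty mask. */
static int mask_to_dest(unsigned long mask, unsigned int *dest)
{
	unsigned int bit = 0;

	if (!mask)
		return -1;
	while (!(mask & 1UL)) {
		mask >>= 1;
		bit++;
	}
	*dest = bit;
	return 0;
}

/* Commit the new vector only after the lookup succeeded. */
static int assign(struct cfg_model *c, int vector, unsigned long mask)
{
	unsigned int dest;

	if (mask_to_dest(mask, &dest))
		return -1;	/* *c is untouched, so the caller has nothing to undo */

	c->vector = vector;
	c->dest_apicid = dest;
	return 0;
}
```

With this ordering the caller can simply propagate the error, which is the
simplification the patch makes in apic_set_affinity().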

Re: Problems with x86/x86_64 qemu tests in linux-next due to 'Enhance __assign_irq_vector() to rollback ...'

2015-12-18 Thread Jiang Liu
On 2015/12/18 7:59, Guenter Roeck wrote:
> Hi folks,
> 
> several of my qemu tests of linux-next started failing a couple of days
> ago.
> Logs are available at http://server.roeck-us.net:8010/builders, in the
> 'next' column.
> 
> Bisect points to 'x86/irq: Enhance __assign_irq_vector() to rollback in
> case of failure'.
> Bisect log is attached below. Reverting this patch as well as the
> subsequent patches
> in arch/x86/kernel/apic/vector.c fixes the problem.
Hi Guenter,
Thanks for reporting this issue. We already have a fix
for it, and will send the new patch out once it passes functional tests.
Thanks,
Gerry

> 
> Guenter
> 
> ---
> # bad: [57036847fee7aa782ea834af770426517f1efc96] Add linux-next
> specific files for 20151216
> # good: [9f9499ae8e6415cefc4fe0a96ad0e27864353c89] Linux 4.4-rc5
> git bisect start 'HEAD' 'v4.4-rc5'
> # good: [cd61a37515d48ce6603e681f18f0ae59025c733a] Merge remote-tracking
> branch 'crypto/master'
> git bisect good cd61a37515d48ce6603e681f18f0ae59025c733a
> # bad: [e12cc3355bf7fc7fdca05d12ae1aa53a2561af84] Merge remote-tracking
> branch 'irqchip/irqchip/for-next'
> git bisect bad e12cc3355bf7fc7fdca05d12ae1aa53a2561af84
> # good: [2e27153f14236cc20bf1cad16da202b5a8fb6189] Merge remote-tracking
> branch 'sound-asoc/for-next'
> git bisect good 2e27153f14236cc20bf1cad16da202b5a8fb6189
> # good: [95ecee8d876ddabcbe32fc48ec8c7272d1738645] Merge remote-tracking
> branch 'mailbox/mailbox-for-next'
> git bisect good 95ecee8d876ddabcbe32fc48ec8c7272d1738645
> # good: [05a927fb715bd97ce3baed157b157c16a3216f55] Merge branch
> 'timers/core'
> git bisect good 05a927fb715bd97ce3baed157b157c16a3216f55
> # bad: [acae1b21abdc2b1a0a8809d4263b7b06c380c065] Merge remote-tracking
> branch 'tip/auto-latest'
> git bisect bad acae1b21abdc2b1a0a8809d4263b7b06c380c065
> # bad: [f45b7ee9ed0632acc1c7404a1e8f47a40146a07e] Merge branch 'x86/fpu'
> git bisect bad f45b7ee9ed0632acc1c7404a1e8f47a40146a07e
> # bad: [41c19d8bf8540c9016563b649b2034d1804ca0af] Merge branch 'x86/asm'
> git bisect bad 41c19d8bf8540c9016563b649b2034d1804ca0af
> # bad: [ba207f77e6e5642ad7c3dc7f2217c590cf1352cb] Merge branch
> 'x86/urgent' into x86/apic
> git bisect bad ba207f77e6e5642ad7c3dc7f2217c590cf1352cb
> # good: [c61a0d31ba0ce75cb1b88bb4eb2f41a1b80bc90f] x86/apic: Wire up
> single IPI for apic_numachip
> git bisect good c61a0d31ba0ce75cb1b88bb4eb2f41a1b80bc90f
> # good: [2fde46b79e2fdbc90d0d97cf992782732b5a371c] x86/smpboot:
> Re-enable init_udelay=0 by default on modern CPUs
> git bisect good 2fde46b79e2fdbc90d0d97cf992782732b5a371c
> # bad: [21a1b3bf35018b446c943c15f0a6225e6f6497ae] x86/irq: Fix a race
> window in x86_vector_free_irqs()
> git bisect bad 21a1b3bf35018b446c943c15f0a6225e6f6497ae
> # bad: [4c24cee6b2aeaee3dab896f76fef4fe79d9e4183] x86/irq: Enhance
> __assign_irq_vector() to rollback in case of failure
> git bisect bad 4c24cee6b2aeaee3dab896f76fef4fe79d9e4183
> # good: [6dd7cb991fcbfef55d8bf3d22b8a87f9d5007e20] x86/irq: Do not reuse
> struct apic_chip_data.old_domain as temporary buffer
> git bisect good 6dd7cb991fcbfef55d8bf3d22b8a87f9d5007e20
> # first bad commit: [4c24cee6b2aeaee3dab896f76fef4fe79d9e4183] x86/irq:
> Enhance __assign_irq_vector() to rollback in case of failure

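Editorial sketch of the search that the bisect log above walks through. It
is a simplification: it assumes a linear history indexed from a known-good
commit to a known-bad one, whereas git bisect works on the real commit graph.
The first_bad() helper, the is_bad_fn callback and the bad_from_7() example
predicate are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical test hook: returns nonzero if commit index i shows the bug. */
typedef int (*is_bad_fn)(size_t i);

/*
 * Precondition: commit 'good' passes, commit 'bad' fails, and good < bad.
 * Each step tests the midpoint and halves the suspect range, like one
 * "git bisect good" or "git bisect bad" answer.
 * Returns the index of the first bad commit.
 */
static size_t first_bad(size_t good, size_t bad, is_bad_fn is_bad)
{
	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		if (is_bad(mid))
			bad = mid;
		else
			good = mid;
	}
	return bad;
}

/* Example predicate: the regression was introduced by commit 7. */
static int bad_from_7(size_t i)
{
	return i >= 7;
}
```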

Re: [LKP] [lkp] [x86/irq] 4c24cee6b2: IP-Config: Auto-configuration of network failed

2015-12-14 Thread Jiang Liu
On 2015/12/14 17:54, Borislav Petkov wrote:
> On Mon, Dec 14, 2015 at 02:54:02PM +0800, Huang, Ying wrote:
>> No, there are no other systems reporting the same issue. I will queue
>> more tests for make sure this is not a false positive.
> 
> I can trigger this too with my guest here.
> 
> I have these two ontop of rc5:
> 
> cc22b9b83f6a x86/irq: Enhance __assign_irq_vector() to rollback in case of 
> failure
> 45dd79e03e1e x86/irq: Do not reuse struct apic_chip_data.old_domain as 
> temporary buffer
> 9f9499ae8e64 Linux 4.4-rc5
> 
> and my guest stalls while booting.
> 
> The new thing I see in dmesg is this:
> 
>  ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> +..MP-BIOS bug: 8254 timer not connected to IO-APIC
> +...trying to set up timer (IRQ0) through the 8259A ...
> +. (found apic 0 pin 2) ...
> +... failed.
> +...trying to set up timer as Virtual Wire IRQ...
> +. failed.
> +...trying to set up timer as ExtINT IRQ...
> +. works.
> +APIC calibration not consistent with PM-Timer: 111ms instead of 100ms
> +APIC delta adjusted to PM-Timer: 6248393 (6997337)
> 
> which leads to boot stalling and timeoutting when loading the hdd
> driver:
Hi Boris and Ying,
Aha, I found a possible regression. Could you please apply the
attached bugfix patch on top of "cc22b9b83f6a x86/irq:
Enhance __assign_irq_vector() to rollback in case of failure"?
Hi Ying, I have pushed this patch to GitHub, so it should reach the
0day test farm soon :)
Thanks,
Gerry

> 
> ...
> [3.973447] console [netcon0] enabled
> [3.976099] netconsole: network logging started
> [3.979604] rtc_cmos 00:00: setting system clock to 2015-12-14 10:45:35 
> UTC (1450089935)
> [3.985348] PM: Checking hibernation image partition /dev/sdb1
> [6.600706] usb 1-1: New USB device found, idVendor=0627, idProduct=0001
> [6.613651] usb 1-1: New USB device strings: Mfr=1, Product=3, 
> SerialNumber=5
> [6.636905] usb 1-1: Product: QEMU USB Tablet
> [6.642248] usb 1-1: Manufacturer: QEMU
> [6.647109] usb 1-1: SerialNumber: 42
> [7.580995] ata2.00: qc timeout (cmd 0xa0)
> [7.589300] ata2.00: TEST_UNIT_READY failed (err_mask=0x5)
> [7.750715] ata2.01: NODEV after polling detection
> [7.759605] ata2.00: configured for MWDMA2
> [8.585691] input: QEMU QEMU USB Tablet as 
> /devices/pci:00/:00:01.2/usb1/1-1/1-1:1.0/0003:0627:0001.0001/input/input1
> [8.602467] hid-generic 0003:0627:0001.0001: input,hidraw0: USB HID v0.01 
> Pointer [QEMU QEMU USB Tablet] on usb-:00:01.2-1/input0
> [   12.760846] ata2.00: qc timeout (cmd 0xa0)
> [   12.786543] ata2.00: TEST_UNIT_READY failed (err_mask=0x5)
> [   12.796576] ata2.00: limiting speed to MWDMA2:PIO3
> [   12.958455] ata2.01: NODEV after polling detection
> [   12.969693] ata2.00: configured for MWDMA2
> [   17.972782] ata2.00: qc timeout (cmd 0xa0)
> [   17.978967] ata2.00: TEST_UNIT_READY failed (err_mask=0x5)
> [   17.983495] ata2.00: disabled
> [   17.986352] ata2: soft resetting link
> [   18.146586] ata2.01: NODEV after polling detection
> [   18.151413] ata2: EH complete
> [   32.745227] ata1: lost interrupt (Status 0x50)
> [   32.748470] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 
> frozen
> [   32.756586] ata1.00: failed command: READ DMA
> [   32.761251] ata1.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 0 dma 
> 4096 in
> [   32.761251]  res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 
> (timeout)
> [   32.773928] ata1.00: status: { DRDY }
> [   32.777028] ata1: soft resetting link
> [   32.934437] ata1.01: NODEV after polling detection
> [   32.946663] ata1.00: configured for MWDMA2
> [   32.949964] ata1.00: device reported invalid CHS sector 0
> [   32.953793] ata1: EH complete
> [   63.849089] ata1: lost interrupt (Status 0x50)
> [   63.857470] ata1.00: limiting speed to MWDMA1:PIO4
> [   63.860982] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 
> frozen
> [   63.865862] ata1.00: failed command: READ DMA
> [   63.883697] ata1.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 0 dma 
> 4096 in
> [   63.883697]  res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 
> (timeout)
> [   63.899573] ata1.00: status: { DRDY }
> [   63.902649] ata1: soft resetting link
> [   64.062580] ata1.01: NODEV after polling detection
> [   64.073800] ata1.00: configured for MWDMA1
> [   64.076813] ata1.00: device reported invalid CHS sector 0
> [   64.096188] ata1: EH complete
> 
From c7c3cc3a048576fd1e196e67b11ae0193e7fba1e Mon Sep 17 00:00:00 2001
From: Jiang Liu <jiang@linux.intel.com>
Date: Tue, 15 Dec 2015 15:40:43 +0800
Subject: [PATCH]


Signed-off-by: Jiang Liu <jiang.

Re: [lkp] [x86/irq] 4c24cee6b2: IP-Config: Auto-configuration of network failed

2015-12-13 Thread Jiang Liu
Hi Ying,
Thanks for reporting this issue, but I couldn't figure
out what's wrong with this commit, and there are no error or
warning messages in the attached dmesg file. Are there other
systems reporting the same issue?
Thanks,
Gerry

On 2015/12/11 15:49, kernel test robot wrote:
> FYI, we noticed the below changes on
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/urgent
> commit 4c24cee6b2aeaee3dab896f76fef4fe79d9e4183 ("x86/irq: Enhance 
> __assign_irq_vector() to rollback in case of failure")
> 
> 
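> +------------------------------------------------+------------+------------+
> |                                                | 6dd7cb991f | 4c24cee6b2 |
> +------------------------------------------------+------------+------------+
> | boot_successes                                 | 6          | 0          |
> | boot_failures                                  | 0          | 8          |
> | IP-Config:Auto-configuration_of_network_failed | 0          | 6          |
> | BUG:kernel_boot_hang                           | 0          | 2          |
> +------------------------------------------------+------------+------------+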
> 
> It appears that the Ethernet card doesn't work properly after your patch.
> 
> [   15.342990] Waiting up to 110 more seconds for network.
> [   25.346987] Waiting up to 100 more seconds for network.
> [   35.350995] Waiting up to 90 more seconds for network.
> [   45.350993] Waiting up to 80 more seconds for network.
> [   55.351006] Waiting up to 70 more seconds for network.
> [   65.350992] Waiting up to 60 more seconds for network.
> [   75.355017] Waiting up to 50 more seconds for network.
> [   85.359009] Waiting up to 40 more seconds for network.
> [   95.363009] Waiting up to 30 more seconds for network.
> [  305.883015] Waiting up to 20 more seconds for network.
> [  315.887002] Waiting up to 10 more seconds for network.
> [  325.887524] Sending DHCP requests .. timed out!
> [  417.893036] IP-Config: Auto-configuration of network failed
> [  417.893852] ALSA device list:
> [  417.894270]   No soundcards found.
> [  417.899649] Freeing unused kernel memory: 2884K (82574000 - 
> 82845000)
> 
> 
> Thanks,
> Ying Huang
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[tip:x86/apic] x86/irq: Trivial cleanups for x86 vector allocation code

2015-12-10 Thread tip-bot for Jiang Liu
Commit-ID:  27dd9e6098141a9ebaafe48d50277fcae6e09775
Gitweb: http://git.kernel.org/tip/27dd9e6098141a9ebaafe48d50277fcae6e09775
Author: Jiang Liu 
AuthorDate: Mon, 30 Nov 2015 16:09:30 +0800
Committer:  Thomas Gleixner 
CommitDate: Thu, 10 Dec 2015 19:39:57 +0100

x86/irq: Trivial cleanups for x86 vector allocation code

Trivial cleanups for x86 vector allocation code:
1) reorganize apic_chip_data to optimize for size and cache efficiency
2) avoid redundant calling of irq_to_desc()
3) refine code comments

Signed-off-by: Jiang Liu 
Cc: Joe Lawrence 
Link: 
http://lkml.kernel.org/r/1448870970-1461-5-git-send-email-jiang@linux.intel.com
Signed-off-by: Thomas Gleixner 
---
 arch/x86/kernel/apic/vector.c | 54 ++-
 1 file changed, 23 insertions(+), 31 deletions(-)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index b63d6f8..0183c44 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -23,9 +23,9 @@
 
 struct apic_chip_data {
struct irq_cfg  cfg;
+   u8  move_in_progress : 1;
cpumask_var_t   domain;
cpumask_var_t   old_domain;
-   u8  move_in_progress : 1;
 };
 
 struct irq_domain *x86_vector_domain;
@@ -38,7 +38,7 @@ static struct apic_chip_data *legacy_irq_data[NR_IRQS_LEGACY];
 
 void lock_vector_lock(void)
 {
-   /* Used to the online set of cpus does not change
+   /* Used to ensure that the online set of cpus does not change
 * during assign_irq_vector.
 */
raw_spin_lock(&vector_lock);
@@ -100,8 +100,7 @@ static void free_apic_chip_data(struct apic_chip_data *data)
}
 }
 
-static int __assign_irq_vector(int irq, struct apic_chip_data *d,
-  const struct cpumask *mask)
+static int assign_irq_vector(struct irq_data *data, const struct cpumask *mask)
 {
/*
 * NOTE! The local APIC isn't very good at handling
@@ -116,11 +115,15 @@ static int __assign_irq_vector(int irq, struct apic_chip_data *d,
 */
static int current_vector = FIRST_EXTERNAL_VECTOR + VECTOR_OFFSET_START;
static int current_offset = VECTOR_OFFSET_START % 16;
-   int cpu, err;
+   int cpu, err = -EBUSY;
+   struct irq_desc *desc = irq_data_to_desc(data);
+   struct apic_chip_data *d = data->chip_data;
unsigned int dest;
+   unsigned long flags;
 
+   raw_spin_lock_irqsave(&vector_lock, flags);
if (cpumask_intersects(d->old_domain, cpu_online_mask))
-   return -EBUSY;
+   goto out;
 
/* Only try and allocate irqs on cpus that are present */
err = -ENOSPC;
@@ -187,7 +190,7 @@ next:
d->cfg.vector = vector;
d->cfg.dest_apicid = dest;
for_each_cpu_and(new_cpu, vector_cpumask, cpu_online_mask)
-   per_cpu(vector_irq, new_cpu)[vector] = irq_to_desc(irq);
+   per_cpu(vector_irq, new_cpu)[vector] = desc;
cpumask_copy(d->domain, vector_cpumask);
err = 0;
break;
@@ -198,37 +201,27 @@ next:
cpumask_and(d->old_domain, d->old_domain, cpu_online_mask);
d->move_in_progress = !cpumask_empty(d->old_domain);
}
-
-   return err;
-}
-
-static int assign_irq_vector(int irq, struct apic_chip_data *data,
-const struct cpumask *mask)
-{
-   int err;
-   unsigned long flags;
-
-   raw_spin_lock_irqsave(&vector_lock, flags);
-   err = __assign_irq_vector(irq, data, mask);
+out:
raw_spin_unlock_irqrestore(&vector_lock, flags);
+
return err;
 }
 
-static int assign_irq_vector_policy(int irq, int node,
-   struct apic_chip_data *data,
+static int assign_irq_vector_policy(struct irq_data *data, int node,
struct irq_alloc_info *info)
 {
if (info && info->mask)
-   return assign_irq_vector(irq, data, info->mask);
+   return assign_irq_vector(data, info->mask);
if (node != NUMA_NO_NODE &&
-   assign_irq_vector(irq, data, cpumask_of_node(node)) == 0)
+   assign_irq_vector(data, cpumask_of_node(node)) == 0)
return 0;
-   return assign_irq_vector(irq, data, apic->target_cpus());
+   return assign_irq_vector(data, apic->target_cpus());
 }
 
-static void clear_irq_vector(int irq, struct apic_chip_data *data)
+static void clear_irq_vector(struct irq_data *irq_data)
 {
-   struct irq_desc *desc = irq_to_desc(irq);
+   struct irq_desc *desc = irq_data_to_desc(irq_data);
+   struct apic_chip_data *data = irq_data->chip_data;
int cpu, vector = data->cfg.vector;
 
BUG_ON(!vector);
@@ -276,7 +269,7 @@ static void x86_vector_free_irqs(struct irq_do

[tip:x86/urgent] x86/irq: Do not reuse struct apic_chip_data.old_domain as temporary buffer

2015-12-10 Thread tip-bot for Jiang Liu
Commit-ID:  6dd7cb991fcbfef55d8bf3d22b8a87f9d5007e20
Gitweb: http://git.kernel.org/tip/6dd7cb991fcbfef55d8bf3d22b8a87f9d5007e20
Author: Jiang Liu 
AuthorDate: Mon, 30 Nov 2015 16:09:26 +0800
Committer:  Thomas Gleixner 
CommitDate: Thu, 10 Dec 2015 19:32:07 +0100

x86/irq: Do not reuse struct apic_chip_data.old_domain as temporary buffer

Function __assign_irq_vector() makes use of apic_chip_data.old_domain
as a temporary buffer, which causes trouble for the rollback logic in case
of failure. So use a dedicated temporary buffer for __assign_irq_vector().

Fixes: a782a7e46bb5 "x86/irq: Store irq descriptor in vector array"
Reported-and-tested-by: Joe Lawrence 
Signed-off-by: Jiang Liu 
Link: 
http://lkml.kernel.org/r/1448870970-1461-1-git-send-email-jiang@linux.intel.com
Cc: sta...@vger.kernel.org
Signed-off-by: Thomas Gleixner 
---
 arch/x86/kernel/apic/vector.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 861bc59..d6ec36b 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -30,7 +30,7 @@ struct apic_chip_data {
 
 struct irq_domain *x86_vector_domain;
 static DEFINE_RAW_SPINLOCK(vector_lock);
-static cpumask_var_t vector_cpumask;
+static cpumask_var_t vector_cpumask, used_cpumask;
 static struct irq_chip lapic_controller;
 #ifdef CONFIG_X86_IO_APIC
 static struct apic_chip_data *legacy_irq_data[NR_IRQS_LEGACY];
@@ -124,6 +124,7 @@ static int __assign_irq_vector(int irq, struct apic_chip_data *d,
/* Only try and allocate irqs on cpus that are present */
err = -ENOSPC;
cpumask_clear(d->old_domain);
+   cpumask_clear(used_cpumask);
cpu = cpumask_first_and(mask, cpu_online_mask);
while (cpu < nr_cpu_ids) {
int new_cpu, vector, offset;
@@ -157,9 +158,8 @@ next:
}
 
if (unlikely(current_vector == vector)) {
-   cpumask_or(d->old_domain, d->old_domain,
-  vector_cpumask);
-   cpumask_andnot(vector_cpumask, mask, d->old_domain);
+   cpumask_or(used_cpumask, used_cpumask, vector_cpumask);
+   cpumask_andnot(vector_cpumask, mask, used_cpumask);
cpu = cpumask_first_and(vector_cpumask,
cpu_online_mask);
continue;
@@ -404,6 +404,7 @@ int __init arch_early_irq_init(void)
arch_init_htirq_domain(x86_vector_domain);
 
BUG_ON(!alloc_cpumask_var(&vector_cpumask, GFP_KERNEL));
+   BUG_ON(!alloc_cpumask_var(&used_cpumask, GFP_KERNEL));
 
return arch_early_ioapic_init();
 }

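The reason a dedicated scratch buffer matters can be shown with a rough
userspace sketch. It assumes plain 64-bit bitmaps in place of cpumask_var_t
and a made-up "busy" mask standing in for the real vector search; the names
chip_data and try_assign are hypothetical and do not exist in the kernel.
The point is only that the search keeps its bookkeeping in a local, so a
failed attempt leaves the saved state alone.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for struct apic_chip_data; masks are plain bitmaps. */
struct chip_data {
	uint64_t domain;     /* CPUs currently targeted */
	uint64_t old_domain; /* CPUs still holding the previous vector */
};

/*
 * Try each CPU in mask, lowest bit first. CPUs set in busy are treated as
 * having no free vector. The set of CPUs already searched lives in a
 * dedicated local (used), never in d->old_domain.
 */
static int try_assign(struct chip_data *d, uint64_t mask, uint64_t busy)
{
	uint64_t used = 0;
	uint64_t candidates = mask;

	while (candidates) {
		uint64_t cpu = candidates & (~candidates + 1); /* lowest set bit */

		if (!(busy & cpu)) {
			/* State is only written once the search succeeded. */
			d->old_domain = d->domain;
			d->domain = cpu;
			return 0;
		}
		used |= cpu;
		candidates = mask & ~used;
	}
	return -ENOSPC; /* d is untouched, so nothing needs to be rolled back */
}
```

If the search instead accumulated the visited CPUs in d->old_domain, a
failing call would return with old_domain overwritten, and the caller could
no longer tell which CPUs still hold the previous vector.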

[tip:x86/urgent] x86/irq: Fix a race condition between vector assigning and cleanup

2015-12-10 Thread tip-bot for Jiang Liu
Commit-ID:  41c7518a5d14543fa4aa1b5b9994ac26b38c0406
Gitweb: http://git.kernel.org/tip/41c7518a5d14543fa4aa1b5b9994ac26b38c0406
Author: Jiang Liu 
AuthorDate: Mon, 30 Nov 2015 16:09:29 +0800
Committer:  Thomas Gleixner 
CommitDate: Thu, 10 Dec 2015 19:32:07 +0100

x86/irq: Fix a race condition between vector assigning and cleanup

Joe Lawrence reported a use-after-release issue related to x86 IRQ
management code. Please refer to the following link for more
information: http://lkml.kernel.org/r/5653b688.4050...@stratus.com

Thomas pointed out that it's caused by a race condition between
__assign_irq_vector() and __send_cleanup_vector(). Based on Thomas'
draft patch, we solve this race condition by:
1) Use move_in_progress to signal that an IRQ cleanup IPI is needed
2) Use old_domain to save old CPU mask for IRQ cleanup
3) Use vector to protect move_in_progress and old_domain

This bugfix patch also helps to get rid of that atomic allocation in
__send_cleanup_vector().

Fixes: a782a7e46bb5 "x86/irq: Store irq descriptor in vector array"
Reported-and-tested-by: Joe Lawrence 
Signed-off-by: Jiang Liu 
Cc: sta...@vger.kernel.org
Link: 
http://lkml.kernel.org/r/1448870970-1461-4-git-send-email-jiang@linux.intel.com
Signed-off-by: Thomas Gleixner 
---
 arch/x86/kernel/apic/vector.c | 77 +++
 1 file changed, 34 insertions(+), 43 deletions(-)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 57934ef..b63d6f8 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -117,9 +117,9 @@ static int __assign_irq_vector(int irq, struct apic_chip_data *d,
static int current_vector = FIRST_EXTERNAL_VECTOR + VECTOR_OFFSET_START;
static int current_offset = VECTOR_OFFSET_START % 16;
int cpu, err;
-   unsigned int dest = d->cfg.dest_apicid;
+   unsigned int dest;
 
-   if (d->move_in_progress)
+   if (cpumask_intersects(d->old_domain, cpu_online_mask))
return -EBUSY;
 
/* Only try and allocate irqs on cpus that are present */
@@ -144,13 +144,12 @@ static int __assign_irq_vector(int irq, struct apic_chip_data *d,
cpumask_and(used_cpumask, d->domain, vector_cpumask);
err = apic->cpu_mask_to_apicid_and(mask, used_cpumask,
   &dest);
-   if (err)
-   break;
-   cpumask_andnot(d->old_domain, d->domain,
-  vector_cpumask);
-   d->move_in_progress =
-  cpumask_intersects(d->old_domain, cpu_online_mask);
-   cpumask_copy(d->domain, used_cpumask);
+   if (!err) {
+   cpumask_andnot(d->old_domain, d->domain,
+  vector_cpumask);
+   cpumask_copy(d->domain, used_cpumask);
+   d->cfg.dest_apicid = dest;
+   }
break;
}
 
@@ -183,14 +182,12 @@ next:
/* Found one! */
current_vector = vector;
current_offset = offset;
-   if (d->cfg.vector) {
+   if (d->cfg.vector)
cpumask_copy(d->old_domain, d->domain);
-   d->move_in_progress =
-  cpumask_intersects(d->old_domain, cpu_online_mask);
-   }
+   d->cfg.vector = vector;
+   d->cfg.dest_apicid = dest;
for_each_cpu_and(new_cpu, vector_cpumask, cpu_online_mask)
per_cpu(vector_irq, new_cpu)[vector] = irq_to_desc(irq);
-   d->cfg.vector = vector;
cpumask_copy(d->domain, vector_cpumask);
err = 0;
break;
@@ -198,7 +195,8 @@ next:
 
if (!err) {
/* cache destination APIC IDs into cfg->dest_apicid */
-   d->cfg.dest_apicid = dest;
+   cpumask_and(d->old_domain, d->old_domain, cpu_online_mask);
+   d->move_in_progress = !cpumask_empty(d->old_domain);
}
 
return err;
@@ -230,7 +228,7 @@ static int assign_irq_vector_policy(int irq, int node,
 
 static void clear_irq_vector(int irq, struct apic_chip_data *data)
 {
-   struct irq_desc *desc;
+   struct irq_desc *desc = irq_to_desc(irq);
int cpu, vector = data->cfg.vector;
 
BUG_ON(!vector);
@@ -239,10 +237,6 @@ static void clear_irq_vector(int irq, struct apic_chip_data *data)
data->cfg.vector = 0;
cpumask_clear(data->domain);
 
-   if (likely(!data->move_in_progress))
-   return;
-
-   desc = 

[tip:x86/urgent] x86/irq: Fix a race window in x86_vector_free_irqs()

2015-12-10 Thread tip-bot for Jiang Liu
Commit-ID:  21a1b3bf35018b446c943c15f0a6225e6f6497ae
Gitweb: http://git.kernel.org/tip/21a1b3bf35018b446c943c15f0a6225e6f6497ae
Author: Jiang Liu 
AuthorDate: Mon, 30 Nov 2015 16:09:28 +0800
Committer:  Thomas Gleixner 
CommitDate: Thu, 10 Dec 2015 19:32:07 +0100

x86/irq: Fix a race window in x86_vector_free_irqs()

There's a race condition between x86_vector_free_irqs()
{
free_apic_chip_data(irq_data->chip_data);
x   //irq_data->chip_data has been freed, but the pointer
//hasn't been reset yet
irq_domain_reset_irq_data(irq_data);
}
and smp_irq_move_cleanup_interrupt()
{
raw_spin_lock(&vector_lock);
data = apic_chip_data(irq_desc_get_irq_data(desc));
access data->   // may access freed memory
raw_spin_unlock(&desc->lock);
}
which may cause smp_irq_move_cleanup_interrupt() to access freed memory.
So use vector_lock to guard all memory free code in x86_vector_free_irqs().

Fixes: a782a7e46bb5 "x86/irq: Store irq descriptor in vector array"
Reported-and-tested-by: Joe Lawrence 
Signed-off-by: Jiang Liu 
Cc: sta...@vger.kernel.org
Link: 
http://lkml.kernel.org/r/1448870970-1461-3-git-send-email-jiang@linux.intel.com
Signed-off-by: Thomas Gleixner 
---
 arch/x86/kernel/apic/vector.c | 20 
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index f03957e..57934ef 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -231,23 +231,16 @@ static int assign_irq_vector_policy(int irq, int node,
 static void clear_irq_vector(int irq, struct apic_chip_data *data)
 {
struct irq_desc *desc;
-   unsigned long flags;
-   int cpu, vector;
-
-   raw_spin_lock_irqsave(&vector_lock, flags);
-   BUG_ON(!data->cfg.vector);
+   int cpu, vector = data->cfg.vector;
 
-   vector = data->cfg.vector;
+   BUG_ON(!vector);
for_each_cpu_and(cpu, data->domain, cpu_online_mask)
per_cpu(vector_irq, cpu)[vector] = VECTOR_UNUSED;
-
data->cfg.vector = 0;
cpumask_clear(data->domain);
 
-   if (likely(!data->move_in_progress)) {
-   raw_spin_unlock_irqrestore(&vector_lock, flags);
+   if (likely(!data->move_in_progress))
return;
-   }
 
desc = irq_to_desc(irq);
for_each_cpu_and(cpu, data->old_domain, cpu_online_mask) {
@@ -260,7 +253,7 @@ static void clear_irq_vector(int irq, struct apic_chip_data *data)
}
}
data->move_in_progress = 0;
-   raw_spin_unlock_irqrestore(&vector_lock, flags);
+   cpumask_clear(data->old_domain);
 }
 
 void init_irq_alloc_info(struct irq_alloc_info *info,
@@ -282,18 +275,21 @@ static void x86_vector_free_irqs(struct irq_domain *domain,
 unsigned int virq, unsigned int nr_irqs)
 {
struct irq_data *irq_data;
+   unsigned long flags;
int i;
 
for (i = 0; i < nr_irqs; i++) {
irq_data = irq_domain_get_irq_data(x86_vector_domain, virq + i);
if (irq_data && irq_data->chip_data) {
+   raw_spin_lock_irqsave(&vector_lock, flags);
clear_irq_vector(virq + i, irq_data->chip_data);
free_apic_chip_data(irq_data->chip_data);
+   irq_domain_reset_irq_data(irq_data);
+   raw_spin_unlock_irqrestore(&vector_lock, flags);
 #ifdef CONFIG_X86_IO_APIC
if (virq + i < nr_legacy_irqs())
legacy_irq_data[virq + i] = NULL;
 #endif
-   irq_domain_reset_irq_data(irq_data);
}
}
 }

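The locking rule from this patch (free the data and reset the pointer inside
one critical section, and have readers check the pointer under the same lock)
can be sketched in userspace roughly as follows. This is an illustration
under stated assumptions, not the kernel code: a pthread mutex stands in for
the raw spinlock, and irq_slot, slot_free and slot_read_vector are
hypothetical names.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for apic_chip_data and irq_data. */
struct chip_data {
	int vector;
};

struct irq_slot {
	struct chip_data *chip_data;
};

/* Plays the role of vector_lock; a mutex replaces the raw spinlock here. */
static pthread_mutex_t vector_lock = PTHREAD_MUTEX_INITIALIZER;

/* Free and reset in one critical section, so no stale pointer is visible. */
static void slot_free(struct irq_slot *slot)
{
	pthread_mutex_lock(&vector_lock);
	free(slot->chip_data);
	slot->chip_data = NULL;
	pthread_mutex_unlock(&vector_lock);
}

/* Reader side: sees either valid data or NULL, never freed memory. */
static int slot_read_vector(struct irq_slot *slot)
{
	int vector = -1;

	pthread_mutex_lock(&vector_lock);
	if (slot->chip_data)
		vector = slot->chip_data->vector;
	pthread_mutex_unlock(&vector_lock);
	return vector;
}
```

Before the fix, the equivalent of the pointer reset happened outside the
lock, which left a window in which the reader could pick up a pointer to
memory that had already been freed.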

[tip:x86/urgent] x86/irq: Enhance __assign_irq_vector() to rollback in case of failure

2015-12-10 Thread tip-bot for Jiang Liu
Commit-ID:  4c24cee6b2aeaee3dab896f76fef4fe79d9e4183
Gitweb: http://git.kernel.org/tip/4c24cee6b2aeaee3dab896f76fef4fe79d9e4183
Author: Jiang Liu 
AuthorDate: Mon, 30 Nov 2015 16:09:27 +0800
Committer:  Thomas Gleixner 
CommitDate: Thu, 10 Dec 2015 19:32:07 +0100

x86/irq: Enhance __assign_irq_vector() to rollback in case of failure

Enhance __assign_irq_vector() to rollback in case of failure so the
caller doesn't need to explicitly rollback.

Fixes: a782a7e46bb5 "x86/irq: Store irq descriptor in vector array"
Reported-and-tested-by: Joe Lawrence 
Signed-off-by: Jiang Liu 
Cc: sta...@vger.kernel.org
Link: 
http://lkml.kernel.org/r/1448870970-1461-2-git-send-email-jiang@linux.intel.com
Signed-off-by: Thomas Gleixner 
---
 arch/x86/kernel/apic/vector.c | 23 ---
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index d6ec36b..f03957e 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -117,6 +117,7 @@ static int __assign_irq_vector(int irq, struct apic_chip_data *d,
static int current_vector = FIRST_EXTERNAL_VECTOR + VECTOR_OFFSET_START;
static int current_offset = VECTOR_OFFSET_START % 16;
int cpu, err;
+   unsigned int dest = d->cfg.dest_apicid;
 
if (d->move_in_progress)
return -EBUSY;
@@ -140,11 +141,16 @@ static int __assign_irq_vector(int irq, struct apic_chip_data *d,
 * the current in use mask. So cleanup the vector
 * allocation for the members that are not used anymore.
 */
+   cpumask_and(used_cpumask, d->domain, vector_cpumask);
+   err = apic->cpu_mask_to_apicid_and(mask, used_cpumask,
+  &dest);
+   if (err)
+   break;
cpumask_andnot(d->old_domain, d->domain,
   vector_cpumask);
d->move_in_progress =
   cpumask_intersects(d->old_domain, cpu_online_mask);
-   cpumask_and(d->domain, d->domain, vector_cpumask);
+   cpumask_copy(d->domain, used_cpumask);
break;
}
 
@@ -167,11 +173,13 @@ next:
 
if (test_bit(vector, used_vectors))
goto next;
-
for_each_cpu_and(new_cpu, vector_cpumask, cpu_online_mask) {
if (!IS_ERR_OR_NULL(per_cpu(vector_irq, new_cpu)[vector]))
goto next;
}
+   if (apic->cpu_mask_to_apicid_and(mask, vector_cpumask, &dest))
+   goto next;
+
/* Found one! */
current_vector = vector;
current_offset = offset;
@@ -190,8 +198,7 @@ next:
 
if (!err) {
/* cache destination APIC IDs into cfg->dest_apicid */
-   err = apic->cpu_mask_to_apicid_and(mask, d->domain,
-  &d->cfg.dest_apicid);
+   d->cfg.dest_apicid = dest;
}
 
return err;
@@ -493,14 +500,8 @@ static int apic_set_affinity(struct irq_data *irq_data,
return -EINVAL;
 
err = assign_irq_vector(irq, data, dest);
-   if (err) {
-   if (assign_irq_vector(irq, data,
- irq_data_get_affinity_mask(irq_data)))
-   pr_err("Failed to recover vector for irq %d\n", irq);
-   return err;
-   }
 
-   return IRQ_SET_MASK_OK;
+   return err ? err : IRQ_SET_MASK_OK;
 }
 
 static struct irq_chip lapic_controller = {

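The rollback idea in this patch amounts to doing every step that can fail
before touching the configuration, so that a failed call leaves the previous
vector and destination in place. A minimal sketch of that pattern, assuming a
made-up lookup that rejects odd vectors in place of
apic->cpu_mask_to_apicid_and(); vec_cfg, lookup_dest and set_vector are
hypothetical names:

```c
#include <errno.h>

/* Hypothetical stand-in for the vector and dest_apicid fields of irq_cfg. */
struct vec_cfg {
	unsigned int vector;
	unsigned int dest_apicid;
};

/* Made-up lookup that fails for odd vectors, to model an APIC ID error. */
static int lookup_dest(unsigned int vector, unsigned int *dest)
{
	if (vector & 1)
		return -EINVAL;
	*dest = vector / 2;
	return 0;
}

/* Resolve the destination into a local first; commit only on success. */
static int set_vector(struct vec_cfg *cfg, unsigned int vector)
{
	unsigned int dest;
	int err = lookup_dest(vector, &dest);

	if (err)
		return err; /* cfg still holds the old, working values */

	cfg->vector = vector;
	cfg->dest_apicid = dest;
	return 0;
}
```

With this shape the caller no longer needs a recovery path of its own, which
is why apic_set_affinity() in the diff above shrinks to a plain
"return err ? err : IRQ_SET_MASK_OK;".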

[tip:x86/urgent] x86/irq: Fix a race condition between vector assigning and cleanup

2015-12-10 Thread tip-bot for Jiang Liu
Commit-ID:  41c7518a5d14543fa4aa1b5b9994ac26b38c0406
Gitweb: http://git.kernel.org/tip/41c7518a5d14543fa4aa1b5b9994ac26b38c0406
Author: Jiang Liu <jiang@linux.intel.com>
AuthorDate: Mon, 30 Nov 2015 16:09:29 +0800
Committer:  Thomas Gleixner <t...@linutronix.de>
CommitDate: Thu, 10 Dec 2015 19:32:07 +0100

x86/irq: Fix a race condition between vector assigning and cleanup

Joe Lawrence reported a use-after-release issue related to x86 IRQ
management code. Please refer to the following link for more
information: http://lkml.kernel.org/r/5653b688.4050...@stratus.com

Thomas pointed out that it's caused by a race condition between
__assign_irq_vector() and __send_cleanup_vector(). Based on Thomas'
draft patch, we solve this race condition by:
1) Use move_in_progress to signal that an IRQ cleanup IPI is needed
2) Use old_domain to save old CPU mask for IRQ cleanup
3) Use vector to protect move_in_progress and old_domain

This bugfix patch also helps to get rid of that atomic allocation in
__send_cleanup_vector().
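
The three points above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: cpumask_t, struct chip_data_model and the model_*() helpers are hypothetical stand-ins (a 64-bit mask instead of cpumask_var_t, no locking, no IPIs), not the kernel code. It shows old_domain holding the CPUs that still need cleanup, move_in_progress meaning "a cleanup IPI is still owed", and a new assignment being refused while old_domain still intersects the online CPUs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for cpumask_var_t: one bit per CPU, up to 64 CPUs. */
typedef uint64_t cpumask_t;

/* Simplified model of struct apic_chip_data (only the fields discussed). */
struct chip_data_model {
	unsigned int vector;    /* 0 means "no vector assigned" */
	cpumask_t domain;       /* CPUs currently targeted */
	cpumask_t old_domain;   /* CPUs still holding the previous vector */
	bool move_in_progress;  /* a cleanup IPI is still owed */
};

/* Refuse while an earlier move is pending; otherwise record the old domain. */
static int model_assign(struct chip_data_model *d, unsigned int new_vector,
			cpumask_t new_domain, cpumask_t online)
{
	assert(new_vector != 0);

	if (d->old_domain & online)
		return -1;      /* -EBUSY in the kernel; state is untouched */

	if (d->vector)
		d->old_domain = d->domain;
	d->vector = new_vector;
	d->domain = new_domain;

	/* Derive the flag from the mask in exactly one place. */
	d->old_domain &= online;
	d->move_in_progress = d->old_domain != 0;
	return 0;
}

/* Assumed model of sending the cleanup IPI: the flag is consumed here. */
static cpumask_t model_send_cleanup(struct chip_data_model *d, cpumask_t online)
{
	d->move_in_progress = false;
	d->old_domain &= online;
	return d->old_domain;   /* CPUs that would receive the IPI */
}

/* Assumed model of the cleanup handler running on one CPU. */
static void model_cleanup_on_cpu(struct chip_data_model *d, unsigned int cpu)
{
	d->old_domain &= ~((cpumask_t)1 << cpu);
}
```

In this model a second move stays blocked after the IPI has been sent and only becomes possible once every CPU in old_domain has run its cleanup.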

Fixes: a782a7e46bb5 "x86/irq: Store irq descriptor in vector array"
Reported-and-tested-by: Joe Lawrence <joe.lawre...@stratus.com>
Signed-off-by: Jiang Liu <jiang@linux.intel.com>
Cc: sta...@vger.kernel.org
Link: http://lkml.kernel.org/r/1448870970-1461-4-git-send-email-jiang@linux.intel.com
Signed-off-by: Thomas Gleixner <t...@linutronix.de>
---
 arch/x86/kernel/apic/vector.c | 77 +++
 1 file changed, 34 insertions(+), 43 deletions(-)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 57934ef..b63d6f8 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -117,9 +117,9 @@ static int __assign_irq_vector(int irq, struct apic_chip_data *d,
static int current_vector = FIRST_EXTERNAL_VECTOR + VECTOR_OFFSET_START;
static int current_offset = VECTOR_OFFSET_START % 16;
int cpu, err;
-   unsigned int dest = d->cfg.dest_apicid;
+   unsigned int dest;
 
-   if (d->move_in_progress)
+   if (cpumask_intersects(d->old_domain, cpu_online_mask))
return -EBUSY;
 
/* Only try and allocate irqs on cpus that are present */
@@ -144,13 +144,12 @@ static int __assign_irq_vector(int irq, struct apic_chip_data *d,
cpumask_and(used_cpumask, d->domain, vector_cpumask);
err = apic->cpu_mask_to_apicid_and(mask, used_cpumask,
   &dest);
-   if (err)
-   break;
-   cpumask_andnot(d->old_domain, d->domain,
-  vector_cpumask);
-   d->move_in_progress =
-  cpumask_intersects(d->old_domain, cpu_online_mask);
-   cpumask_copy(d->domain, used_cpumask);
+   if (!err) {
+   cpumask_andnot(d->old_domain, d->domain,
+  vector_cpumask);
+   cpumask_copy(d->domain, used_cpumask);
+   d->cfg.dest_apicid = dest;
+   }
break;
}
 
@@ -183,14 +182,12 @@ next:
/* Found one! */
current_vector = vector;
current_offset = offset;
-   if (d->cfg.vector) {
+   if (d->cfg.vector)
cpumask_copy(d->old_domain, d->domain);
-   d->move_in_progress =
-  cpumask_intersects(d->old_domain, cpu_online_mask);
-   }
+   d->cfg.vector = vector;
+   d->cfg.dest_apicid = dest;
for_each_cpu_and(new_cpu, vector_cpumask, cpu_online_mask)
per_cpu(vector_irq, new_cpu)[vector] = irq_to_desc(irq);
-   d->cfg.vector = vector;
cpumask_copy(d->domain, vector_cpumask);
err = 0;
break;
@@ -198,7 +195,8 @@ next:
 
if (!err) {
/* cache destination APIC IDs into cfg->dest_apicid */
-   d->cfg.dest_apicid = dest;
+   cpumask_and(d->old_domain, d->old_domain, cpu_online_mask);
+   d->move_in_progress = !cpumask_empty(d->old_domain);
}
 
return err;
@@ -230,7 +228,7 @@ static int assign_irq_vector_policy(int irq, int node,
 
 static void clear_irq_vector(int irq, struct apic_chip_data *data)
 {
-   struct irq_desc *desc;
+   struct irq_desc *desc = irq_to_desc(irq);
int cpu, vector = data->cfg.vector;
 
BUG_ON(!vector);
@@ -239,10 +237,6 @@ static void clear_irq_vector(int irq, struct apic_chip_data *data)
data->cf

[tip:x86/apic] x86/irq: Trivial cleanups for x86 vector allocation code

2015-12-10 Thread tip-bot for Jiang Liu
Commit-ID:  27dd9e6098141a9ebaafe48d50277fcae6e09775
Gitweb: http://git.kernel.org/tip/27dd9e6098141a9ebaafe48d50277fcae6e09775
Author: Jiang Liu <jiang@linux.intel.com>
AuthorDate: Mon, 30 Nov 2015 16:09:30 +0800
Committer:  Thomas Gleixner <t...@linutronix.de>
CommitDate: Thu, 10 Dec 2015 19:39:57 +0100

x86/irq: Trivial cleanups for x86 vector allocation code

Trivial cleanups for x86 vector allocation code:
1) reorganize apic_chip_data to optimize for size and cache efficiency
2) avoid redundant calling of irq_to_desc()
3) refine code comments

Signed-off-by: Jiang Liu <jiang@linux.intel.com>
Cc: Joe Lawrence <joe.lawre...@stratus.com>
Link: http://lkml.kernel.org/r/1448870970-1461-5-git-send-email-jiang@linux.intel.com
Signed-off-by: Thomas Gleixner <t...@linutronix.de>
---
 arch/x86/kernel/apic/vector.c | 54 ++-
 1 file changed, 23 insertions(+), 31 deletions(-)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index b63d6f8..0183c44 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -23,9 +23,9 @@
 
 struct apic_chip_data {
struct irq_cfg  cfg;
+   u8  move_in_progress : 1;
cpumask_var_t   domain;
cpumask_var_t   old_domain;
-   u8  move_in_progress : 1;
 };
 
 struct irq_domain *x86_vector_domain;
@@ -38,7 +38,7 @@ static struct apic_chip_data *legacy_irq_data[NR_IRQS_LEGACY];
 
 void lock_vector_lock(void)
 {
-   /* Used to the online set of cpus does not change
+   /* Used to ensure that the online set of cpus does not change
 * during assign_irq_vector.
 */
raw_spin_lock(&vector_lock);
@@ -100,8 +100,7 @@ static void free_apic_chip_data(struct apic_chip_data *data)
}
 }
 
-static int __assign_irq_vector(int irq, struct apic_chip_data *d,
-  const struct cpumask *mask)
+static int assign_irq_vector(struct irq_data *data, const struct cpumask *mask)
 {
/*
 * NOTE! The local APIC isn't very good at handling
@@ -116,11 +115,15 @@ static int __assign_irq_vector(int irq, struct apic_chip_data *d,
 */
static int current_vector = FIRST_EXTERNAL_VECTOR + VECTOR_OFFSET_START;
static int current_offset = VECTOR_OFFSET_START % 16;
-   int cpu, err;
+   int cpu, err = -EBUSY;
+   struct irq_desc *desc = irq_data_to_desc(data);
+   struct apic_chip_data *d = data->chip_data;
unsigned int dest;
+   unsigned long flags;
 
+   raw_spin_lock_irqsave(&vector_lock, flags);
if (cpumask_intersects(d->old_domain, cpu_online_mask))
-   return -EBUSY;
+   goto out;
 
/* Only try and allocate irqs on cpus that are present */
err = -ENOSPC;
@@ -187,7 +190,7 @@ next:
d->cfg.vector = vector;
d->cfg.dest_apicid = dest;
for_each_cpu_and(new_cpu, vector_cpumask, cpu_online_mask)
-   per_cpu(vector_irq, new_cpu)[vector] = irq_to_desc(irq);
+   per_cpu(vector_irq, new_cpu)[vector] = desc;
cpumask_copy(d->domain, vector_cpumask);
err = 0;
break;
@@ -198,37 +201,27 @@ next:
cpumask_and(d->old_domain, d->old_domain, cpu_online_mask);
d->move_in_progress = !cpumask_empty(d->old_domain);
}
-
-   return err;
-}
-
-static int assign_irq_vector(int irq, struct apic_chip_data *data,
-const struct cpumask *mask)
-{
-   int err;
-   unsigned long flags;
-
-   raw_spin_lock_irqsave(&vector_lock, flags);
-   err = __assign_irq_vector(irq, data, mask);
+out:
raw_spin_unlock_irqrestore(&vector_lock, flags);
+
return err;
 }
 
-static int assign_irq_vector_policy(int irq, int node,
-   struct apic_chip_data *data,
+static int assign_irq_vector_policy(struct irq_data *data, int node,
struct irq_alloc_info *info)
 {
if (info && info->mask)
-   return assign_irq_vector(irq, data, info->mask);
+   return assign_irq_vector(data, info->mask);
if (node != NUMA_NO_NODE &&
-   assign_irq_vector(irq, data, cpumask_of_node(node)) == 0)
+   assign_irq_vector(data, cpumask_of_node(node)) == 0)
return 0;
-   return assign_irq_vector(irq, data, apic->target_cpus());
+   return assign_irq_vector(data, apic->target_cpus());
 }
 
-static void clear_irq_vector(int irq, struct apic_chip_data *data)
+static void clear_irq_vector(struct irq_data *irq_data)
 {
-   struct irq_desc *desc = irq_to_desc(irq);
+   struct irq_desc *desc = irq_data_to_desc(irq_data);
+   struct apic_chip_data *data = irq_data->chip_da

[tip:x86/urgent] x86/irq: Do not reuse struct apic_chip_data.old_domain as temporary buffer

2015-12-10 Thread tip-bot for Jiang Liu
Commit-ID:  6dd7cb991fcbfef55d8bf3d22b8a87f9d5007e20
Gitweb: http://git.kernel.org/tip/6dd7cb991fcbfef55d8bf3d22b8a87f9d5007e20
Author: Jiang Liu <jiang@linux.intel.com>
AuthorDate: Mon, 30 Nov 2015 16:09:26 +0800
Committer:  Thomas Gleixner <t...@linutronix.de>
CommitDate: Thu, 10 Dec 2015 19:32:07 +0100

x86/irq: Do not reuse struct apic_chip_data.old_domain as temporary buffer

Function __assign_irq_vector() uses apic_chip_data.old_domain as a
temporary buffer, which causes trouble for the rollback logic in case of
failure. So use a dedicated temporary buffer for __assign_irq_vector().
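
A minimal user-space sketch of the idea follows. The names (cpumask_t, struct chip_state, used_scratch, pick_cpu) are hypothetical and the search is far simpler than the real vector allocator. The point it tries to show is only that CPUs already tried are accumulated in a dedicated scratch mask, so a failed search leaves old_domain exactly as it was.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t cpumask_t;     /* hypothetical stand-in for cpumask_var_t */

struct chip_state {
	cpumask_t domain;       /* CPUs currently targeted */
	cpumask_t old_domain;   /* CPUs of the previous assignment */
};

/* Dedicated scratch mask, playing the role of used_cpumask in the patch. */
static cpumask_t used_scratch;

/*
 * Toy search: pick the first CPU in 'mask' that is also in 'free_cpus'.
 * Misses are recorded in used_scratch only, never in s->old_domain.
 */
static int pick_cpu(struct chip_state *s, cpumask_t mask, cpumask_t free_cpus)
{
	assert(s);

	used_scratch = 0;
	for (unsigned int cpu = 0; cpu < 64; cpu++) {
		cpumask_t bit = (cpumask_t)1 << cpu;

		if (!(mask & bit))
			continue;
		if (free_cpus & bit) {
			/* Success: only now touch the persistent state. */
			s->old_domain = s->domain;
			s->domain = bit;
			return 0;
		}
		used_scratch |= bit;
	}
	return -1;              /* -ENOSPC in the kernel; *s is unchanged */
}
```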

Fixes: a782a7e46bb5 "x86/irq: Store irq descriptor in vector array"
Reported-and-tested-by: Joe Lawrence <joe.lawre...@stratus.com>
Signed-off-by: Jiang Liu <jiang@linux.intel.com>
Link: http://lkml.kernel.org/r/1448870970-1461-1-git-send-email-jiang@linux.intel.com
Cc: sta...@vger.kernel.org
Signed-off-by: Thomas Gleixner <t...@linutronix.de>
---
 arch/x86/kernel/apic/vector.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 861bc59..d6ec36b 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -30,7 +30,7 @@ struct apic_chip_data {
 
 struct irq_domain *x86_vector_domain;
 static DEFINE_RAW_SPINLOCK(vector_lock);
-static cpumask_var_t vector_cpumask;
+static cpumask_var_t vector_cpumask, used_cpumask;
 static struct irq_chip lapic_controller;
 #ifdef CONFIG_X86_IO_APIC
 static struct apic_chip_data *legacy_irq_data[NR_IRQS_LEGACY];
@@ -124,6 +124,7 @@ static int __assign_irq_vector(int irq, struct apic_chip_data *d,
/* Only try and allocate irqs on cpus that are present */
err = -ENOSPC;
cpumask_clear(d->old_domain);
+   cpumask_clear(used_cpumask);
cpu = cpumask_first_and(mask, cpu_online_mask);
while (cpu < nr_cpu_ids) {
int new_cpu, vector, offset;
@@ -157,9 +158,8 @@ next:
}
 
if (unlikely(current_vector == vector)) {
-   cpumask_or(d->old_domain, d->old_domain,
-  vector_cpumask);
-   cpumask_andnot(vector_cpumask, mask, d->old_domain);
+   cpumask_or(used_cpumask, used_cpumask, vector_cpumask);
+   cpumask_andnot(vector_cpumask, mask, used_cpumask);
cpu = cpumask_first_and(vector_cpumask,
cpu_online_mask);
continue;
@@ -404,6 +404,7 @@ int __init arch_early_irq_init(void)
arch_init_htirq_domain(x86_vector_domain);
 
BUG_ON(!alloc_cpumask_var(&vector_cpumask, GFP_KERNEL));
+   BUG_ON(!alloc_cpumask_var(&used_cpumask, GFP_KERNEL));
 
return arch_early_ioapic_init();
 }


Re: [PATCH v6 4/7] PCI: Add fwnode_handle to pci_sysdata

2015-12-02 Thread Jiang Liu
On 2015/11/3 5:33, ja...@microsoft.com wrote:
> From: Jake Oshins 
> 
> This patch adds an fwnode_handle to struct pci_sysdata, which is
> used by the next patch in the series when trying to locate an
> IRQ domain associated with a root PCI bus.
> 
> Signed-off-by: Jake Oshins 
> ---
>  arch/x86/include/asm/pci.h | 13 +
>  include/asm-generic/pci.h  |  4 
>  2 files changed, 17 insertions(+)
> 
> diff --git a/arch/x86/include/asm/pci.h b/arch/x86/include/asm/pci.h
> index 4625943..fb74453 100644
> --- a/arch/x86/include/asm/pci.h
> +++ b/arch/x86/include/asm/pci.h
> @@ -20,6 +20,9 @@ struct pci_sysdata {
>  #ifdef CONFIG_X86_64
>   void*iommu; /* IOMMU private data */
>  #endif
> +#ifdef CONFIG_PCI_MSI_IRQ_DOMAIN
> + void*fwnode;/* IRQ domain for MSI assignment */
> +#endif
>  };
>  
>  extern int pci_routeirq;
> @@ -41,6 +44,16 @@ static inline int pci_proc_domain(struct pci_bus *bus)
>  }
>  #endif
>  
> +#ifdef CONFIG_PCI_MSI_IRQ_DOMAIN
> +static inline void *_pci_root_bus_fwnode(struct pci_bus *bus)
> +{
> + struct pci_sysdata *sd = bus->sysdata;
> + return sd->fwnode;
> +}
> +
> +#define pci_root_bus_fwnode  _pci_root_bus_fwnode
> +#endif
> +
>  /* Can be used to override the logic in pci_scan_bus for skipping
> already-configured bus numbers - to be used for buggy BIOSes
> or architectures with incomplete PCI setup by the loader */
> diff --git a/include/asm-generic/pci.h b/include/asm-generic/pci.h
> index f24bc51..3fde985 100644
> --- a/include/asm-generic/pci.h
> +++ b/include/asm-generic/pci.h
> @@ -21,4 +21,8 @@ static inline int pci_get_legacy_ide_irq(struct pci_dev *dev, int channel)
>  #define PCI_DMA_BUS_IS_PHYS  (1)
>  #endif
>  
> +#ifndef pci_root_bus_fwnode
> +#define pci_root_bus_fwnode(bus) ((void)(bus),NULL)
> +#endif
Hi Jakeo,
For x86, all PCI devices share the same MSI controller, but I'm
not sure whether other archs may have per-bus or per-device MSI
controllers. If multiple MSI controllers may serve PCI devices
under the same PCI root, it would be better to use something like
pci_get_msi_fwnode(bus) or similar.
Thanks,
Gerry


> +
>  #endif /* _ASM_GENERIC_PCI_H */
> 


Re: [PATCH v6 7/7] PCI: hv: New paravirtual PCI front-end for Hyper-V VMs

2015-12-02 Thread Jiang Liu
On 2015/11/3 5:33, ja...@microsoft.com wrote:
> From: Jake Oshins 
> 
> This patch introduces a new driver which exposes a root PCI bus whenever a PCI
> Express device is passed through to a guest VM under Hyper-V. The device can
> be single- or multi-function. The interrupts for the devices are managed by an
> IRQ domain, implemented within the driver.
> 
> Signed-off-by: Jake Oshins 
> ---
>  MAINTAINERS|1 +
>  drivers/pci/Kconfig|7 +
>  drivers/pci/host/Makefile  |1 +
>  drivers/pci/host/hv_pcifront.c | 2267 
> 
>  4 files changed, 2276 insertions(+)
>  create mode 100644 drivers/pci/host/hv_pcifront.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index a2d50fe..a1205b4 100644
[...]
> +/* Interrupt management hooks */
> +
> +/**
> + * hv_msi_free() - Free the MSI.
> + * @domain:  The interrupt domain pointer
> + * @info:Extra MSI-related context
> + * @irq: Identifies the IRQ.
> + *
> + * The Hyper-V parent partition and hypervisor are tracking the
> + * messages that are in use, keeping the interrupt redirection
> + * table up to date.  This callback sends a message that frees
> + * the IRT entry and related tracking nonsense.
> + */
> +static void hv_msi_free(struct irq_domain *domain, struct msi_domain_info *info,
> + unsigned int irq)
> +{
> + struct pci_delete_interrupt *int_pkt;
> + struct {
> + struct pci_packet pkt;
> + u8 buffer[sizeof(struct pci_delete_interrupt) -
> +   sizeof(struct pci_message)];
> + } ctxt;
> + struct hv_pcibus_device *hbus;
> + struct hv_pci_dev *hpdev;
> + struct irq_desc *desc;
> + struct msi_desc *msi;
> + struct tran_int_desc *int_desc;
> + struct pci_dev *pdev;
> +
> + desc = irq_to_desc(irq);
> + msi = irq_desc_get_msi_desc(desc);
> + pdev = msi_desc_to_pci_dev(msi);
For safety, don't assume this HV MSI irqdomain is the top domain.
So please use:
struct irq_data *irq_data = irq_domain_get_irq_data(domain, irq);
struct msi_desc *desc = irq_data_get_msi_desc(irq_data);
struct tran_int_desc *int_desc = irq_data_get_chip_data(irq_data);
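
To illustrate why the lookup should go through the domain, here is a toy user-space model of a hierarchical irq_data chain. The struct and function names are hypothetical; only the walk along parent_data until the domain matches mirrors what irq_domain_get_irq_data() does for hierarchical domains. If another domain were stacked on top, reading chip_data from the top-level entry would return that domain's private data instead of ours.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified models of irq_domain and irq_data. */
struct domain_model {
	const char *name;
};

struct irq_data_model {
	struct domain_model *domain;
	void *chip_data;                     /* per-domain private data */
	struct irq_data_model *parent_data;  /* next level down the hierarchy */
};

/* Walk from the top-level entry down until the requested domain is found. */
static struct irq_data_model *get_irq_data_for(struct irq_data_model *top,
					       struct domain_model *domain)
{
	assert(domain);

	for (struct irq_data_model *d = top; d; d = d->parent_data)
		if (d->domain == domain)
			return d;
	return NULL;
}
```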

> + hbus = info->data;
> + hpdev = lookup_hv_dev(hbus, devfn_to_wslot(pdev->devfn));
> + if (!hpdev)
> + return;
> +
> + int_desc = irq_get_chip_data(irq);
> + if (int_desc) {
> + memset(&ctxt, 0, sizeof(ctxt));
> + int_pkt = (struct pci_delete_interrupt *)&ctxt.pkt.message;
> + int_pkt->message_type.message_type =
> + PCI_DELETE_INTERRUPT_MESSAGE;
> + int_pkt->wslot.slot = hpdev->desc.win_slot.slot;
> + int_pkt->int_desc = *int_desc;
> + vmbus_sendpacket(hbus->hdev->channel, int_pkt, sizeof(*int_pkt),
> +  (unsigned long)&ctxt.pkt, VM_PKT_DATA_INBAND,
> +  0);
> + desc->irq_data.chip_data = NULL;
> + kfree(int_desc);
> + }
> +
> + hv_pcichild_dec(hpdev, hv_pcidev_ref_by_slot);
> +}
> +
> +static int hv_set_affinity(struct irq_data *data, const struct cpumask *dest,
> +bool force)
> +{
> + struct irq_data *parent = data->parent_data;
> +
> + return parent->chip->irq_set_affinity(parent, dest, force);
> +}
> +
> +void hv_irq_mask(struct irq_data *data)
> +{
> + pci_msi_mask_irq(data);
> +}
> +
> +/**
> + * hv_irq_unmask() - "Unmask" the IRQ by setting its current
> + * affinity.
> + * @data:Describes the IRQ
> + *
> + * Build a new destination for the MSI and make a hypercall to
> + * update the Interrupt Redirection Table. "Device Logical ID"
> + * is built out of this PCI bus's instance GUID and the function
> + * number of the device.
> + */
> +void hv_irq_unmask(struct irq_data *data)
> +{
> + struct msi_desc *msi_desc = irq_data_get_msi_desc(data);
> + struct irq_cfg *cfg = irqd_cfg(data);
> + struct retarget_msi_interrupt params;
> + struct hv_pcibus_device *hbus;
> + struct cpumask *dest;
> + struct pci_bus *pbus;
> + struct pci_dev *pdev;
> + int cpu;
> +
> + dest = irq_data_get_affinity_mask(data);
> + pdev = msi_desc_to_pci_dev(msi_desc);
> + pbus = pdev->bus;
> + hbus = container_of(pbus->sysdata, struct hv_pcibus_device, sysdata);
> +
> + memset(&params, 0, sizeof(params));
> + params.partition_id = HV_PARTITION_ID_SELF;
> + params.source = 1; /* MSI(-X) */
> + params.address = msi_desc->msg.address_lo;
> + params.data = msi_desc->msg.data;
> + params.device_id = (hbus->hdev->dev_instance.b[5] << 24) |
> +(hbus->hdev->dev_instance.b[4] << 16) |
> +(hbus->hdev->dev_instance.b[7] << 8) |
> +(hbus->hdev->dev_instance.b[6] & 0xf8) |
> +PCI_FUNC(pdev->devfn);
> + params.vector = cfg->vector;
> +
> + for_each_cpu_and(cpu, dest, 

[Bugfix 4/5] x86/irq: Fix a race condition between vector assigning and cleanup

2015-11-30 Thread Jiang Liu
Joe Lawrence reported a use-after-release issue in the x86 IRQ
management code. Please refer to the following link for more
information:
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1026840.html

Thomas pointed out that it's caused by a race condition between
__assign_irq_vector() and __send_cleanup_vector(). Based on Thomas'
draft patch, we solve this race condition by:
1) Use move_in_progress to signal that an IRQ cleanup IPI is needed
2) Use old_domain to save old CPU mask for IRQ cleanup
3) Use vector to protect move_in_progress and old_domain

This bugfix patch also helps to get rid of that atomic allocation in
__send_cleanup_vector().

Signed-off-by: Jiang Liu 
---
 arch/x86/kernel/apic/vector.c |   77 ++---
 1 file changed, 34 insertions(+), 43 deletions(-)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 57934ef1d032..b63d6f84c0bb 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -117,9 +117,9 @@ static int __assign_irq_vector(int irq, struct apic_chip_data *d,
static int current_vector = FIRST_EXTERNAL_VECTOR + VECTOR_OFFSET_START;
static int current_offset = VECTOR_OFFSET_START % 16;
int cpu, err;
-   unsigned int dest = d->cfg.dest_apicid;
+   unsigned int dest;
 
-   if (d->move_in_progress)
+   if (cpumask_intersects(d->old_domain, cpu_online_mask))
return -EBUSY;
 
/* Only try and allocate irqs on cpus that are present */
@@ -144,13 +144,12 @@ static int __assign_irq_vector(int irq, struct apic_chip_data *d,
cpumask_and(used_cpumask, d->domain, vector_cpumask);
err = apic->cpu_mask_to_apicid_and(mask, used_cpumask,
   &dest);
-   if (err)
-   break;
-   cpumask_andnot(d->old_domain, d->domain,
-  vector_cpumask);
-   d->move_in_progress =
-  cpumask_intersects(d->old_domain, cpu_online_mask);
-   cpumask_copy(d->domain, used_cpumask);
+   if (!err) {
+   cpumask_andnot(d->old_domain, d->domain,
+  vector_cpumask);
+   cpumask_copy(d->domain, used_cpumask);
+   d->cfg.dest_apicid = dest;
+   }
break;
}
 
@@ -183,14 +182,12 @@ next:
/* Found one! */
current_vector = vector;
current_offset = offset;
-   if (d->cfg.vector) {
+   if (d->cfg.vector)
cpumask_copy(d->old_domain, d->domain);
-   d->move_in_progress =
-  cpumask_intersects(d->old_domain, cpu_online_mask);
-   }
+   d->cfg.vector = vector;
+   d->cfg.dest_apicid = dest;
for_each_cpu_and(new_cpu, vector_cpumask, cpu_online_mask)
per_cpu(vector_irq, new_cpu)[vector] = irq_to_desc(irq);
-   d->cfg.vector = vector;
cpumask_copy(d->domain, vector_cpumask);
err = 0;
break;
@@ -198,7 +195,8 @@ next:
 
if (!err) {
/* cache destination APIC IDs into cfg->dest_apicid */
-   d->cfg.dest_apicid = dest;
+   cpumask_and(d->old_domain, d->old_domain, cpu_online_mask);
+   d->move_in_progress = !cpumask_empty(d->old_domain);
}
 
return err;
@@ -230,7 +228,7 @@ static int assign_irq_vector_policy(int irq, int node,
 
 static void clear_irq_vector(int irq, struct apic_chip_data *data)
 {
-   struct irq_desc *desc;
+   struct irq_desc *desc = irq_to_desc(irq);
int cpu, vector = data->cfg.vector;
 
BUG_ON(!vector);
@@ -239,10 +237,6 @@ static void clear_irq_vector(int irq, struct apic_chip_data *data)
data->cfg.vector = 0;
cpumask_clear(data->domain);
 
-   if (likely(!data->move_in_progress))
-   return;
-
-   desc = irq_to_desc(irq);
for_each_cpu_and(cpu, data->old_domain, cpu_online_mask) {
for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS;
 vector++) {
@@ -424,10 +418,13 @@ static void __setup_vector_irq(int cpu)
struct irq_data *idata = irq_desc_get_irq_data(desc);
 
data = apic_chip_data(idata);
-   if (!data || !cpumask_test_cpu(cpu, data->domain))
-   continue;
-   vector = data->cfg.vector;
-   per_cpu(vector_irq

[Bugfix 2/5] x86/irq: Enhance __assign_irq_vector() to rollback in case of failure

2015-11-30 Thread Jiang Liu
Enhance __assign_irq_vector() to roll back in case of failure so the
caller doesn't need to roll back explicitly.
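
The approach can be sketched in user space as "do the fallible step into a local first, commit afterwards". Everything named below (struct vec_cfg, mask_to_apicid, assign) is a hypothetical model, with mask_to_apicid() standing in for apic->cpu_mask_to_apicid_and() and the APIC ID simply being the lowest CPU number.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t cpumask_t;     /* hypothetical stand-in for cpumask_var_t */

struct vec_cfg {
	unsigned int vector;
	unsigned int dest_apicid;
	cpumask_t domain;
};

/* Toy fallible lookup: fails when the two masks do not intersect. */
static int mask_to_apicid(cpumask_t mask, cpumask_t candidates,
			  unsigned int *apicid)
{
	cpumask_t both = mask & candidates;
	unsigned int cpu = 0;

	if (!both)
		return -1;
	while (!(both & ((cpumask_t)1 << cpu)))
		cpu++;
	*apicid = cpu;          /* toy rule: APIC ID == lowest CPU number */
	return 0;
}

/* Compute into a local, then commit; on failure cfg is left untouched. */
static int assign(struct vec_cfg *cfg, unsigned int vector,
		  cpumask_t mask, cpumask_t candidates)
{
	unsigned int dest;

	assert(cfg);
	if (mask_to_apicid(mask, candidates, &dest))
		return -1;      /* nothing to roll back */

	cfg->vector = vector;
	cfg->dest_apicid = dest;
	cfg->domain = candidates;
	return 0;
}
```

With this shape a caller such as apic_set_affinity() only has to propagate the error, which is what the last hunk of the patch does.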

Signed-off-by: Jiang Liu 
---
 arch/x86/kernel/apic/vector.c |   23 ---
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index d6ec36b4461e..f03957e7c50d 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -117,6 +117,7 @@ static int __assign_irq_vector(int irq, struct apic_chip_data *d,
static int current_vector = FIRST_EXTERNAL_VECTOR + VECTOR_OFFSET_START;
static int current_offset = VECTOR_OFFSET_START % 16;
int cpu, err;
+   unsigned int dest = d->cfg.dest_apicid;
 
if (d->move_in_progress)
return -EBUSY;
@@ -140,11 +141,16 @@ static int __assign_irq_vector(int irq, struct apic_chip_data *d,
 * the current in use mask. So cleanup the vector
 * allocation for the members that are not used anymore.
 */
+   cpumask_and(used_cpumask, d->domain, vector_cpumask);
+   err = apic->cpu_mask_to_apicid_and(mask, used_cpumask,
+  &dest);
+   if (err)
+   break;
cpumask_andnot(d->old_domain, d->domain,
   vector_cpumask);
d->move_in_progress =
   cpumask_intersects(d->old_domain, cpu_online_mask);
-   cpumask_and(d->domain, d->domain, vector_cpumask);
+   cpumask_copy(d->domain, used_cpumask);
break;
}
 
@@ -167,11 +173,13 @@ next:
 
if (test_bit(vector, used_vectors))
goto next;
-
for_each_cpu_and(new_cpu, vector_cpumask, cpu_online_mask) {
if (!IS_ERR_OR_NULL(per_cpu(vector_irq, new_cpu)[vector]))
goto next;
}
+   if (apic->cpu_mask_to_apicid_and(mask, vector_cpumask, &dest))
+   goto next;
+
/* Found one! */
current_vector = vector;
current_offset = offset;
@@ -190,8 +198,7 @@ next:
 
if (!err) {
/* cache destination APIC IDs into cfg->dest_apicid */
-   err = apic->cpu_mask_to_apicid_and(mask, d->domain,
-  &d->cfg.dest_apicid);
+   d->cfg.dest_apicid = dest;
}
 
return err;
@@ -493,14 +500,8 @@ static int apic_set_affinity(struct irq_data *irq_data,
return -EINVAL;
 
err = assign_irq_vector(irq, data, dest);
-   if (err) {
-   if (assign_irq_vector(irq, data,
- irq_data_get_affinity_mask(irq_data)))
-   pr_err("Failed to recover vector for irq %d\n", irq);
-   return err;
-   }
 
-   return IRQ_SET_MASK_OK;
+   return err ? err : IRQ_SET_MASK_OK;
 }
 
 static struct irq_chip lapic_controller = {
-- 
1.7.10.4



[Bugfix 3/5] x86/irq: Fix a race window in x86_vector_free_irqs()

2015-11-30 Thread Jiang Liu
There's a race condition between x86_vector_free_irqs()
{
free_apic_chip_data(irq_data->chip_data);
x   //irq_data->chip_data has been freed, but the pointer
//hasn't been reset yet
irq_domain_reset_irq_data(irq_data);
}
and smp_irq_move_cleanup_interrupt()
{
raw_spin_lock(&vector_lock);
data = apic_chip_data(irq_desc_get_irq_data(desc));
access data->   // may access freed memory
raw_spin_unlock(&desc->lock);
}
, which may cause smp_irq_move_cleanup_interrupt() to access freed memory.
So use vector_lock to guard all memory free code in x86_vector_free_irqs().
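
A user-space sketch of the fix, using a pthread mutex in place of the raw spinlock. All names here (struct chip_data, struct irq_slot, vector_lock_model, slot_free, slot_read_vector) are hypothetical. The point is that freeing the data and resetting the pointer happen inside one critical section, so a reader holding the same lock sees either valid data or NULL, never a dangling pointer.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct chip_data {
	int vector;
};

struct irq_slot {
	struct chip_data *chip_data;
};

/* Plays the role of vector_lock in this model. */
static pthread_mutex_t vector_lock_model = PTHREAD_MUTEX_INITIALIZER;

/* Free and reset under the lock, closing the window between the two steps. */
static void slot_free(struct irq_slot *slot)
{
	assert(slot);

	pthread_mutex_lock(&vector_lock_model);
	free(slot->chip_data);
	slot->chip_data = NULL;
	pthread_mutex_unlock(&vector_lock_model);
}

/* Reader side, modelling the cleanup handler: check the pointer under the lock. */
static int slot_read_vector(struct irq_slot *slot)
{
	int vector = -1;

	assert(slot);

	pthread_mutex_lock(&vector_lock_model);
	if (slot->chip_data)
		vector = slot->chip_data->vector;
	pthread_mutex_unlock(&vector_lock_model);
	return vector;
}
```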

Signed-off-by: Jiang Liu 
---
 arch/x86/kernel/apic/vector.c |   20 
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index f03957e7c50d..57934ef1d032 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -231,23 +231,16 @@ static int assign_irq_vector_policy(int irq, int node,
 static void clear_irq_vector(int irq, struct apic_chip_data *data)
 {
struct irq_desc *desc;
-   unsigned long flags;
-   int cpu, vector;
-
-   raw_spin_lock_irqsave(&vector_lock, flags);
-   BUG_ON(!data->cfg.vector);
+   int cpu, vector = data->cfg.vector;
 
-   vector = data->cfg.vector;
+   BUG_ON(!vector);
for_each_cpu_and(cpu, data->domain, cpu_online_mask)
per_cpu(vector_irq, cpu)[vector] = VECTOR_UNUSED;
-
data->cfg.vector = 0;
cpumask_clear(data->domain);
 
-   if (likely(!data->move_in_progress)) {
-   raw_spin_unlock_irqrestore(&vector_lock, flags);
+   if (likely(!data->move_in_progress))
return;
-   }
 
desc = irq_to_desc(irq);
for_each_cpu_and(cpu, data->old_domain, cpu_online_mask) {
@@ -260,7 +253,7 @@ static void clear_irq_vector(int irq, struct apic_chip_data 
*data)
}
}
data->move_in_progress = 0;
-   raw_spin_unlock_irqrestore(&vector_lock, flags);
+   cpumask_clear(data->old_domain);
 }
 
 void init_irq_alloc_info(struct irq_alloc_info *info,
@@ -282,18 +275,21 @@ static void x86_vector_free_irqs(struct irq_domain 
*domain,
 unsigned int virq, unsigned int nr_irqs)
 {
struct irq_data *irq_data;
+   unsigned long flags;
int i;
 
for (i = 0; i < nr_irqs; i++) {
irq_data = irq_domain_get_irq_data(x86_vector_domain, virq + i);
if (irq_data && irq_data->chip_data) {
+   raw_spin_lock_irqsave(&vector_lock, flags);
clear_irq_vector(virq + i, irq_data->chip_data);
free_apic_chip_data(irq_data->chip_data);
+   irq_domain_reset_irq_data(irq_data);
+   raw_spin_unlock_irqrestore(&vector_lock, flags);
 #ifdef CONFIG_X86_IO_APIC
if (virq + i < nr_legacy_irqs())
legacy_irq_data[virq + i] = NULL;
 #endif
-   irq_domain_reset_irq_data(irq_data);
}
}
 }
-- 
1.7.10.4
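
The locking rule in the patch above can be illustrated outside the kernel.
What follows is only a userspace sketch under stated assumptions: struct
chip_data, struct slot, table_lock and the slot_*() helpers are hypothetical
stand-ins (not kernel APIs), and a C11 atomic_flag spinlock plays the role of
vector_lock. The point it shows is that freeing the data and resetting the
pointer happen inside one critical section, so a reader holding the same lock
sees either valid data or NULL.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct apic_chip_data and struct irq_data. */
struct chip_data {
	int vector;
};

struct slot {
	struct chip_data *chip_data;
};

/* Tiny spinlock playing the role of vector_lock (illustration only). */
static atomic_flag table_lock = ATOMIC_FLAG_INIT;

static void table_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&table_lock,
						 memory_order_acquire))
		; /* spin until the flag was previously clear */
}

static void table_lock_release(void)
{
	atomic_flag_clear_explicit(&table_lock, memory_order_release);
}

/* Attach freshly allocated data to the slot; 0 on success, -1 on failure. */
static int slot_attach(struct slot *s, int vector)
{
	struct chip_data *d = malloc(sizeof(*d));

	if (!d)
		return -1;
	d->vector = vector;
	table_lock_acquire();
	s->chip_data = d;
	table_lock_release();
	return 0;
}

/* Free the data and reset the pointer inside one critical section. */
static void slot_release(struct slot *s)
{
	assert(s != NULL);
	table_lock_acquire();
	free(s->chip_data);
	s->chip_data = NULL;
	table_lock_release();
}

/* Reader side: returns the vector, or -1 when the slot is empty. */
static int slot_read_vector(struct slot *s)
{
	int vector = -1;

	table_lock_acquire();
	if (s->chip_data)
		vector = s->chip_data->vector;
	table_lock_release();
	return vector;
}
```

If the pointer reset were done after dropping the lock, slot_read_vector()
could run in between and dereference freed memory, which is the window the
patch closes.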



[Bugfix 5/5] x86/irq: Trivial cleanups for x86 vector allocation code

2015-11-30 Thread Jiang Liu
Trivial cleanups for x86 vector allocation code:
1) reorganize apic_chip_data to optimize for size and cache efficiency
2) avoid redundant calling of irq_to_desc()
3) refine code comments

Signed-off-by: Jiang Liu 
---
 arch/x86/kernel/apic/vector.c |   54 ++---
 1 file changed, 23 insertions(+), 31 deletions(-)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index b63d6f84c0bb..0183c44a13cb 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -23,9 +23,9 @@
 
 struct apic_chip_data {
struct irq_cfg  cfg;
+   u8  move_in_progress : 1;
cpumask_var_t   domain;
cpumask_var_t   old_domain;
-   u8  move_in_progress : 1;
 };
 
 struct irq_domain *x86_vector_domain;
@@ -38,7 +38,7 @@ static struct apic_chip_data *legacy_irq_data[NR_IRQS_LEGACY];
 
 void lock_vector_lock(void)
 {
-   /* Used to the online set of cpus does not change
+   /* Used to ensure that the online set of cpus does not change
 * during assign_irq_vector.
 */
raw_spin_lock(&vector_lock);
@@ -100,8 +100,7 @@ static void free_apic_chip_data(struct apic_chip_data *data)
}
 }
 
-static int __assign_irq_vector(int irq, struct apic_chip_data *d,
-  const struct cpumask *mask)
+static int assign_irq_vector(struct irq_data *data, const struct cpumask *mask)
 {
/*
 * NOTE! The local APIC isn't very good at handling
@@ -116,11 +115,15 @@ static int __assign_irq_vector(int irq, struct 
apic_chip_data *d,
 */
static int current_vector = FIRST_EXTERNAL_VECTOR + VECTOR_OFFSET_START;
static int current_offset = VECTOR_OFFSET_START % 16;
-   int cpu, err;
+   int cpu, err = -EBUSY;
+   struct irq_desc *desc = irq_data_to_desc(data);
+   struct apic_chip_data *d = data->chip_data;
unsigned int dest;
+   unsigned long flags;
 
+   raw_spin_lock_irqsave(&vector_lock, flags);
if (cpumask_intersects(d->old_domain, cpu_online_mask))
-   return -EBUSY;
+   goto out;
 
/* Only try and allocate irqs on cpus that are present */
err = -ENOSPC;
@@ -187,7 +190,7 @@ next:
d->cfg.vector = vector;
d->cfg.dest_apicid = dest;
for_each_cpu_and(new_cpu, vector_cpumask, cpu_online_mask)
-   per_cpu(vector_irq, new_cpu)[vector] = irq_to_desc(irq);
+   per_cpu(vector_irq, new_cpu)[vector] = desc;
cpumask_copy(d->domain, vector_cpumask);
err = 0;
break;
@@ -198,37 +201,27 @@ next:
cpumask_and(d->old_domain, d->old_domain, cpu_online_mask);
d->move_in_progress = !cpumask_empty(d->old_domain);
}
-
-   return err;
-}
-
-static int assign_irq_vector(int irq, struct apic_chip_data *data,
-const struct cpumask *mask)
-{
-   int err;
-   unsigned long flags;
-
-   raw_spin_lock_irqsave(&vector_lock, flags);
-   err = __assign_irq_vector(irq, data, mask);
+out:
raw_spin_unlock_irqrestore(&vector_lock, flags);
+
return err;
 }
 
-static int assign_irq_vector_policy(int irq, int node,
-   struct apic_chip_data *data,
+static int assign_irq_vector_policy(struct irq_data *data, int node,
struct irq_alloc_info *info)
 {
if (info && info->mask)
-   return assign_irq_vector(irq, data, info->mask);
+   return assign_irq_vector(data, info->mask);
if (node != NUMA_NO_NODE &&
-   assign_irq_vector(irq, data, cpumask_of_node(node)) == 0)
+   assign_irq_vector(data, cpumask_of_node(node)) == 0)
return 0;
-   return assign_irq_vector(irq, data, apic->target_cpus());
+   return assign_irq_vector(data, apic->target_cpus());
 }
 
-static void clear_irq_vector(int irq, struct apic_chip_data *data)
+static void clear_irq_vector(struct irq_data *irq_data)
 {
-   struct irq_desc *desc = irq_to_desc(irq);
+   struct irq_desc *desc = irq_data_to_desc(irq_data);
+   struct apic_chip_data *data = irq_data->chip_data;
int cpu, vector = data->cfg.vector;
 
BUG_ON(!vector);
@@ -276,7 +269,7 @@ static void x86_vector_free_irqs(struct irq_domain *domain,
irq_data = irq_domain_get_irq_data(x86_vector_domain, virq + i);
if (irq_data && irq_data->chip_data) {
raw_spin_lock_irqsave(&vector_lock, flags);
-   clear_irq_vector(virq + i, irq_data->chip_data);
+   clear_irq_vector(irq_data);
free_apic_chip_data(irq_data->chip_data);
irq_domain_reset_irq_data(irq_d

[Bugfix 1/5] x86/irq: Do not reuse struct apic_chip_data.old_domain as temporary buffer

2015-11-30 Thread Jiang Liu
Function __assign_irq_vector() uses apic_chip_data.old_domain as a
temporary buffer, which complicates the rollback logic in case of
failure. So use a dedicated temporary buffer for __assign_irq_vector().

Signed-off-by: Jiang Liu 
---
 arch/x86/kernel/apic/vector.c |9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 861bc59c8f25..d6ec36b4461e 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -30,7 +30,7 @@ struct apic_chip_data {
 
 struct irq_domain *x86_vector_domain;
 static DEFINE_RAW_SPINLOCK(vector_lock);
-static cpumask_var_t vector_cpumask;
+static cpumask_var_t vector_cpumask, used_cpumask;
 static struct irq_chip lapic_controller;
 #ifdef CONFIG_X86_IO_APIC
 static struct apic_chip_data *legacy_irq_data[NR_IRQS_LEGACY];
@@ -124,6 +124,7 @@ static int __assign_irq_vector(int irq, struct 
apic_chip_data *d,
/* Only try and allocate irqs on cpus that are present */
err = -ENOSPC;
cpumask_clear(d->old_domain);
+   cpumask_clear(used_cpumask);
cpu = cpumask_first_and(mask, cpu_online_mask);
while (cpu < nr_cpu_ids) {
int new_cpu, vector, offset;
@@ -157,9 +158,8 @@ next:
}
 
if (unlikely(current_vector == vector)) {
-   cpumask_or(d->old_domain, d->old_domain,
-  vector_cpumask);
-   cpumask_andnot(vector_cpumask, mask, d->old_domain);
+   cpumask_or(used_cpumask, used_cpumask, vector_cpumask);
+   cpumask_andnot(vector_cpumask, mask, used_cpumask);
cpu = cpumask_first_and(vector_cpumask,
cpu_online_mask);
continue;
@@ -404,6 +404,7 @@ int __init arch_early_irq_init(void)
arch_init_htirq_domain(x86_vector_domain);
 
BUG_ON(!alloc_cpumask_var(&vector_cpumask, GFP_KERNEL));
+   BUG_ON(!alloc_cpumask_var(&used_cpumask, GFP_KERNEL));
 
return arch_early_ioapic_init();
 }
-- 
1.7.10.4
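
The idea behind this patch can be sketched in plain C. This is only an
illustration under stated assumptions: CPU masks are modelled as uint64_t
bitmasks, and struct chip_state, assign_vector() and the
cpus_with_free_vector parameter are hypothetical names, not the kernel's.
The search keeps its bookkeeping in a local scratch mask ("used") instead of
in the persistent old_domain field, so a failed search leaves the state
exactly as it was.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical model of the persistent per-IRQ state. */
struct chip_state {
	uint64_t domain;	/* CPUs currently targeted */
	uint64_t old_domain;	/* CPUs still to be cleaned up */
	int vector;		/* 0 means "no vector assigned yet" */
};

/*
 * Try each CPU in 'allowed' in ascending order. A CPU is usable when its
 * bit is set in 'cpus_with_free_vector' (an assumption of this sketch).
 * Returns 0 and commits the new state on success, -ENOSPC otherwise.
 */
static int assign_vector(struct chip_state *st, uint64_t allowed,
			 uint64_t cpus_with_free_vector, int new_vector)
{
	uint64_t used = 0;		/* dedicated scratch buffer */
	uint64_t candidates = allowed;

	while (candidates) {
		/* Isolate the lowest set bit of the candidate mask. */
		uint64_t cpu_bit = candidates & (~candidates + 1);

		if (cpus_with_free_vector & cpu_bit) {
			/* Commit: only now is persistent state modified. */
			if (st->vector)
				st->old_domain = st->domain;
			st->domain = cpu_bit;
			st->vector = new_vector;
			return 0;
		}
		/* Remember the rejected CPU in the scratch mask only. */
		used |= cpu_bit;
		candidates = allowed & ~used;
	}
	return -ENOSPC;
}
```

Had the loop recorded rejected CPUs in st->old_domain instead of "used", a
failing call would return with old_domain clobbered, and the caller could no
longer tell which CPUs really need cleanup.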



[Bugfix 4/5] x86/irq: Fix a race condition between vector assigning and cleanup

2015-11-30 Thread Jiang Liu
Joe Lawrence <joe.lawre...@stratus.com> reported a use-after-free
issue related to the x86 IRQ management code. Please refer to the following
link for more information:
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1026840.html

Thomas pointed out that it's caused by a race condition between
__assign_irq_vector() and __send_cleanup_vector(). Based on Thomas'
draft patch, we solve this race condition by:
1) Use move_in_progress to signal that an IRQ cleanup IPI is needed
2) Use old_domain to save old CPU mask for IRQ cleanup
3) Use vector to protect move_in_progress and old_domain

This bugfix patch also helps to get rid of that atomic allocation in
__send_cleanup_vector().

Signed-off-by: Jiang Liu <jiang@linux.intel.com>
---
 arch/x86/kernel/apic/vector.c |   77 ++---
 1 file changed, 34 insertions(+), 43 deletions(-)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 57934ef1d032..b63d6f84c0bb 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -117,9 +117,9 @@ static int __assign_irq_vector(int irq, struct 
apic_chip_data *d,
static int current_vector = FIRST_EXTERNAL_VECTOR + VECTOR_OFFSET_START;
static int current_offset = VECTOR_OFFSET_START % 16;
int cpu, err;
-   unsigned int dest = d->cfg.dest_apicid;
+   unsigned int dest;
 
-   if (d->move_in_progress)
+   if (cpumask_intersects(d->old_domain, cpu_online_mask))
return -EBUSY;
 
/* Only try and allocate irqs on cpus that are present */
@@ -144,13 +144,12 @@ static int __assign_irq_vector(int irq, struct 
apic_chip_data *d,
cpumask_and(used_cpumask, d->domain, vector_cpumask);
err = apic->cpu_mask_to_apicid_and(mask, used_cpumask,
   &dest);
-   if (err)
-   break;
-   cpumask_andnot(d->old_domain, d->domain,
-  vector_cpumask);
-   d->move_in_progress =
-  cpumask_intersects(d->old_domain, cpu_online_mask);
-   cpumask_copy(d->domain, used_cpumask);
+   if (!err) {
+   cpumask_andnot(d->old_domain, d->domain,
+  vector_cpumask);
+   cpumask_copy(d->domain, used_cpumask);
+   d->cfg.dest_apicid = dest;
+   }
break;
}
 
@@ -183,14 +182,12 @@ next:
/* Found one! */
current_vector = vector;
current_offset = offset;
-   if (d->cfg.vector) {
+   if (d->cfg.vector)
cpumask_copy(d->old_domain, d->domain);
-   d->move_in_progress =
-  cpumask_intersects(d->old_domain, cpu_online_mask);
-   }
+   d->cfg.vector = vector;
+   d->cfg.dest_apicid = dest;
for_each_cpu_and(new_cpu, vector_cpumask, cpu_online_mask)
per_cpu(vector_irq, new_cpu)[vector] = irq_to_desc(irq);
-   d->cfg.vector = vector;
cpumask_copy(d->domain, vector_cpumask);
err = 0;
break;
@@ -198,7 +195,8 @@ next:
 
if (!err) {
/* cache destination APIC IDs into cfg->dest_apicid */
-   d->cfg.dest_apicid = dest;
+   cpumask_and(d->old_domain, d->old_domain, cpu_online_mask);
+   d->move_in_progress = !cpumask_empty(d->old_domain);
}
 
return err;
@@ -230,7 +228,7 @@ static int assign_irq_vector_policy(int irq, int node,
 
 static void clear_irq_vector(int irq, struct apic_chip_data *data)
 {
-   struct irq_desc *desc;
+   struct irq_desc *desc = irq_to_desc(irq);
int cpu, vector = data->cfg.vector;
 
BUG_ON(!vector);
@@ -239,10 +237,6 @@ static void clear_irq_vector(int irq, struct 
apic_chip_data *data)
data->cfg.vector = 0;
cpumask_clear(data->domain);
 
-   if (likely(!data->move_in_progress))
-   return;
-
-   desc = irq_to_desc(irq);
for_each_cpu_and(cpu, data->old_domain, cpu_online_mask) {
for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS;
 vector++) {
@@ -424,10 +418,13 @@ static void __setup_vector_irq(int cpu)
struct irq_data *idata = irq_desc_get_irq_data(desc);
 
data = apic_chip_data(idata);
-   if (!data || !cpumask_test_cpu(cpu, data->domain))
-   continue;
-   vector = data->cfg.vector;

[Bugfix 2/5] x86/irq: Enhance __assign_irq_vector() to rollback in case of failure

2015-11-30 Thread Jiang Liu
Enhance __assign_irq_vector() to roll back in case of failure so that the
caller doesn't need to roll back explicitly.

Signed-off-by: Jiang Liu <jiang@linux.intel.com>
---
 arch/x86/kernel/apic/vector.c |   23 ---
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index d6ec36b4461e..f03957e7c50d 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -117,6 +117,7 @@ static int __assign_irq_vector(int irq, struct 
apic_chip_data *d,
static int current_vector = FIRST_EXTERNAL_VECTOR + VECTOR_OFFSET_START;
static int current_offset = VECTOR_OFFSET_START % 16;
int cpu, err;
+   unsigned int dest = d->cfg.dest_apicid;
 
if (d->move_in_progress)
return -EBUSY;
@@ -140,11 +141,16 @@ static int __assign_irq_vector(int irq, struct 
apic_chip_data *d,
 * the current in use mask. So cleanup the vector
 * allocation for the members that are not used anymore.
 */
+   cpumask_and(used_cpumask, d->domain, vector_cpumask);
+   err = apic->cpu_mask_to_apicid_and(mask, used_cpumask,
+  &dest);
+   if (err)
+   break;
cpumask_andnot(d->old_domain, d->domain,
   vector_cpumask);
d->move_in_progress =
   cpumask_intersects(d->old_domain, cpu_online_mask);
-   cpumask_and(d->domain, d->domain, vector_cpumask);
+   cpumask_copy(d->domain, used_cpumask);
break;
}
 
@@ -167,11 +173,13 @@ next:
 
if (test_bit(vector, used_vectors))
goto next;
-
for_each_cpu_and(new_cpu, vector_cpumask, cpu_online_mask) {
if (!IS_ERR_OR_NULL(per_cpu(vector_irq, 
new_cpu)[vector]))
goto next;
}
+   if (apic->cpu_mask_to_apicid_and(mask, vector_cpumask, &dest))
+   goto next;
+
/* Found one! */
current_vector = vector;
current_offset = offset;
@@ -190,8 +198,7 @@ next:
 
if (!err) {
/* cache destination APIC IDs into cfg->dest_apicid */
-   err = apic->cpu_mask_to_apicid_and(mask, d->domain,
-  &d->cfg.dest_apicid);
+   d->cfg.dest_apicid = dest;
}
 
return err;
@@ -493,14 +500,8 @@ static int apic_set_affinity(struct irq_data *irq_data,
return -EINVAL;
 
err = assign_irq_vector(irq, data, dest);
-   if (err) {
-   if (assign_irq_vector(irq, data,
- irq_data_get_affinity_mask(irq_data)))
-   pr_err("Failed to recover vector for irq %d\n", irq);
-   return err;
-   }
 
-   return IRQ_SET_MASK_OK;
+   return err ? err : IRQ_SET_MASK_OK;
 }
 
 static struct irq_chip lapic_controller = {
-- 
1.7.10.4
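
The rollback approach in this patch amounts to "do the step that can fail
first, into a local variable, and commit afterwards". A minimal sketch of
that pattern, assuming a 64-bit CPU bitmask and hypothetical names
(struct cfg_sketch, mask_to_apicid(), update_domain()); the translation rule
used here (index of the lowest set bit) is invented for illustration and is
not how any real APIC driver computes a destination ID:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical model of the fields that must stay consistent. */
struct cfg_sketch {
	uint64_t domain;
	unsigned int dest_apicid;
};

/*
 * Stand-in for apic->cpu_mask_to_apicid_and(): fails on an empty mask,
 * otherwise reports the index of the lowest set bit.
 */
static int mask_to_apicid(uint64_t mask, unsigned int *apicid)
{
	unsigned int bit;

	if (!mask)
		return -EINVAL;
	for (bit = 0; !(mask & ((uint64_t)1 << bit)); bit++)
		;
	*apicid = bit;
	return 0;
}

/* Translate into a local first; touch *cfg only when nothing can fail. */
static int update_domain(struct cfg_sketch *cfg, uint64_t new_domain)
{
	unsigned int dest;
	int err = mask_to_apicid(new_domain, &dest);

	if (err)
		return err;	/* cfg is untouched, nothing to roll back */

	cfg->domain = new_domain;
	cfg->dest_apicid = dest;
	return 0;
}
```

With this ordering the caller can simply propagate the error, which is why
the recovery branch in apic_set_affinity() is no longer needed in the diff.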



Re: [PATCH] x86/PCI: Fix regression caused by commit 4d6b4e69a245

2015-11-29 Thread Jiang Liu
On 2015/11/28 0:06, Rafael J. Wysocki wrote:
> On Friday, November 27, 2015 11:12:33 AM Jiang Liu wrote:
>> From: Liu Jiang 
>>
>> Commit 4d6b4e69a245 ("x86/PCI/ACPI: Use common interface to support
>> PCI host bridge") converted x86 to use the common interface
>> acpi_pci_root_create, but the conversion missed one code piece in
>> arch/x86/pci/bus_numa.c, which causes a regression on some legacy
>> AMD platforms as reported by Arthur Marsh.
>> The root cause is that acpi_pci_root_create() fails to insert
>> host bridge resources into iomem_resource/ioport_resource because
>> x86_pci_root_bus_resources() has already inserted those resources.
>> So change x86_pci_root_bus_resources() to not insert resources into
>> iomem_resource/ioport_resource.
>>
>> Fixes: 4d6b4e69a245 ("x86/PCI/ACPI: Use common interface to support PCI host 
>> bridge")
>> Signed-off-by: Jiang Liu 
>> Reported-and-tested-by: Arthur Marsh 
>> Cc: Keith Busch 
>> Cc: Arthur Marsh 
>> Cc: Hans de Bruin 
> 
> What exactly has changed between this version and the previous one?
Hi Rafael,
I have removed the following changes from the original patch
posted on Nov 16.
   bus);

/* already added by acpi ? */
-   resource_list_for_each_entry(window, resources)
+   resource_list_for_each_entry(window, &info->resources)
if (window->res->flags & IORESOURCE_BUS) {
found = true;
break;
}
-
if (!found)
pci_add_resource(resources, &info->busn);

And I only refined the commit message based on the test patch
I sent to Arthur as an attachment on Nov 25.
Thanks,
Gerry
> 
> 
>> ---
>>  arch/x86/pci/bus_numa.c |   13 ++---
>>  drivers/acpi/pci_root.c |7 +++
>>  2 files changed, 9 insertions(+), 11 deletions(-)
>>
>> diff --git a/arch/x86/pci/bus_numa.c b/arch/x86/pci/bus_numa.c
>> index 7bcf06a7cd12..6eb3c8af96e2 100644
>> --- a/arch/x86/pci/bus_numa.c
>> +++ b/arch/x86/pci/bus_numa.c
>> @@ -50,18 +50,9 @@ void x86_pci_root_bus_resources(int bus, struct list_head 
>> *resources)
>>  if (!found)
>>  pci_add_resource(resources, &info->busn);
>>  
>> -list_for_each_entry(root_res, &info->resources, list) {
>> -struct resource *res;
>> -struct resource *root;
>> +list_for_each_entry(root_res, &info->resources, list)
>> +pci_add_resource(resources, &root_res->res);
>>  
>> -res = &root_res->res;
>> -pci_add_resource(resources, res);
>> -if (res->flags & IORESOURCE_IO)
>> -root = &ioport_resource;
>> -else
>> -root = &iomem_resource;
>> -insert_resource(root, res);
>> -}
>>  return;
>>  
>>  default_resources:
>> diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
>> index 850d7bf0c873..ae3fe4e64203 100644
>> --- a/drivers/acpi/pci_root.c
>> +++ b/drivers/acpi/pci_root.c
>> @@ -768,6 +768,13 @@ static void pci_acpi_root_add_resources(struct 
>> acpi_pci_root_info *info)
>>  else
>>  continue;
>>  
>> +/*
>> + * Some legacy x86 host bridge drivers use iomem_resource and
>> + * ioport_resource as default resource pool, skip it.
>> + */
>> +if (res == root)
>> +continue;
>> +
>>  conflict = insert_resource_conflict(root, res);
>>  if (conflict) {
>>  dev_info(&info->bridge->dev,
>>
> 


Re: irq_desc use-after-free in smp_irq_move_cleanup_interrupt

2015-11-27 Thread Jiang Liu


On 2015/11/26 5:12, Thomas Gleixner wrote:
> On Wed, 25 Nov 2015, Thomas Gleixner wrote:
>> So if CPU1 gets the IPI _BEFORE_ move_in_progress is set to 0, and
>> does not get another IPI before the next move . That has been that
>> way forever.
>>
>> Duh. Working on a real fix this time.
> 
> Here you go. Completely untested of course.
> 
> Larger than I hoped for, but the simple fix of just clearing the
> move_in_progress flag before sending the IPI does not work because:
> 
> CPU0                      CPU1                    CPU2
> data->move_in_progress=0
> sendIPI()
>                           set_affinity()
>                           lock_vector()           handle_IPI
>                           move_in_progress = 1    lock_vector()
>                           unlock_vector()
>                                                   move_in_progress == 1
>                                                   -> no cleanup
> 
> So we are back to square one. Now one might think that taking vector
> lock prevents that issue:
> 
> CPU0                      CPU1                    CPU2
> lock_vector()
> data->move_in_progress=0
> sendIPI()
> unlock_vector()
>                           set_affinity()
>                           assign_irq_vector()
>                           lock_vector()           handle_IPI
>                           move_in_progress = 1    lock_vector()
>                           unlock_vector()
>                                                   move_in_progress == 1
> Not really. 
> 
> So now the solution is:
> 
> CPU0                                    CPU1                      CPU2
> lock_vector()
> data->move_in_progress=0
> data->cleanup_mask = data->old_domain
> sendIPI()
> unlock_vector()
>                                         set_affinity()
>                                         assign_irq_vector()
>                                         lock_vector()
>                                         if (move_in_progress ||
>                                             !empty(cleanup_mask)) {
>                                            unlock_vector()
>                                            return -EBUSY;         handle_IPI
>                                         }                         lock_vector()
>                                                                   move_in_progress == 0
>                                                                   cpu is set in cleanup mask
>                                                                   ->cleanup vector
> 
> Looks a bit overkill with the extra cpumask. I tried a simple counter
> but that does not work versus cpu unplug as we do not know whether the
> outgoing cpu is involved in the cleanup or not. And if the cpu is
> involved we starve assign_irq_vector() 
> 
> The upside of this is that we get rid of that atomic allocation in
> __send_cleanup_vector().
Hi Thomas,
Maybe more headaches for you now:)
It seems there is still room for improvement. First, it
seems we could just reuse old_domain instead of adding cleanup_mask.
Second, I found another race window among x86_vector_free_irqs(),
__send_cleanup_vector() and smp_irq_move_cleanup_interrupt().
I'm trying to refine your patch based on the following rules:
1) move_in_progress controls whether we need to send IPIs
2) old_domain controls which CPUs we should clean up
3) assign_irq_vector checks both move_in_progress and old_domain.
Will send out the patch soon for comments:)
Thanks,
Gerry   
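
The three rules listed above can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the patch that was later posted: struct vector_state and the model_*() helpers are hypothetical names, CPUs are represented as bits in a 64-bit mask instead of a cpumask, and the vector lock that serializes these steps in the kernel is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model; this is not the kernel's struct apic_chip_data. */
struct vector_state {
	bool     move_in_progress;	/* rule 1: cleanup IPIs still have to be sent */
	uint64_t old_domain;		/* rule 2: bit n set => CPU n must clean up */
};

/* Rule 3: refuse a new assignment while either condition still holds. */
static int model_assign_vector(struct vector_state *s, uint64_t prev_cpus)
{
	if (s->move_in_progress || s->old_domain != 0)
		return -EBUSY;

	s->old_domain = prev_cpus;
	s->move_in_progress = prev_cpus != 0;
	return 0;
}

/*
 * Rule 1: clear the flag when the IPIs go out. old_domain is kept so the
 * targets remain known. Returns the mask of CPUs that would receive an IPI.
 */
static uint64_t model_send_cleanup(struct vector_state *s)
{
	if (!s->move_in_progress)
		return 0;

	s->move_in_progress = false;
	return s->old_domain;
}

/* Rule 2: the IPI handler on each CPU clears only its own bit. */
static void model_cleanup_on_cpu(struct vector_state *s, unsigned int cpu)
{
	assert(cpu < 64);
	s->old_domain &= ~(UINT64_C(1) << cpu);
}
```

In this model a second assignment keeps returning -EBUSY until every CPU in old_domain has run its cleanup, which is the behaviour the cleanup_mask variant in the quoted patch is also after.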

> 
> Brain hurts by now. 
> 
> Not-Yet-Signed-off-by: Thomas Gleixner 
> ---
>  arch/x86/kernel/apic/vector.c |   37 -
>  1 file changed, 16 insertions(+), 21 deletions(-)
> 
> --- a/arch/x86/kernel/apic/vector.c
> +++ b/arch/x86/kernel/apic/vector.c
> @@ -25,6 +25,7 @@ struct apic_chip_data {
>   struct irq_cfg  cfg;
>   cpumask_var_t   domain;
>   cpumask_var_t   old_domain;
> + cpumask_var_t   cleanup_mask;
>   u8  move_in_progress : 1;
>  };
>  
> @@ -83,7 +84,11 @@ static struct apic_chip_data *alloc_apic
>   goto out_data;
>   if (!zalloc_cpumask_var_node(&data->old_domain, GFP_KERNEL, node))
>   goto out_domain;
> + if (!zalloc_cpumask_var_node(&data->cleanup_mask, GFP_KERNEL, node))
> + goto out_old;
>   return data;
> +out_old:
> + free_cpumask_var(data->old_domain);
>  out_domain:
>   free_cpumask_var(data->domain);
>  out_data:
> @@ -96,6 +101,7 @@ static void free_apic_chip_data(struct a
>   if (data) {
>   free_cpumask_var(data->domain);
>   free_cpumask_var(data->old_domain);
> + free_cpumask_var(data->cleanup_mask);
>   kfree(data);
>   }
>  }
> @@ -118,7 +124,7 @@ static int __assign_irq_vector(int irq,
>   static int current_offset = VECTOR_OFFSET_START % 16;
>   int cpu, err;
>  
> - if (d->move_in_progress)
> + if 

[PATCH] x86/PCI: Fix regression caused by commit 4d6b4e69a245

2015-11-26 Thread Jiang Liu
From: Liu Jiang <jiang@linux.intel.com>

Commit 4d6b4e69a245 ("x86/PCI/ACPI: Use common interface to support
PCI host bridge") converted x86 to use the common interface
acpi_pci_root_create(), but the conversion missed one piece of code in
arch/x86/pci/bus_numa.c, which causes a regression on some legacy
AMD platforms as reported by Arthur Marsh <arthur.ma...@internode.on.net>.
The root cause is that acpi_pci_root_create() fails to insert
host bridge resources into iomem_resource/ioport_resource because
x86_pci_root_bus_resources() has already inserted those resources.
So change x86_pci_root_bus_resources() to not insert resources into
iomem_resource/ioport_resource.

Fixes: 4d6b4e69a245 ("x86/PCI/ACPI: Use common interface to support PCI host bridge")
Signed-off-by: Jiang Liu <jiang@linux.intel.com>
Reported-and-tested-by: Arthur Marsh <arthur.ma...@internode.on.net>
Cc: Keith Busch <keith.bu...@intel.com>
Cc: Arthur Marsh <arthur.ma...@internode.on.net>
Cc: Hans de Bruin <jmdebr...@xmsnet.nl>
---
 arch/x86/pci/bus_numa.c |   13 ++---
 drivers/acpi/pci_root.c |7 +++
 2 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/arch/x86/pci/bus_numa.c b/arch/x86/pci/bus_numa.c
index 7bcf06a7cd12..6eb3c8af96e2 100644
--- a/arch/x86/pci/bus_numa.c
+++ b/arch/x86/pci/bus_numa.c
@@ -50,18 +50,9 @@ void x86_pci_root_bus_resources(int bus, struct list_head *resources)
 	if (!found)
 		pci_add_resource(resources, &info->busn);
 
-	list_for_each_entry(root_res, &info->resources, list) {
-		struct resource *res;
-		struct resource *root;
+	list_for_each_entry(root_res, &info->resources, list)
+		pci_add_resource(resources, &root_res->res);
 
-		res = &root_res->res;
-		pci_add_resource(resources, res);
-		if (res->flags & IORESOURCE_IO)
-			root = &ioport_resource;
-		else
-			root = &iomem_resource;
-		insert_resource(root, res);
-	}
 	return;
 
 default_resources:
diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
index 850d7bf0c873..ae3fe4e64203 100644
--- a/drivers/acpi/pci_root.c
+++ b/drivers/acpi/pci_root.c
@@ -768,6 +768,13 @@ static void pci_acpi_root_add_resources(struct acpi_pci_root_info *info)
 		else
 			continue;
 
+		/*
+		 * Some legacy x86 host bridge drivers use iomem_resource and
+		 * ioport_resource as default resource pool, skip it.
+		 */
+		if (res == root)
+			continue;
+
 		conflict = insert_resource_conflict(root, res);
 		if (conflict) {
 			dev_info(&info->bridge->dev,
-- 
1.7.10.4



Re: [Bugfix] x86/PCI: Fix regression caused by commit 4d6b4e69a245

2015-11-25 Thread Jiang Liu
On 2015/11/25 8:32, Arthur Marsh wrote:
> Keith Busch wrote on 25/11/15 09:34:
>> On Tue, Nov 24, 2015 at 11:19:34PM +0100, Rafael J. Wysocki wrote:
>>> Quite frankly, I'm more likely to revert the offending commit at this
>>> point as that's not the only regression reported against it and the
>>> fix only helps in one case (out of three known to me).
>>
>> Using 4.4-rc1 and can confirm the patch fixes my regression report. The
>> revert also fixes it, so either way is good for me!
>>
> 
> To re-cap, all was fine for me until:
> 
> 4d6b4e69a245e9df4b84dba387596086cb66887d is the first bad commit
> commit 4d6b4e69a245e9df4b84dba387596086cb66887d
> Author: Jiang Liu <jiang@linux.intel.com>
> Date:   Wed Oct 14 14:29:41 2015 +0800
> 
>  x86/PCI/ACPI: Use common interface to support PCI host bridge
> 
>  Use common interface to simplify ACPI PCI host bridge implementation.
> 
>  Signed-off-by: Jiang Liu <jiang@linux.intel.com>
>  Reviewed-by: Hanjun Guo <hanjun@linaro.org>
>  Acked-by: Bjorn Helgaas <bhelg...@google.com>
>  Signed-off-by: Rafael J. Wysocki <rafael.j.wyso...@intel.com>
> 
> :04 04 a3447eea376b5a3e6f57deb35cf064c5481b45e3
> f64d8e49fd87b776933dfa3dfefcb33509004d3f M  arch
> 
> From the boot-up I get the message as shown in the images at:
> http://www.users.on.net/~arthur.marsh/20151107601.jpg and
> http://www.users.on.net/~arthur.marsh/20151107602.jpg
> 
> The boot-up suggests trying rebooting with pci=alloc but that didn't help.
> 
> The errors shown include
> "BAR 0: trying firmware assignment [io  size 0x0020]"
> "BAR 0: [io  size 0x0020] conflicts with PCI Bus #00 [io  0x-0x]
> "BAR 0: failed to assign [io  size 0x0020]
> 
> Applying the following patch on top of the patch above from 14 October
> 2015 worked for me:
> 
> 
> From 02818ba34bfa76d93f2a29c85660da0323b0b457 Mon Sep 17 00:00:00 2001
> From: Liu Jiang <jiang@linux.intel.com>
> Date: Mon, 9 Nov 2015 13:36:48 +0800
> Subject: [PATCH]
> 
> 
> Signed-off-by: Liu Jiang <jiang@linux.intel.com>
> ---
>  arch/x86/pci/bus_numa.c |3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/x86/pci/bus_numa.c b/arch/x86/pci/bus_numa.c
> index 7bcf06a7cd12..022d83158cdb 100644
> --- a/arch/x86/pci/bus_numa.c
> +++ b/arch/x86/pci/bus_numa.c
> @@ -51,6 +51,8 @@ void x86_pci_root_bus_resources(int bus, struct list_head *resources)
>  		pci_add_resource(resources, &info->busn);
> 
>  	list_for_each_entry(root_res, &info->resources, list) {
> +		pci_add_resource(resources, &root_res->res);
> +#if 0
>  		struct resource *res;
>  		struct resource *root;
> 
> @@ -61,6 +63,7 @@ void x86_pci_root_bus_resources(int bus, struct list_head *resources)
>  		else
>  			root = &iomem_resource;
>  		insert_resource(root, res);
> +#endif
>  	}
>  	return;
> 
> ###
> 
> The patch posted by Jiang Liu on 16 November 2015 "[Bugfix] x86/PCI: Fix
> regression caused by commit 4d6b4e69a245" had *not* been seen or tested
> by me before being posted to the linux-acpi list and when I did test it
> (after removing the patch above from 9 November 2015), things broke:
> http://www.users.on.net/~arthur.marsh/20151116611.jpg
> 
> So if "commit 4d6b4e69a245e9df4b84dba387596086cb66887d
> x86/PCI/ACPI: Use common interface to support PCI host bridge" stays,
> then the patch "16 November 2015 [Bugfix] x86/PCI: Fix regression caused
> by commit 4d6b4e69a245" would need to go and the patch above from 9
> November 2015 would need to be accepted into the mainline for my machine
> to boot from the mainline code.
Hi Arthur,
Thanks for the reminder again!
It's a little strange: the formal patch "[Bugfix] x86/PCI: Fix
regression caused by commit 4d6b4e69a245" is based on the debug patch
I sent to you on 9 November 2015.
Could you please try the attached patch again?
Thanks,
Gerry

> 
> Arthur.
From 2f82bcfb3f8804197512e55259b57e6fbed6a913 Mon Sep 17 00:00:00 2001
From: Liu Jiang <jiang@linux.intel.com>
Date: Mon, 9 Nov 2015 13:36:48 +0800
Subject: [PATCH] x86/PCI: Fix regression caused by commit 4d6b4e69a245

Commit 4d6b4e69a245 ("x86/PCI/ACPI: Use common interface to support
PCI host bridge") converted x86 to use the common interface
acpi_pci_root_create(), but the conversion missed one piece of code in
arch/x86/pci/bus_numa.c, which causes a regression on some legacy
AMD platforms as reported by Arthur Marsh <arthur.ma...@internode.on.net>.
The root cause is that acpi_pci_root_create() fails to insert
host bridge

Re: [Bugfix] x86/PCI: Fix regression caused by commit 4d6b4e69a245

2015-11-25 Thread Jiang Liu
On 2015/11/25 6:19, Rafael J. Wysocki wrote:
> On Tue, Nov 24, 2015 at 5:49 PM, Bjorn Helgaas <helg...@kernel.org> wrote:
>> On Mon, Nov 16, 2015 at 12:27:37PM +0800, Jiang Liu wrote:
>>> From: Liu Jiang <jiang@linux.intel.com>
>>>
>>> Commit 4d6b4e69a245 ("x86/PCI/ACPI: Use common interface to support
>>> PCI host bridge") converted x86 to use the common interface
>>> acpi_pci_root_create(), but the conversion missed one piece of code in
>>> arch/x86/pci/bus_numa.c, which causes a regression on some legacy
>>> AMD platforms as reported by Arthur Marsh <arthur.ma...@internode.on.net>.
>>> The root cause is that acpi_pci_root_create() fails to insert
>>> host bridge resources into iomem_resource/ioport_resource because
>>> x86_pci_root_bus_resources() has already inserted those resources.
>>> So change x86_pci_root_bus_resources() to not insert resources into
>>> iomem_resource/ioport_resource.
>>
>> Fixes: 4d6b4e69a245 ("x86/PCI/ACPI: Use common interface to support PCI host 
>> bridge")
>>
>>> Signed-off-by: Jiang Liu <jiang@linux.intel.com>
>>> Reported-and-tested-by: Arthur Marsh <arthur.ma...@internode.on.net>
>>
>> What's the status of this?  It looks like a regression we need to fix
>> for v4.4.
>>
>> AFAICT, Arthur did *not* test this patch (rather, his response says he
>> did test it and the test failed).
>>
>> 4d6b4e69a245 was merged by Rafael, and I assume he'll merge the fix
>> unless I hear otherwise.
> 
> Quite frankly, I'm more likely to revert the offending commit at this
> point as that's not the only regression reported against it and the
> fix only helps in one case (out of three known to me).
Hi Rafael,
I got regression reports from Hans de Bruin <jmdebr...@xmsnet.nl>,
Keith Busch <keith.bu...@intel.com>, and Arthur Marsh
<arthur.ma...@internode.on.net>. Hans and Keith also report that
the patch fixes the regression. In Arthur's case, the debug
patch works for him, but the formal patch based on the debug
patch fails, so I need to investigate this further.
Is there any other report related to commit 4d6b4e69a245 that
I could help to investigate?
Thanks,
Gerry

> 
> Thanks,
> Rafael
> 


[Bugfix] x86/PCI: Fix regression caused by commit 4d6b4e69a245

2015-11-15 Thread Jiang Liu
From: Liu Jiang <jiang@linux.intel.com>

Commit 4d6b4e69a245 ("x86/PCI/ACPI: Use common interface to support
PCI host bridge") converted x86 to use the common interface
acpi_pci_root_create(), but the conversion missed one piece of code in
arch/x86/pci/bus_numa.c, which causes a regression on some legacy
AMD platforms as reported by Arthur Marsh <arthur.ma...@internode.on.net>.
The root cause is that acpi_pci_root_create() fails to insert
host bridge resources into iomem_resource/ioport_resource because
x86_pci_root_bus_resources() has already inserted those resources.
So change x86_pci_root_bus_resources() to not insert resources into
iomem_resource/ioport_resource.

Signed-off-by: Jiang Liu <jiang@linux.intel.com>
Reported-and-tested-by: Arthur Marsh <arthur.ma...@internode.on.net>
Cc: Keith Busch <keith.bu...@intel.com>
Cc: Arthur Marsh <arthur.ma...@internode.on.net>
---
 arch/x86/pci/bus_numa.c |   16 +++-
 drivers/acpi/pci_root.c |7 +++
 2 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/arch/x86/pci/bus_numa.c b/arch/x86/pci/bus_numa.c
index 7bcf06a7cd12..ce53b5b64f51 100644
--- a/arch/x86/pci/bus_numa.c
+++ b/arch/x86/pci/bus_numa.c
@@ -41,27 +41,17 @@ void x86_pci_root_bus_resources(int bus, struct list_head *resources)
 	       bus);
 
 	/* already added by acpi ? */
-	resource_list_for_each_entry(window, resources)
+	resource_list_for_each_entry(window, &info->resources)
 		if (window->res->flags & IORESOURCE_BUS) {
 			found = true;
 			break;
 		}
-
 	if (!found)
 		pci_add_resource(resources, &info->busn);
 
-	list_for_each_entry(root_res, &info->resources, list) {
-		struct resource *res;
-		struct resource *root;
+	list_for_each_entry(root_res, &info->resources, list)
+		pci_add_resource(resources, &root_res->res);
 
-		res = &root_res->res;
-		pci_add_resource(resources, res);
-		if (res->flags & IORESOURCE_IO)
-			root = &ioport_resource;
-		else
-			root = &iomem_resource;
-		insert_resource(root, res);
-	}
 	return;
 
 default_resources:
diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
index 850d7bf0c873..ae3fe4e64203 100644
--- a/drivers/acpi/pci_root.c
+++ b/drivers/acpi/pci_root.c
@@ -768,6 +768,13 @@ static void pci_acpi_root_add_resources(struct acpi_pci_root_info *info)
 		else
 			continue;
 
+		/*
+		 * Some legacy x86 host bridge drivers use iomem_resource and
+		 * ioport_resource as default resource pool, skip it.
+		 */
+		if (res == root)
+			continue;
+
 		conflict = insert_resource_conflict(root, res);
 		if (conflict) {
 			dev_info(&info->bridge->dev,
-- 
1.7.10.4



Re: [Patch v7 4/7] PCI/ACPI: Add interface acpi_pci_root_create()

2015-11-13 Thread Jiang Liu
On 2015/11/14 1:03, Lorenzo Pieralisi wrote:
> Please trim your emails, thanks.
> 
> On Fri, Nov 13, 2015 at 01:57:30PM +0100, Tomasz Nowicki wrote:
>> On 12.11.2015 16:05, Jiang Liu wrote:
> 
> [...]
> 
>>>>> IA64 actually ignores the translation type flag and just assumes it's
>>>>> TypeTranslation, so there may be some IA64 BIOS implementations
>>>>> accidentally using TypeStatic. That's why we parse the SparseTranslation
>>>>> flag without checking the TranslationType flag. I feel ARM64 may face the
>>>>> same situation as IA64:(
>>>>>
>>>>> We may expect (TypeStatic, 0-offset) and (TypeTranslation,
>>>>> non-0-offset) in the real world. For the other two combinations, I haven't
>>>>> found a real usage yet, though theoretically they are possible.
> 
> I do not understand why (TypeStatic, non-0-offset) is not a valid
> option. Aren't there any (x86) platforms with a CPU<->PCI _physical_
> address space offset out there (I am talking about memory space) ?

It's possible, but we haven't found such a design yet. If we eventually
encounter such a case, we will need to enhance the x86-specific code to
support it.

> 
>>>> I think we should not bend the generic code for IA64 only and expose
>>>> other platforms to the same issue. Instead, lets interpret spec
>>>> correctly and create IA64 quirk for the sake of backward compatibility.
>>>> Thoughts?
>>> I think there are at least two factors related to this issue.
>>>
>>> First, we still lack a way/framework to fix errors in ACPI resource
>>> descriptors. Recently we have refined the ACPI resource parsing interfaces
>>> and enforced strict sanity checks. This brings us some regressions
>>> which are really BIOS flaws, but it used to work and now breaks:(
>>> I'm still struggling to get those regressions fixed. So we may run
>>> into the same situation if we enforce a strict check for TranslationType:(
>>>
>>> Second, enforcing a strict check doesn't bring us much benefit.
>>> Translation type is almost platform specific, and we haven't found a
>>> platform supporting both TypeTranslation and TypeStatic, so arch code
>>> may assume the correct translation type no matter what the BIOS reports.
>>> So it won't hurt us even if the BIOS reports the wrong translation type.
> 
> TBH I still do not understand what TranslationType actually means,
> I will ask whoever added that to the specification to understand it.
> 
>> That is my point, let's pass down all we need from resource range
>> descriptors to arch code, then archs with known quirks can do whatever
>> is needed to make it work. However, generic code like
>> acpi_decode_space cannot play with offsets with a silent IA64
>> assumption.
>>
>> To sum it up, your last patch looks ok to me modulo Lorenzo's concern:
>>>>>>>> If we go with this approach though, you are not adding the offset to
>>>>>>>> the resource when parsing the memory spaces in acpi_decode_space(),
>>>>>>>> are we
>>>>>>>> sure that's what we really want ?
>>>>>>>>
>>>>>>>> In DT, a host bridge range has a:
>>>>>>>>
>>>>>>>> - CPU physical address
>>>>>>>> - PCI bus address
>>>>>>>>
>>>>>>>> We use that to compute the offset between primary bus (ie CPU
>> physical
>>>>>>>> address) and secondary bus (ie PCI bus address).
>>>>>>>>
>>>>>>>> The value ending up in the PCI resource struct (for memory space) is
>>>>>>>> the CPU physical address, if you do not add the offset in
>>>>>>>> acpi_decode_space
>>>>>>>> that does not hold true on platforms where CPU<->PCI offset != 0 on
>>>>>>>> ACPI,
>>>>>>>> am I wrong ?
>> His concern is that your patch will cause:
>> acpi_pci_root_validate_resources(&device->dev, list,
>>   IORESOURCE_MEM);
>> to fail now.
> 
> Not really. My concern is that there might be platforms out there with
> an offset between the CPU and PCI physical address spaces, and if we
> remove the offset value in acpi_decode_space we can break them,
> because in the kernel struct resource data we have to have CPU physical
> addresses, not PCI ones. If offset == 0, we are home and dry, I do not
> understand why that's a given, which is what we would assume if Jiang's
> patch is merged as-is unless I am mistaken.
We try to exclude the offset from struct resource in generic ACPI code,
and it's the arch's responsibility to decide how to manipulate the struct
resource object if the offset is not zero.

Currently the offset is always zero for x86, and IA64 has arch specific
code to handle a non-zero offset. So we should be safe without breaking
existing code. ARM64 is a little different from IA64, so it's
hard to share code between IA64 and ARM64.
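
To make the split of responsibilities concrete, here is a rough sketch, assuming a simplified window type. `toy_window` and both helper functions are hypothetical and only model the idea: generic code records the firmware range and the translation offset separately, and each arch decides how to turn a bus address into a CPU address.

```c
#include <stdint.h>

/* Hypothetical, simplified model of a decoded host bridge window. */
struct toy_window {
	uint64_t start;   /* Range Minimum as reported by firmware */
	uint64_t end;     /* Range Maximum as reported by firmware */
	uint64_t offset;  /* Translation Offset, kept out of start/end */
};

/* Arch policy A: the offset is assumed to be zero (the x86 case above). */
static uint64_t toy_cpu_addr_identity(const struct toy_window *w,
				      uint64_t bus_addr)
{
	(void)w;
	return bus_addr;
}

/* Arch policy B: CPU address = bus address + translation offset. */
static uint64_t toy_cpu_addr_translated(const struct toy_window *w,
					uint64_t bus_addr)
{
	return bus_addr + w->offset;
}
```

With a made-up window of offset 0x3EFF0000, policy B maps bus address 0x10 to 0x3EFF0010, while policy A leaves it at 0x10. Lorenzo's concern corresponds to a platform that needs policy B while the code silently applies policy A.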

> 
> Thanks,
> Lorenzo
> 


Re: [Patch v7 4/7] PCI/ACPI: Add interface acpi_pci_root_create()

2015-11-12 Thread Jiang Liu
On 2015/11/12 22:45, Tomasz Nowicki wrote:
> On 12.11.2015 15:04, Jiang Liu wrote:
>> On 2015/11/12 21:21, Tomasz Nowicki wrote:
>>> On 12.11.2015 09:43, Jiang Liu wrote:
>>>> On 2015/11/12 1:46, Lorenzo Pieralisi wrote:
>>>>> On Tue, Nov 10, 2015 at 01:50:46PM +0800, Jiang Liu wrote:
>>>>>
>>>>> [...]
>>>>>
>>>>>>>> In particular, I would like to understand, for an eg DWordIO
>>>>>>>> descriptor,
>>>>>>>> what Range Minimum, Range Maximum and Translation Offset represent,
>>>>>>>> they can't mean different things depending on the SW parsing them,
>>>>>>>> this totally defeats the purpose.
>>>>>>>
>>>>>>> I have no clue about what those mean in ACPI though.
>>>>>>>
>>>>>>> Generally speaking, each PCI domain is expected to have a (normally
>>>>>>> 64KB)
>>>>>>> range of CPU addresses that gets translated into PCI I/O space the
>>>>>>> same
>>>>>>> way that config space and memory space are handled.
>>>>>>> This is true for almost every architecture except for x86, which
>>>>>>> uses
>>>>>>> different CPU instructions for I/O space compared to the other
>>>>>>> spaces.
>>>>>>>
>>>>>>>> By the way, ia64 ioremaps the translation_offset (ie
>>>>>>>> new_space()), so
>>>>>>>> basically that's the CPU physical address at which the PCI host
>>>>>>>> bridge
>>>>>>>> map the IO space transactions), I do not think ia64 is any
>>>>>>>> different from
>>>>>>>> arm64 in this respect, if it is please provide an HW description
>>>>>>>> here from
>>>>>>>> the PCI bus perspective here (also an example of ia64 ACPI PCI
>>>>>>>> host bridge
>>>>>>>> tables would help).
>>>>>>>
>>>>>>> The main difference between ia64 and a lot of the other
>>>>>>> architectures (e.g.
>>>>>>> sparc is different again) is that ia64 defines a logical address
>>>>>>> range
>>>>>>> in terms of having a small number for each I/O space followed by the
>>>>>>> offset within that space as a 'port number' and uses a mapping
>>>>>>> function
>>>>>>> that is defined as
>>>>>>>
>>>>>>> static inline void *__ia64_mk_io_addr (unsigned long port)
>>>>>>> {
>>>>>>>   struct io_space *space = &io_space[IO_SPACE_NR(port)];
>>>>>>>   return (space->mmio_base | IO_SPACE_PORT(port));
>>>>>>> }
>>>>>>> static inline unsigned int inl(unsigned long port)
>>>>>>> {
>>>>>>>   return *__ia64_mk_io_addr(port);
>>>>>>> }
>>>>>>>
>>>>>>> Most architectures allow only one I/O port range and put it at a
>>>>>>> fixed
>>>>>>> virtual address so that inl() simply becomes
>>>>>>>
>>>>>>> static inline u32 inl(unsigned long addr)
>>>>>>> {
>>>>>>>   return readl(PCI_IOBASE + addr);
>>>>>>> }
>>>>>>>
>>>>>>> which noticeably reduces code size.
>>>>>>>
>>>>>>> On some architectures (powerpc, arm, arm64), we then get the same
>>>>>>> simplified
>>>>>>> definition with a fixed virtual address, and use pci_ioremap_io() or
>>>>>>> something like that to map a physical address range into this
>>>>>>> virtual
>>>>>>> address window at the correct io_offset;
>>>>>> Hi all,
>>>>>>  Thanks for explanation, I found a way to make the ACPI resource
>>>>>> parsing interface arch neutral, it should help to address Lorenzo's
>>>>>> concern. Please refer to the attached patch. (It's still RFC, not
>>>>>> tested
>>>>>> yet).
>>>>>
>>>>> If we go with this approach though, you are not adding the offset to
>>>>>

Re: [Patch v7 4/7] PCI/ACPI: Add interface acpi_pci_root_create()

2015-11-12 Thread Jiang Liu
On 2015/11/12 21:21, Tomasz Nowicki wrote:
> On 12.11.2015 09:43, Jiang Liu wrote:
>> On 2015/11/12 1:46, Lorenzo Pieralisi wrote:
>>> On Tue, Nov 10, 2015 at 01:50:46PM +0800, Jiang Liu wrote:
>>>
>>> [...]
>>>
>>>>>> In particular, I would like to understand, for an eg DWordIO
>>>>>> descriptor,
>>>>>> what Range Minimum, Range Maximum and Translation Offset represent,
>>>>>> they can't mean different things depending on the SW parsing them,
>>>>>> this totally defeats the purpose.
>>>>>
>>>>> I have no clue about what those mean in ACPI though.
>>>>>
>>>>> Generally speaking, each PCI domain is expected to have a (normally
>>>>> 64KB)
>>>>> range of CPU addresses that gets translated into PCI I/O space the
>>>>> same
>>>>> way that config space and memory space are handled.
>>>>> This is true for almost every architecture except for x86, which uses
>>>>> different CPU instructions for I/O space compared to the other spaces.
>>>>>
>>>>>> By the way, ia64 ioremaps the translation_offset (ie new_space()), so
>>>>>> basically that's the CPU physical address at which the PCI host
>>>>>> bridge
>>>>>> map the IO space transactions), I do not think ia64 is any
>>>>>> different from
>>>>>> arm64 in this respect, if it is please provide an HW description
>>>>>> here from
>>>>>> the PCI bus perspective here (also an example of ia64 ACPI PCI
>>>>>> host bridge
>>>>>> tables would help).
>>>>>
>>>>> The main difference between ia64 and a lot of the other
>>>>> architectures (e.g.
>>>>> sparc is different again) is that ia64 defines a logical address range
>>>>> in terms of having a small number for each I/O space followed by the
>>>>> offset within that space as a 'port number' and uses a mapping
>>>>> function
>>>>> that is defined as
>>>>>
>>>>> static inline void *__ia64_mk_io_addr (unsigned long port)
>>>>> {
>>>>>  struct io_space *space = &io_space[IO_SPACE_NR(port)];
>>>>>  return (space->mmio_base | IO_SPACE_PORT(port));
>>>>> }
>>>>> static inline unsigned int inl(unsigned long port)
>>>>> {
>>>>>  return *__ia64_mk_io_addr(port);
>>>>> }
>>>>>
>>>>> Most architectures allow only one I/O port range and put it at a fixed
>>>>> virtual address so that inl() simply becomes
>>>>>
>>>>> static inline u32 inl(unsigned long addr)
>>>>> {
>>>>>  return readl(PCI_IOBASE + addr);
>>>>> }
>>>>>
>>>>> which noticeably reduces code size.
>>>>>
>>>>> On some architectures (powerpc, arm, arm64), we then get the same
>>>>> simplified
>>>>> definition with a fixed virtual address, and use pci_ioremap_io() or
>>>>> something like that to map a physical address range into this
>>>>> virtual
>>>>> address window at the correct io_offset;
>>>> Hi all,
>>>> Thanks for explanation, I found a way to make the ACPI resource
>>>> parsing interface arch neutral, it should help to address Lorenzo's
>>>> concern. Please refer to the attached patch. (It's still RFC, not
>>>> tested
>>>> yet).
>>>
>>> If we go with this approach though, you are not adding the offset to
>>> the resource when parsing the memory spaces in acpi_decode_space(),
>>> are we
>>> sure that's what we really want ?
>>>
>>> In DT, a host bridge range has a:
>>>
>>> - CPU physical address
>>> - PCI bus address
>>>
>>> We use that to compute the offset between primary bus (ie CPU physical
>>> address) and secondary bus (ie PCI bus address).
>>>
>>> The value ending up in the PCI resource struct (for memory space) is
>>> the CPU physical address, if you do not add the offset in
>>> acpi_decode_space
>>> that does not hold true on platforms where CPU<->PCI offset != 0 on
>>> ACPI,
>>> am I wrong ?
>> Hi Lorenzo,
>> I may have found the divergence between us about the 

Re: [Patch v7 4/7] PCI/ACPI: Add interface acpi_pci_root_create()

2015-11-12 Thread Jiang Liu
On 2015/11/12 1:46, Lorenzo Pieralisi wrote:
> On Tue, Nov 10, 2015 at 01:50:46PM +0800, Jiang Liu wrote:
> 
> [...]
> 
>>>> In particular, I would like to understand, for an eg DWordIO descriptor,
>>>> what Range Minimum, Range Maximum and Translation Offset represent,
>>>> they can't mean different things depending on the SW parsing them,
>>>> this totally defeats the purpose.
>>>
>>> I have no clue about what those mean in ACPI though.
>>>
>>> Generally speaking, each PCI domain is expected to have a (normally 64KB)
>>> range of CPU addresses that gets translated into PCI I/O space the same
>>> way that config space and memory space are handled.
>>> This is true for almost every architecture except for x86, which uses
>>> different CPU instructions for I/O space compared to the other spaces.
>>>
>>>> By the way, ia64 ioremaps the translation_offset (ie new_space()), so
>>>> basically that's the CPU physical address at which the PCI host bridge
>>>> map the IO space transactions), I do not think ia64 is any different from
>>>> arm64 in this respect, if it is please provide an HW description here from
>>>> the PCI bus perspective here (also an example of ia64 ACPI PCI host bridge
>>>> tables would help).
>>>
>>> The main difference between ia64 and a lot of the other architectures (e.g.
>>> sparc is different again) is that ia64 defines a logical address range
>>> in terms of having a small number for each I/O space followed by the
>>> offset within that space as a 'port number' and uses a mapping function
>>> that is defined as
>>>
>>> static inline void *__ia64_mk_io_addr (unsigned long port)
>>> {
>>> struct io_space *space = &io_space[IO_SPACE_NR(port)];
>>> return (space->mmio_base | IO_SPACE_PORT(port));
>>> }
>>> static inline unsigned int inl(unsigned long port)
>>> {
>>> return *__ia64_mk_io_addr(port);
>>> }
>>>
>>> Most architectures allow only one I/O port range and put it at a fixed
>>> virtual address so that inl() simply becomes 
>>>
>>> static inline u32 inl(unsigned long addr)
>>> {
>>> return readl(PCI_IOBASE + addr);
>>> }
>>>
>>> which noticeably reduces code size.
>>>
>>> On some architectures (powerpc, arm, arm64), we then get the same simplified
>>> definition with a fixed virtual address, and use pci_ioremap_io() or
>>> something like that to map a physical address range into this virtual
>>> address window at the correct io_offset;
>> Hi all,
>>  Thanks for explanation, I found a way to make the ACPI resource
>> parsing interface arch neutral, it should help to address Lorenzo's
>> concern. Please refer to the attached patch. (It's still RFC, not tested
>> yet).
> 
> If we go with this approach though, you are not adding the offset to
> the resource when parsing the memory spaces in acpi_decode_space(), are we
> sure that's what we really want ?
> 
> In DT, a host bridge range has a:
> 
> - CPU physical address
> - PCI bus address
> 
> We use that to compute the offset between primary bus (ie CPU physical
> address) and secondary bus (ie PCI bus address).
> 
> The value ending up in the PCI resource struct (for memory space) is
> the CPU physical address, if you do not add the offset in acpi_decode_space
> that does not hold true on platforms where CPU<->PCI offset != 0 on ACPI,
> am I wrong ?
Hi Lorenzo,
I may have found where our views of the design diverge. You
treat it as a one-stage translation, but I treat it as a
two-stage translation, as below:
stage 1: map (translate) the per-PCI-domain IO port address [0, 16M) into
the system global IO port address. Here the system global IO port address
space is ioport_resource [0, IO_SPACE_LIMIT).
stage 2: map the system IO port address into a system memory address.

We need two objects of struct resource_win to support the above two-stage
translation. One object, of type IORESOURCE_IO, is used to support
stage one, and it will also be used to allocate IO port resources
for PCI devices. Another object, of type IORESOURCE_MEM, is used
to allocate the resource from iomem_resource and set up the MMIO mapping
used to actually access the IO ports.
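
The two stages can be sketched as two chained window lookups. This is only an illustration under assumed names: `toy_stage`, `toy_translate()` and `toy_port_to_mmio()` are hypothetical and do not correspond to struct resource_win or any kernel function.

```c
#include <stdint.h>

/* Hypothetical model: one window per stage, names are illustrative only. */
struct toy_stage {
	uint64_t start;   /* first address accepted by this stage */
	uint64_t size;    /* length of the window */
	uint64_t offset;  /* added to produce the next stage's address */
};

/* Returns 0 and stores the translated address, or -1 if addr is outside. */
static int toy_translate(const struct toy_stage *s, uint64_t addr,
			 uint64_t *out)
{
	if (addr < s->start || addr - s->start >= s->size)
		return -1;
	*out = addr + s->offset;
	return 0;
}

/* Stage 1: bus-local port -> system port; stage 2: system port -> MMIO. */
static int toy_port_to_mmio(const struct toy_stage *io,
			    const struct toy_stage *mmio,
			    uint64_t bus_port, uint64_t *cpu_addr)
{
	uint64_t sys_port;

	if (toy_translate(io, bus_port, &sys_port))
		return -1;
	return toy_translate(mmio, sys_port, cpu_addr);
}
```

If the stage-one offset is always 0, the first lookup degenerates to a range check, which matches the remark that stage one can look optional on a platform with a single IO port space.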

For ARM64, it doesn't support multiple per-PCI-domain (bus local)
IO port address spaces yet, so stage one seems to be optional
because the offset between the bus local IO port address and the system
IO port address is always 0. But we still need two objects of
struct resource_win. The first object is
{
offs

Re: [Patch v7 4/7] PCI/ACPI: Add interface acpi_pci_root_create()

2015-11-09 Thread Jiang Liu
On 2015/11/10 4:09, Arnd Bergmann wrote:
> On Monday 09 November 2015 17:10:43 Lorenzo Pieralisi wrote:
>> On Mon, Nov 09, 2015 at 03:07:38PM +0100, Tomasz Nowicki wrote:
>>> On 06.11.2015 14:22, Jiang Liu wrote:
>>>> On 2015/11/6 20:40, Tomasz Nowicki wrote:
>>>>> On 06.11.2015 12:46, Jiang Liu wrote:
>>>>>> On 2015/11/6 18:37, Tomasz Nowicki wrote:
>>>>>>> On 06.11.2015 09:52, Jiang Liu wrote:
>>>>>>> Sure, ARM64 (0-16M IO space) QEMU example:
>>>>>>> DWordIO (ResourceProducer, MinFixed, MaxFixed, PosDecode, EntireRange,
>>>>>>>   0x, // Granularity
>>>>>>>   0x, // Range Minimum
>>>>>>>   0x, // Range Maximum
>>>>>>>   0x3EFF, // Translation Offset
>>>>>>>   0x0001, // Length
>>>>>>>   ,, , TypeStatic)
>>>>>> The above DWordIO resource descriptor doesn't conform to the ACPI spec.
>>>>>> According to my understanding, ARM/ARM64 has no concept of IO port
>>>>>> address space, so the PCI host bridge will map IO port on PCI side
>>>>>> onto MMIO on host side. In other words, PCI host bridge on ARM64
>>>>>> implements an IO Port->MMIO translation instead of an IO Port->IO Port
>>>>>> translation. If that's true, it should use 'TypeTranslation' instead
>>>>>> of 'TypeStatic'. And kernel ACPI resource parsing interface doesn't
>>>>>> support 'TypeTranslation' yet, so we need to find a solution for it.
>>>>>
>>>>> I think you are right, we need TypeTranslation flag for ARM64 DWordIO
>>>>> descriptors and an extra kernel patch to support it.
>>>> How about the attached patch to support TypeTranslation?
>>>> It only passes compilation:)
>>>
>>> Based on the further discussion, your draft patch looks good to me.
>>> Lorenzo, do you agree?
>>
>> No, because I still do not understand the difference between ia64 and
>> arm64 (they both drive IO ports cycles through MMIO so the resource
>> descriptors content must be the same or better they must mean the same
>> thing). On top of that, this is something that was heavily debated for DT:
>>
>> http://www.spinics.net/lists/arm-kernel/msg345633.html
>>
>> and I would like to get Arnd and Bjorn opinion on this because we
>> should not "interpret" ACPI specifications, we should understand
>> what they are supposed to describe and write kernel code accordingly.
>>
>> In particular, I would like to understand, for an eg DWordIO descriptor,
>> what Range Minimum, Range Maximum and Translation Offset represent,
>> they can't mean different things depending on the SW parsing them,
>> this totally defeats the purpose.
> 
> I have no clue about what those mean in ACPI though.
> 
> Generally speaking, each PCI domain is expected to have a (normally 64KB)
> range of CPU addresses that gets translated into PCI I/O space the same
> way that config space and memory space are handled.
> This is true for almost every architecture except for x86, which uses
> different CPU instructions for I/O space compared to the other spaces.
> 
>> By the way, ia64 ioremaps the translation_offset (ie new_space()), so
>> basically that's the CPU physical address at which the PCI host bridge
>> map the IO space transactions), I do not think ia64 is any different from
>> arm64 in this respect, if it is please provide an HW description here from
>> the PCI bus perspective here (also an example of ia64 ACPI PCI host bridge
>> tables would help).
> 
> The main difference between ia64 and a lot of the other architectures (e.g.
> sparc is different again) is that ia64 defines a logical address range
> in terms of having a small number for each I/O space followed by the
> offset within that space as a 'port number' and uses a mapping function
> that is defined as
> 
> static inline void *__ia64_mk_io_addr (unsigned long port)
> {
> struct io_space *space = &io_space[IO_SPACE_NR(port)];
> return (space->mmio_base | IO_SPACE_PORT(port));
> }
> static inline unsigned int inl(unsigned long port)
> {
> return *__ia64_mk_io_addr(port);
> }
> 
> Most architectures allow only one I/O port range and put it at a fixed
> virtual address so that inl() simply becomes 
> 
> static inline u32 inl(unsigned long addr)
> {
> return readl(PCI_IOBASE + addr

Re: [PATCH] acpi: add support for extended IRQ to PCI link

2015-11-09 Thread Jiang Liu
On 2015/11/9 13:45, Sinan Kaya wrote:
> 
> 
> On 11/9/2015 12:24 AM, Jiang Liu wrote:
>>> +u32 possible[ACPI_PCI_LINK_MAX_POSSIBLE];
>>> >  u8 initialized:1;
>>> >  u8 reserved:7;
>>> >  };
>> Hi Sinan,
>> This data structure becomes rather big; any idea how to reduce
>> memory consumption?
>> Thanks,
>> Gerry
>>
> Hi Gerry,
> 
> There are two constants in the code.
> 
> #define ACPI_PCI_LINK_MAX_POSSIBLE 16
> 
> I changed the data type above. Previously it was consuming 16 bytes now
> 64 bytes.
Aha, I made a mistake. ACPI_PCI_LINK_MAX_POSSIBLE hasn't been changed,
so the increase in space is not so big :)

> 
> The second one is this.
> 
> #define ACPI_MAX_IRQS 256
> 
> I changed ACPI_MAX_IRQS to 1020 from 256. Let's assume 1024.
> 
> I'm concerned about this though since you warned. This used to consume
> 1024 bytes now 4096 bytes.
> 
> static int acpi_irq_penalty[ACPI_MAX_IRQS] = {
> PIRQ_PENALTY_ISA_ALWAYS,/* IRQ0 timer */
> ...
> }
> 
> Sinan
> 
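The byte counts quoted above can be checked with a small stand-alone
sketch. It assumes a 1-byte u8 and a 4-byte u32/int, which holds on the
usual Linux targets; table_bytes() is a hypothetical helper for this
illustration and is not part of pci_link.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Values taken from the thread; the SKETCH_ names are illustrative only. */
#define SKETCH_LINK_MAX_POSSIBLE 16    /* ACPI_PCI_LINK_MAX_POSSIBLE */
#define SKETCH_OLD_MAX_IRQS      256   /* ACPI_MAX_IRQS before the patch */
#define SKETCH_NEW_MAX_IRQS      1020  /* ACPI_MAX_IRQS after the patch */

/* Bytes used by a table of `count` entries of `entry_size` bytes each. */
size_t table_bytes(size_t count, size_t entry_size)
{
	return count * entry_size;
}
```

With these assumptions the possible[] array grows from 16 to 64 bytes per
link, and the penalty table grows from 1024 bytes to 4080 bytes (4096 if
the limit is rounded up to 1024 entries, as in the mail above).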
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: lock-up on boot with x86/PCI/ACPI: Use common interface to support PCI host bridge

2015-11-08 Thread Jiang Liu
On 2015/11/7 15:56, Arthur Marsh wrote:
> Hi, I've run into a situation where I've been getting a lock-up a few
> seconds into the boot process on a machine with an ASUS A8V-MX
> motherboard, BIOS 050312/06/2005 with AMD Athlon(tm) 64 Processor
> 3200+ (single core) with kernel compiled in 32 bit mode (config attached
> was used for both the problem kernel and kernel with the patch reverted,
> dmesg attached was for the kernel with the patch reverted).
> 
> A git bisect traced the problem back to:
> 
> git bisect good
> 4d6b4e69a245e9df4b84dba387596086cb66887d is the first bad commit
> commit 4d6b4e69a245e9df4b84dba387596086cb66887d
> Author: Jiang Liu 
> Date:   Wed Oct 14 14:29:41 2015 +0800
> 
>  x86/PCI/ACPI: Use common interface to support PCI host bridge
> 
>  Use common interface to simplify ACPI PCI host bridge implementation.
> 
>  Signed-off-by: Jiang Liu 
>  Reviewed-by: Hanjun Guo 
>  Acked-by: Bjorn Helgaas 
>  Signed-off-by: Rafael J. Wysocki 
> 
> :04 04 a3447eea376b5a3e6f57deb35cf064c5481b45e3
> f64d8e49fd87b776933dfa3dfefcb33509004d3f M  arch
> 
> From the boot-up I get the message as shown in the images at:
> http://www.users.on.net/~arthur.marsh/20151107601.jpg and
> http://www.users.on.net/~arthur.marsh/20151107602.jpg
> 
> The boot-up suggests trying rebooting with pci=alloc but that didn't help.
> 
> The errors shown include
> "BAR 0: trying firmware assignment [io  size 0x0020]"
> "BAR 0: [io  size 0x0020] conflicts with PCI Bus #00 [io  0x-0x]
> "BAR 0: failed to assign [io  size 0x0020]
> 
> After reverting the patch and installing the resulting kernel I was able
> to boot normally.
> 
> I'd be happy to provide any further information and run further tests to
> help identify and resolve the problem.
Hi Arthur,
Could you please help to try the attached test patch?
Thanks,
Gerry

> 
> Arthur.
> 
> 
From 02818ba34bfa76d93f2a29c85660da0323b0b457 Mon Sep 17 00:00:00 2001
From: Liu Jiang 
Date: Mon, 9 Nov 2015 13:36:48 +0800
Subject: [PATCH]


Signed-off-by: Liu Jiang 
---
 arch/x86/pci/bus_numa.c |3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/pci/bus_numa.c b/arch/x86/pci/bus_numa.c
index 7bcf06a7cd12..022d83158cdb 100644
--- a/arch/x86/pci/bus_numa.c
+++ b/arch/x86/pci/bus_numa.c
@@ -51,6 +51,8 @@ void x86_pci_root_bus_resources(int bus, struct list_head *resources)
 		pci_add_resource(resources, &info->busn);
 
 	list_for_each_entry(root_res, &info->resources, list) {
+		pci_add_resource(resources, &root_res->res);
+#if 0
 		struct resource *res;
 		struct resource *root;
 
@@ -61,6 +63,7 @@ void x86_pci_root_bus_resources(int bus, struct list_head *resources)
 		else
 			root = &iomem_resource;
 		insert_resource(root, res);
+#endif
 	}
 	return;
 
-- 
1.7.10.4



Re: [PATCH] acpi: add support for extended IRQ to PCI link

2015-11-08 Thread Jiang Liu
On 2015/11/9 0:07, Sinan Kaya wrote:
> The ACPI compiler uses the extended format when
> used interrupt numbers are greater than 256.
> The PCI link code currently only supports simple
> interrupt format. The IRQ numbers are represented
> using 32 bits when extended IRQ syntax. This patch
> changes the interrupt number type to 32 bits and
> places an upper limit of 1020 as possible interrupt
> id. Additional checks have been placed to prevent
> out of bounds writes.
> 
> Signed-off-by: Sinan Kaya 
> ---
>  drivers/acpi/pci_link.c | 35 ++-
>  1 file changed, 22 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/acpi/pci_link.c b/drivers/acpi/pci_link.c
> index 7c8408b..18a9190 100644
> --- a/drivers/acpi/pci_link.c
> +++ b/drivers/acpi/pci_link.c
> @@ -1,6 +1,7 @@
>  /*
>   *  pci_link.c - ACPI PCI Interrupt Link Device Driver ($Revision: 34 $)
>   *
> + *  Copyright (c) 2015, The Linux Foundation. All rights reserved.
>   *  Copyright (C) 2001, 2002 Andy Grover 
>   *  Copyright (C) 2001, 2002 Paul Diefenbaugh 
>   *  Copyright (C) 2002   Dominik Brodowski 
> @@ -67,12 +68,12 @@ static struct acpi_scan_handler pci_link_handler = {
>   * later even the link is disable. Instead, we just repick the active irq
>   */
>  struct acpi_pci_link_irq {
> - u8 active;  /* Current IRQ */
> + u32 active; /* Current IRQ */
>   u8 triggering;  /* All IRQs */
>   u8 polarity;/* All IRQs */
>   u8 resource_type;
>   u8 possible_count;
> - u8 possible[ACPI_PCI_LINK_MAX_POSSIBLE];
> + u32 possible[ACPI_PCI_LINK_MAX_POSSIBLE];
>   u8 initialized:1;
>   u8 reserved:7;
>  };
Hi Sinan,
This data structure becomes rather big; any idea how to reduce
memory consumption?
Thanks,
Gerry

> @@ -437,7 +438,7 @@ static int acpi_pci_link_set(struct acpi_pci_link *link, 
> int irq)
>   * enabled system.
>   */
>  
> -#define ACPI_MAX_IRQS 256
> +#define ACPI_MAX_IRQS 1020
>  #define ACPI_MAX_ISA_IRQ 16
>  
>  #define PIRQ_PENALTY_PCI_AVAILABLE   (0)
> @@ -493,7 +494,8 @@ int __init acpi_irq_penalty_init(void)
>   penalty;
>   }
>  
> - } else if (link->irq.active) {
> + } else if (link->irq.active &&
> + (link->irq.active < ACPI_MAX_IRQS)) {
>   acpi_irq_penalty[link->irq.active] +=
>   PIRQ_PENALTY_PCI_POSSIBLE;
>   }
> @@ -542,14 +544,19 @@ static int acpi_pci_link_allocate(struct acpi_pci_link 
> *link)
>   irq = link->irq.possible[link->irq.possible_count - 1];
>  
>   if (acpi_irq_balance || !link->irq.active) {
> - /*
> -  * Select the best IRQ.  This is done in reverse to promote
> -  * the use of IRQs 9, 10, 11, and >15.
> -  */
> - for (i = (link->irq.possible_count - 1); i >= 0; i--) {
> - if (acpi_irq_penalty[irq] >
> - acpi_irq_penalty[link->irq.possible[i]])
> - irq = link->irq.possible[i];
> +
> + if (irq < ACPI_MAX_IRQS) {
> + /*
> +  * Select the best IRQ.  This is done in reverse to
> +  * promote the use of IRQs 9, 10, 11, and >15.
> +  */
> + for (i = (link->irq.possible_count - 1); i >= 0;
> + i--) {
> + if ((link->irq.possible[i] < ACPI_MAX_IRQS) &&
> + (acpi_irq_penalty[irq] >
> + acpi_irq_penalty[link->irq.possible[i]]))
> + irq = link->irq.possible[i];
> + }
>   }
>   }
>   if (acpi_irq_penalty[irq] >= PIRQ_PENALTY_ISA_ALWAYS) {
> @@ -568,7 +575,9 @@ static int acpi_pci_link_allocate(struct acpi_pci_link 
> *link)
>   acpi_device_bid(link->device));
>   return -ENODEV;
>   } else {
> - acpi_irq_penalty[link->irq.active] += PIRQ_PENALTY_PCI_USING;
> + if (link->irq.active < ACPI_MAX_IRQS)
> + acpi_irq_penalty[link->irq.active] +=
> + PIRQ_PENALTY_PCI_USING;
>   printk(KERN_WARNING PREFIX "%s [%s] enabled at IRQ %d\n",
>  acpi_device_name(link->device),
>  acpi_device_bid(link->device), link->irq.active);
> 
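The bounds-checked selection loop in the patch above can be sketched in
isolation as follows. This is a simplified illustration, not the code from
pci_link.c: pick_irq(), SKETCH_MAX_IRQS and the plain array arguments are
hypothetical stand-ins for acpi_irq_penalty[] and link->irq.possible[].

```c
#include <stdint.h>

#define SKETCH_MAX_IRQS 1020  /* mirrors ACPI_MAX_IRQS after the patch */

/*
 * Start from the last candidate and scan backwards, switching to any
 * candidate with a strictly lower penalty. Candidates at or above
 * SKETCH_MAX_IRQS are skipped so the penalty table is never indexed
 * out of bounds. If the starting candidate itself is out of range,
 * no comparison is possible and it is returned unchanged.
 */
uint32_t pick_irq(const int *penalty, const uint32_t *possible, int count)
{
	uint32_t irq = possible[count - 1];
	int i;

	if (irq >= SKETCH_MAX_IRQS)
		return irq;

	for (i = count - 1; i >= 0; i--) {
		if (possible[i] < SKETCH_MAX_IRQS &&
		    penalty[irq] > penalty[possible[i]])
			irq = possible[i];
	}
	return irq;
}
```

The caller is assumed to pass count >= 1 and a penalty table with at least
SKETCH_MAX_IRQS entries.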


Re: lock-up on boot with x86/PCI/ACPI: Use common interface to support PCI host bridge

2015-11-08 Thread Jiang Liu
On 2015/11/7 15:56, Arthur Marsh wrote:
> Hi, I've run into a situation where I've been getting a lock-up a few
> seconds into the boot process on a machine with an ASUS A8V-MX
> motherboard, BIOS 050312/06/2005 with AMD Athlon(tm) 64 Processor
> 3200+ (single core) with kernel compiled in 32 bit mode (config attached
> was used for both the problem kernel and kernel with the patch reverted,
> dmesg attached was for the kernel with the patch reverted).
> 
> A git bisect traced the problem back to:
> 
> git bisect good
> 4d6b4e69a245e9df4b84dba387596086cb66887d is the first bad commit
> commit 4d6b4e69a245e9df4b84dba387596086cb66887d
> Author: Jiang Liu 
> Date:   Wed Oct 14 14:29:41 2015 +0800
> 
>  x86/PCI/ACPI: Use common interface to support PCI host bridge
> 
>  Use common interface to simplify ACPI PCI host bridge implementation.
> 
>  Signed-off-by: Jiang Liu 
>  Reviewed-by: Hanjun Guo 
>  Acked-by: Bjorn Helgaas 
>  Signed-off-by: Rafael J. Wysocki 
> 
> :04 04 a3447eea376b5a3e6f57deb35cf064c5481b45e3
> f64d8e49fd87b776933dfa3dfefcb33509004d3f M  arch
> 
> From the boot-up I get the message as shown in the images at:
> http://www.users.on.net/~arthur.marsh/20151107601.jpg and
> http://www.users.on.net/~arthur.marsh/20151107602.jpg
> 
> The boot-up suggests trying rebooting with pci=alloc but that didn't help.
> 
> The errors shown include
> "BAR 0: trying firmware assignment [io  size 0x0020]"
> "BAR 0: [io  size 0x0020] conflicts with PCI Bus #00 [io  0x-0x]
> "BAR 0: failed to assign [io  size 0x0020]
> 
> After reverting the patch and installing the resulting kernel I was able
> to boot normally.
> 
> I'd be happy to provide any further information and run further tests to
> help identify and resolve the problem.
Hi Arthur,
Sorry for the regression. Could you please also help to
provide the ACPI tables from the affected system? You may get
ACPI tables by installing acpidump and then 'acpidump > acpitables.bin'.
Thanks,
Gerry


> 
> Arthur.
> 
> 


Re: [Patch v7 4/7] PCI/ACPI: Add interface acpi_pci_root_create()

2015-11-06 Thread Jiang Liu
On 2015/11/6 23:32, Jiang Liu wrote:
> On 2015/11/6 22:45, Lorenzo Pieralisi wrote:
>> On Fri, Nov 06, 2015 at 09:22:46PM +0800, Jiang Liu wrote:
>>> On 2015/11/6 20:40, Tomasz Nowicki wrote:
>>>> On 06.11.2015 12:46, Jiang Liu wrote:
>>>>> On 2015/11/6 18:37, Tomasz Nowicki wrote:
>>>>>> On 06.11.2015 09:52, Jiang Liu wrote:
>>>>>> Sure, ARM64 (0-16M IO space) QEMU example:
>>>>>> DWordIO (ResourceProducer, MinFixed, MaxFixed, PosDecode, EntireRange,
>>>>>>   0x, // Granularity
>>>>>>   0x, // Range Minimum
>>>>>>   0x, // Range Maximum
>>>>>>   0x3EFF, // Translation Offset
>>>>>>   0x0001, // Length
>>>>>>   ,, , TypeStatic)
>>>>> The above DWordIO resource descriptor doesn't conform to the ACPI spec.
>>>>> According to my understanding, ARM/ARM64 has no concept of IO port
>>>>> address space, so the PCI host bridge will map IO port on PCI side
>>>>> onto MMIO on host side. In other words, PCI host bridge on ARM64
>>>>> implements an IO Port->MMIO translation instead of an IO Port->IO Port
>>>>> translation. If that's true, it should use 'TypeTranslation' instead
>>>>> of 'TypeStatic'. And kernel ACPI resource parsing interface doesn't
>>>>> support 'TypeTranslation' yet, so we need to find a solution for it.
>>>>
>>>> I think you are right, we need TypeTranslation flag for ARM64 DWordIO
>>>> descriptors and an extra kernel patch to support it.
>>> How about the attached patch to support TypeTranslation?
>>> It only passes compilation:)
>>
>> Eh, hopefully there are not any ACPI tables out there with that bit
>> set that work _today_ and would not work with the patch attached :)
>>
>> My question is still there: do we want to handle the same problem
>> as ia64 has in a different manner ? Certainly we won't be able
>> to update ia64 platforms ACPI tables, so we would end up with
>> two platforms handling IO resources in different ways unless I am
>> missing something here.
> There are some differences between IA64 and ARM64.
> On IA64, it supports 16M IO address space per PCI domain and 256 PCI
> domains at max. So the system IO address space is 16M * 256 = 4G.
> So it does a two-level translation to support IO ports:
> 1) translate PCI bus local IO port address into system global IO port
>    address by adding acpi_des->translation_offset.
> 2) translate the 4G system IO port address space into MMIO address.
>    IA64 has reserved a 4G space for IO port mapping. This translation
>    is done by an arch-specific method.
> In other words, IA64 needs a two-level translation, but ACPI only provides
> one (trans_type, trans_offset) pair for encoding, so it's used for step 1).
> 
> For ARM64, I think currently it only needs step 2).
> 
>>
>> BTW, why would we add offset to res->start only if TypeTranslation is
>> clear ? Is not that something we would do just to make things "work" ?
>> That flag has no bearing on the offset, only on the resource type AFAIK.
> It's not a hack, but a way to interpret ACPI spec:)
> 
> With the current Linux resource management framework, we need to allocate
> both an MMIO and an IO port address range for an ACPI resource of type
> 'TypeTranslation'. But a struct resource can be either IO port or MMIO,
> not both. So the choice is to keep the resource as IO port and let
> arch code build the special MMIO mapping for it. Converting the resource
> to MMIO would break too many things.
> 
> That said, we need to add translation_offset to convert the bus local
> IO port address into a system global IO port address if the resource is of
> type TypeStatic, because ioport_resource uses system global IO port
> addresses.
> 
> For an ACPI resource of type TypeTranslation, system global IO port
> address equals bus local IO port address, and the translation_offset
> is used to translate IO port address into MMIO address, so we shouldn't
> add translation_offset to the IO port resource descriptor.
One note for the TypeTranslation case: the arch code needs to reset
resource_win->offset to zero after setting up the MMIO map. Sample
code is below:
va = ioremap(resource_win->offset + res->start, resource_size(res));
resource_win->offset = 0;

Otherwise it will break pcibios_resource_to_bus() etc.
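
To illustrate why the reset matters, here is a minimal user-space sketch,
not kernel code. It assumes the CPU-to-bus conversion subtracts the window
offset from the resource address, which is how pcibios_resource_to_bus()
behaves. The struct, the function names and the 0x40000000 offset are
hypothetical stand-ins chosen only for this example.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct resource_win. */
struct win_model {
	uint64_t start;   /* res->start: IO port number as seen by the CPU */
	uint64_t end;     /* res->end */
	uint64_t offset;  /* resource_win->offset */
};

/* Models the CPU-to-bus arithmetic: bus address = CPU address - offset. */
uint64_t model_resource_to_bus(const struct win_model *w, uint64_t cpu_addr)
{
	return cpu_addr - w->offset;
}

/*
 * Hypothetical arch hook for a TypeTranslation IO window. It returns the
 * MMIO base that would be passed to ioremap() and then clears the offset,
 * because for this window type the port numbers are already bus local.
 */
uint64_t model_setup_io_window(struct win_model *w)
{
	uint64_t mmio_base = w->offset + w->start;

	w->offset = 0;
	return mmio_base;
}
```

If the offset were left in place, every later CPU-to-bus conversion on the
window would subtract the MMIO offset from a port number that needs no
adjustment.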

> 
> Thanks,
> Gerry
> 
>>
>> This without taking into account ARM64 systems shipping wit

Re: [Patch v7 4/7] PCI/ACPI: Add interface acpi_pci_root_create()

2015-11-06 Thread Jiang Liu
On 2015/11/6 22:45, Lorenzo Pieralisi wrote:
> On Fri, Nov 06, 2015 at 09:22:46PM +0800, Jiang Liu wrote:
>> On 2015/11/6 20:40, Tomasz Nowicki wrote:
>>> On 06.11.2015 12:46, Jiang Liu wrote:
>>>> On 2015/11/6 18:37, Tomasz Nowicki wrote:
>>>>> On 06.11.2015 09:52, Jiang Liu wrote:
>>>>> Sure, ARM64 (0-16M IO space) QEMU example:
>>>>> DWordIO (ResourceProducer, MinFixed, MaxFixed, PosDecode, EntireRange,
>>>>>   0x, // Granularity
>>>>>   0x, // Range Minimum
>>>>>   0x, // Range Maximum
>>>>>   0x3EFF, // Translation Offset
>>>>>   0x0001, // Length
>>>>>   ,, , TypeStatic)
>>>> The above DWordIO resource descriptor doesn't conform to the ACPI spec.
>>>> According to my understanding, ARM/ARM64 has no concept of an IO port
>>>> address space, so the PCI host bridge maps IO ports on the PCI side
>>>> onto MMIO on the host side. In other words, the PCI host bridge on ARM64
>>>> implements an IO Port->MMIO translation instead of an IO Port->IO Port
>>>> translation. If that's true, it should use 'TypeTranslation' instead
>>>> of 'TypeStatic'. The kernel ACPI resource parsing interface doesn't
>>>> support 'TypeTranslation' yet, so we need to find a solution for it.
>>>
>>> I think you are right, we need TypeTranslation flag for ARM64 DWordIO
>>> descriptors and an extra kernel patch to support it.
>> How about the attached patch to support TypeTranslation?
>> It only passes compilation:)
> 
> Eh, hopefully there are not any ACPI tables out there with that bit
> set that work _today_ and would not work with the patch attached :)
> 
> My question is still there: do we want to handle the same problem
> as ia64 has in a different manner ? Certainly we won't be able
> to update ia64 platforms ACPI tables, so we would end up with
> two platforms handling IO resources in different ways unless I am
> missing something here.
There are some differences between IA64 and ARM64.
IA64 supports a 16M IO address space per PCI domain and at most 256 PCI
domains, so the system IO address space is 16M * 256 = 4G.
It therefore does a two-level translation to support IO ports:
1) translate the PCI bus local IO port address into a system global IO port
   address by adding acpi_des->translation_offset.
2) translate the 4G system IO port address space into MMIO addresses.
   IA64 has reserved a 4G space for IO port mapping. This translation
   is done by an arch-specific method.
In other words, IA64 needs a two-level translation, but ACPI only provides
one (trans_type, trans_offset) pair for encoding, so it's used for step 1).

For ARM64, I think currently it only needs step 2).
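
The two steps can be sketched in plain C as follows. This is only an
illustration under stated assumptions: the helper names are hypothetical,
step 2 is modelled as adding a flat MMIO base (the real arch-specific
mapping may differ), and the example offset assumes each domain's window
starts at a multiple of 16M.

```c
#include <stdint.h>

#define IO_SPACE_PER_DOMAIN (16ULL << 20)  /* 16M of IO ports per PCI domain */
#define MAX_PCI_DOMAINS     256ULL         /* 16M * 256 = 4G in total */

/* Step 1: bus local port -> system global port, via the ACPI offset. */
uint64_t port_bus_to_global(uint64_t bus_port, uint64_t translation_offset)
{
	return bus_port + translation_offset;
}

/*
 * Step 2: system global port -> MMIO address. Modelled as a flat base
 * plus port number, which is an assumption made for this sketch.
 */
uint64_t port_global_to_mmio(uint64_t global_port, uint64_t io_mmio_base)
{
	return io_mmio_base + global_port;
}
```

On ARM64 only the second function would apply, with the bus local port
passed in directly.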

> 
> BTW, why would we add offset to res->start only if TypeTranslation is
> clear ? Is not that something we would do just to make things "work" ?
> That flag has no bearing on the offset, only on the resource type AFAIK.
It's not a hack, but a way to interpret ACPI spec:)

With the current Linux resource management framework, we need to allocate
both an MMIO and an IO port address range for an ACPI resource of type
'TypeTranslation'. But a struct resource can be either IO port or MMIO,
not both. So the choice is to keep the resource as IO port and let
arch code build the special MMIO mapping for it. Converting the resource
to MMIO would break too many things.

That said, we need to add translation_offset to convert the bus local
IO port address into a system global IO port address if the resource is of
type TypeStatic, because ioport_resource uses system global IO port
addresses.

For an ACPI resource of type TypeTranslation, system global IO port
address equals bus local IO port address, and the translation_offset
is used to translate IO port address into MMIO address, so we shouldn't
add translation_offset to the IO port resource descriptor.
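
A minimal user-space sketch of that rule, assuming a simplified window
struct; the type and function names here are hypothetical and do not match
the kernel's ACPI structures. The sketch shifts both ends of the range in
the TypeStatic case.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of a decoded IO window. */
struct io_window {
	uint64_t start, end;  /* IO port range stored in the resource */
	uint64_t offset;      /* translation offset kept for the bridge */
	bool to_mmio;         /* true when arch code must build an MMIO map */
};

struct io_window decode_io_window(uint64_t min, uint64_t max,
				  uint64_t translation_offset,
				  bool type_translation)
{
	struct io_window w = { min, max, translation_offset, type_translation };

	if (!type_translation) {
		/* TypeStatic: bus local port -> system global port. */
		w.start += translation_offset;
		w.end += translation_offset;
	}
	/* TypeTranslation: ports stay bus local; offset is for the MMIO map. */
	return w;
}
```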

Thanks,
Gerry

> 
> This is without taking into account ARM64 systems shipping with ACPI
> tables that do not set TypeTranslation at present.
> 
> On top of that, I noticed that core ACPI code handles Sparse
> Translation (ie _TRS), that should be considered meaningful only if _TTP
> is set (and that's not checked).
Yes, that's a flaw:(

> 
> Thoughts ?
> 
> Thanks,
> Lorenzo
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Patch v7 4/7] PCI/ACPI: Add interface acpi_pci_root_create()

2015-11-06 Thread Jiang Liu
On 2015/11/6 20:40, Tomasz Nowicki wrote:
> On 06.11.2015 12:46, Jiang Liu wrote:
>> On 2015/11/6 18:37, Tomasz Nowicki wrote:
>>> On 06.11.2015 09:52, Jiang Liu wrote:
>>> Sure, ARM64 (0-16M IO space) QEMU example:
>>> DWordIO (ResourceProducer, MinFixed, MaxFixed, PosDecode, EntireRange,
>>>   0x, // Granularity
>>>   0x, // Range Minimum
>>>   0x, // Range Maximum
>>>   0x3EFF, // Translation Offset
>>>   0x0001, // Length
>>>   ,, , TypeStatic)
>> The above DWordIO resource descriptor doesn't conform to the ACPI spec.
>> According to my understanding, ARM/ARM64 has no concept of an IO port
>> address space, so the PCI host bridge maps IO ports on the PCI side
>> onto MMIO on the host side. In other words, the PCI host bridge on ARM64
>> implements an IO Port->MMIO translation instead of an IO Port->IO Port
>> translation. If that's true, it should use 'TypeTranslation' instead
>> of 'TypeStatic'. The kernel ACPI resource parsing interface doesn't
>> support 'TypeTranslation' yet, so we need to find a solution for it.
> 
> I think you are right, we need TypeTranslation flag for ARM64 DWordIO
> descriptors and an extra kernel patch to support it.
How about the attached patch to support TypeTranslation?
It only passes compilation:)

> 
> Thanks,
> Tomasz
>From 51f5cddd8c4301b731805074ebc3e3a6c7dbaf59 Mon Sep 17 00:00:00 2001
From: Liu Jiang <jiang@linux.intel.com>
Date: Fri, 6 Nov 2015 20:01:59 +0800
Subject: [PATCH]


Signed-off-by: Liu Jiang <jiang@linux.intel.com>
---
 drivers/acpi/resource.c  |   25 +++--
 include/linux/resource_ext.h |7 +++
 2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c
index cdc5c2599beb..1bd3e21f56fe 100644
--- a/drivers/acpi/resource.c
+++ b/drivers/acpi/resource.c
@@ -215,8 +215,29 @@ static bool acpi_decode_space(struct resource_win *win,
 	else if (attr->translation_offset)
 		pr_debug("ACPI: translation_offset(%lld) is invalid for non-bridge device.\n",
 			 attr->translation_offset);
-	start = attr->minimum + offset;
-	end = attr->maximum + offset;
+	start = attr->minimum;
+	end = attr->maximum;
+
+	/*
+	 * Convert bus local address into system global address if it's an
+	 * IO Port->IO Port or MMIO->MMIO translation.
+	 */
+	switch (addr->resource_type) {
+	case ACPI_MEMORY_RANGE:
+		if (addr->info.mem.translation)
+			win->translation_type = RESOURCE_TRANS_MMIO_TO_IOPORT;
+		else
+			start += offset;
+		break;
+	case ACPI_IO_RANGE:
+		if (addr->info.io.translation)
+			win->translation_type = RESOURCE_TRANS_IOPORT_TO_MMIO;
+		else
+			start += offset;
+		break;
+	default:
+		break;
+	}
 
 	win->offset = offset;
 	res->start = start;
diff --git a/include/linux/resource_ext.h b/include/linux/resource_ext.h
index e2bf63d881d4..f06d358c1f22 100644
--- a/include/linux/resource_ext.h
+++ b/include/linux/resource_ext.h
@@ -22,8 +22,15 @@
 struct resource_win {
 	struct resource res;		/* In master (CPU) address space */
 	resource_size_t offset;		/* Translation offset for bridge */
+	int translation_type;		/* Translation type for bridge */
 };
 
+#define RESOURCE_TRANS_SAME		0x0
+/* Translate from IO port on slave into MMIO on master */
+#define RESOURCE_TRANS_IOPORT_TO_MMIO	0x1
+/* Translate from MMIO on slave into IO port on master */
+#define RESOURCE_TRANS_MMIO_TO_IOPORT	0x2
+
 /*
  * Common resource list management data structure and interfaces to support
  * ACPI, PNP and PCI host bridge etc.
-- 
1.7.10.4



Re: [Patch v7 4/7] PCI/ACPI: Add interface acpi_pci_root_create()

2015-11-06 Thread Jiang Liu
On 2015/11/6 18:37, Tomasz Nowicki wrote:
> On 06.11.2015 09:52, Jiang Liu wrote:
>> On 2015/11/6 2:19, Lorenzo Pieralisi wrote:
>>> On Thu, Nov 05, 2015 at 03:21:34PM +0100, Tomasz Nowicki wrote:
>>>> On 14.10.2015 08:29, Jiang Liu wrote:
>>>
>>> [...]
>>>
>>>>> +static void acpi_pci_root_validate_resources(struct device *dev,
>>>>> + struct list_head *resources,
>>>>> + unsigned long type)
>>>>> +{
>>>>> +LIST_HEAD(list);
>>>>> +struct resource *res1, *res2, *root = NULL;
>>>>> +struct resource_entry *tmp, *entry, *entry2;
>>>>> +
>>>>> +BUG_ON((type & (IORESOURCE_MEM | IORESOURCE_IO)) == 0);
>>>>> +root = (type & IORESOURCE_MEM) ? &iomem_resource :
>>>>> &ioport_resource;
>>>>> +
>>>>> +list_splice_init(resources, &list);
>>>>> +resource_list_for_each_entry_safe(entry, tmp, &list) {
>>>>> +bool free = false;
>>>>> +resource_size_t end;
>>>>> +
>>>>> +res1 = entry->res;
>>>>> +if (!(res1->flags & type))
>>>>> +goto next;
>>>>> +
>>>>> +/* Exclude non-addressable range or non-addressable
>>>>> portion */
>>>>> +end = min(res1->end, root->end);
>>>>> +if (end <= res1->start) {
>>>>> +dev_info(dev, "host bridge window %pR (ignored, not
>>>>> CPU addressable)\n",
>>>>> + res1);
>>>>> +free = true;
>>>>> +goto next;
>>>>> +} else if (res1->end != end) {
>>>>> +dev_info(dev, "host bridge window %pR ([%#llx-%#llx]
>>>>> ignored, not CPU addressable)\n",
>>>>> + res1, (unsigned long long)end + 1,
>>>>> + (unsigned long long)res1->end);
>>>>> +res1->end = end;
>>>>> +}
>>>>> +
>>>>> +resource_list_for_each_entry(entry2, resources) {
>>>>> +res2 = entry2->res;
>>>>> +if (!(res2->flags & type))
>>>>> +continue;
>>>>> +
>>>>> +/*
>>>>> + * I don't like throwing away windows because then
>>>>> + * our resources no longer match the ACPI _CRS, but
>>>>> + * the kernel resource tree doesn't allow overlaps.
>>>>> + */
>>>>> +if (resource_overlaps(res1, res2)) {
>>>>> +res2->start = min(res1->start, res2->start);
>>>>> +res2->end = max(res1->end, res2->end);
>>>>> +dev_info(dev, "host bridge window expanded to %pR;
>>>>> %pR ignored\n",
>>>>> + res2, res1);
>>>>> +free = true;
>>>>> +goto next;
>>>>> +}
>>>>> +}
>>>>> +
>>>>> +next:
>>>>> +resource_list_del(entry);
>>>>> +if (free)
>>>>> +resource_list_free_entry(entry);
>>>>> +else
>>>>> +resource_list_add_tail(entry, resources);
>>>>> +}
>>>>> +}
>>>>> +
>>>>> +int acpi_pci_probe_root_resources(struct acpi_pci_root_info *info)
>>>>> +{
>>>>> +int ret;
>>>>> +struct list_head *list = &info->resources;
>>>>> +struct acpi_device *device = info->bridge;
>>>>> +struct resource_entry *entry, *tmp;
>>>>> +unsigned long flags;
>>>>> +
>>>>> +flags = IORESOURCE_IO | IORESOURCE_MEM |
>>>>> IORESOURCE_MEM_8AND16BIT;
>>>>> +ret = acpi_dev_get_resources(device, list,
>>>>> + acpi_dev_filter_resource_type_cb,
>>>>> + (void *)flags);
>>>>> +if (ret < 0)
>>>>> +dev_warn(&device->dev,
>>>>> + "failed to parse _CRS method, error code %d\n", ret);
>>>>>

Re: [Patch v7 4/7] PCI/ACPI: Add interface acpi_pci_root_create()

2015-11-06 Thread Jiang Liu
On 2015/11/6 2:19, Lorenzo Pieralisi wrote:
> On Thu, Nov 05, 2015 at 03:21:34PM +0100, Tomasz Nowicki wrote:
>> On 14.10.2015 08:29, Jiang Liu wrote:
> 
> [...]
> 
>>> +static void acpi_pci_root_validate_resources(struct device *dev,
>>> +struct list_head *resources,
>>> +unsigned long type)
>>> +{
>>> +   LIST_HEAD(list);
>>> +   struct resource *res1, *res2, *root = NULL;
>>> +   struct resource_entry *tmp, *entry, *entry2;
>>> +
>>> +   BUG_ON((type & (IORESOURCE_MEM | IORESOURCE_IO)) == 0);
>>> +   root = (type & IORESOURCE_MEM) ? &iomem_resource : &ioport_resource;
>>> +
>>> +   list_splice_init(resources, &list);
>>> +   resource_list_for_each_entry_safe(entry, tmp, &list) {
>>> +   bool free = false;
>>> +   resource_size_t end;
>>> +
>>> +   res1 = entry->res;
>>> +   if (!(res1->flags & type))
>>> +   goto next;
>>> +
>>> +   /* Exclude non-addressable range or non-addressable portion */
>>> +   end = min(res1->end, root->end);
>>> +   if (end <= res1->start) {
>>> +   dev_info(dev, "host bridge window %pR (ignored, not CPU 
>>> addressable)\n",
>>> +res1);
>>> +   free = true;
>>> +   goto next;
>>> +   } else if (res1->end != end) {
>>> +   dev_info(dev, "host bridge window %pR ([%#llx-%#llx] 
>>> ignored, not CPU addressable)\n",
>>> +res1, (unsigned long long)end + 1,
>>> +(unsigned long long)res1->end);
>>> +   res1->end = end;
>>> +   }
>>> +
>>> +   resource_list_for_each_entry(entry2, resources) {
>>> +   res2 = entry2->res;
>>> +   if (!(res2->flags & type))
>>> +   continue;
>>> +
>>> +   /*
>>> +* I don't like throwing away windows because then
>>> +* our resources no longer match the ACPI _CRS, but
>>> +* the kernel resource tree doesn't allow overlaps.
>>> +*/
>>> +   if (resource_overlaps(res1, res2)) {
>>> +   res2->start = min(res1->start, res2->start);
>>> +   res2->end = max(res1->end, res2->end);
>>> +   dev_info(dev, "host bridge window expanded to 
>>> %pR; %pR ignored\n",
>>> +res2, res1);
>>> +   free = true;
>>> +   goto next;
>>> +   }
>>> +   }
>>> +
>>> +next:
>>> +   resource_list_del(entry);
>>> +   if (free)
>>> +   resource_list_free_entry(entry);
>>> +   else
>>> +   resource_list_add_tail(entry, resources);
>>> +   }
>>> +}
>>> +
>>> +int acpi_pci_probe_root_resources(struct acpi_pci_root_info *info)
>>> +{
>>> +   int ret;
>>> +   struct list_head *list = &info->resources;
>>> +   struct acpi_device *device = info->bridge;
>>> +   struct resource_entry *entry, *tmp;
>>> +   unsigned long flags;
>>> +
>>> +   flags = IORESOURCE_IO | IORESOURCE_MEM | IORESOURCE_MEM_8AND16BIT;
>>> +   ret = acpi_dev_get_resources(device, list,
>>> +acpi_dev_filter_resource_type_cb,
>>> +(void *)flags);
>>> +   if (ret < 0)
>>> +   dev_warn(&device->dev,
>>> +"failed to parse _CRS method, error code %d\n", ret);
>>> +   else if (ret == 0)
>>> +   dev_dbg(&device->dev,
>>> +   "no IO and memory resources present in _CRS\n");
>>> +   else {
>>> +   resource_list_for_each_entry_safe(entry, tmp, list) {
>>> +   if (entry->res->flags & IORESOURCE_DISABLED)
>>> +   resource_list_destroy_entry(entry);
>>> +   else
>>> +   entry->res->name = info-

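The window handling in the quoted acpi_pci_root_validate_resources() comes
down to two operations: clip a window to what the CPU can address, and fold
a window into an existing one when they overlap. Below is a minimal
user-space sketch of just that arithmetic. The struct range type and the
helper names are hypothetical; the overlap and clip conditions follow the
quoted code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Inclusive range, like the start/end pair in struct resource. */
struct range {
	uint64_t start, end;
};

bool ranges_overlap(const struct range *a, const struct range *b)
{
	return a->start <= b->end && b->start <= a->end;
}

/* Clip r to root_end; returns false when no addressable part remains. */
bool clip_to_root(struct range *r, uint64_t root_end)
{
	uint64_t end = r->end < root_end ? r->end : root_end;

	if (end <= r->start)
		return false;
	r->end = end;
	return true;
}

/* Expand dst to cover src if they overlap; true means src was absorbed. */
bool merge_if_overlap(struct range *dst, const struct range *src)
{
	if (!ranges_overlap(dst, src))
		return false;
	if (src->start < dst->start)
		dst->start = src->start;
	if (src->end > dst->end)
		dst->end = src->end;
	return true;
}
```

A caller would drop any window for which clip_to_root() returns false and
free any window that merge_if_overlap() reports as absorbed.
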
Re: [Patch v7 4/7] PCI/ACPI: Add interface acpi_pci_root_create()

2015-11-06 Thread Jiang Liu
On 2015/11/6 23:32, Jiang Liu wrote:
> On 2015/11/6 22:45, Lorenzo Pieralisi wrote:
>> On Fri, Nov 06, 2015 at 09:22:46PM +0800, Jiang Liu wrote:
>>> On 2015/11/6 20:40, Tomasz Nowicki wrote:
>>>> On 06.11.2015 12:46, Jiang Liu wrote:
>>>>> On 2015/11/6 18:37, Tomasz Nowicki wrote:
>>>>>> On 06.11.2015 09:52, Jiang Liu wrote:
>>>>>> Sure, ARM64 (0-16M IO space) QEMU example:
>>>>>> DWordIO (ResourceProducer, MinFixed, MaxFixed, PosDecode, EntireRange,
>>>>>>   0x, // Granularity
>>>>>>   0x, // Range Minimum
>>>>>>   0x, // Range Maximum
>>>>>>   0x3EFF, // Translation Offset
>>>>>>   0x0001, // Length
>>>>>>   ,, , TypeStatic)
>>>>> The above DWordIO resource descriptor doesn't conform to the ACPI spec.
>>>>> According to my understanding, ARM/ARM64 has no concept of an IO port
>>>>> address space, so the PCI host bridge maps IO ports on the PCI side
>>>>> onto MMIO on the host side. In other words, the PCI host bridge on ARM64
>>>>> implements an IO Port->MMIO translation instead of an IO Port->IO Port
>>>>> translation. If that's true, it should use 'TypeTranslation' instead
>>>>> of 'TypeStatic'. The kernel ACPI resource parsing interface doesn't
>>>>> support 'TypeTranslation' yet, so we need to find a solution for it.
>>>>
>>>> I think you are right, we need TypeTranslation flag for ARM64 DWordIO
>>>> descriptors and an extra kernel patch to support it.
>>> How about the attached patch to support TypeTranslation?
>>> It only passes compilation:)
>>
>> Eh, hopefully there are not any ACPI tables out there with that bit
>> set that work _today_ and would not work with the patch attached :)
>>
>> My question is still there: do we want to handle the same problem
>> as ia64 has in a different manner ? Certainly we won't be able
>> to update ia64 platforms ACPI tables, so we would end up with
>> two platforms handling IO resources in different ways unless I am
>> missing something here.
> There are some differences between IA64 and ARM64.
> On IA64, it supports 16M IO address space per PCI domain and 256 PCI
> domains at max. So the system IO address space is 16M * 256 = 4G.
> So it does two-level translation to support IO ports:
> 1) translate PCI bus local IO port address into system global IO port
>    address by adding acpi_des->translation_offset.
> 2) translate the 4G system IO port address space into MMIO address.
>    IA64 has reserved a 4G space for IO port mapping. This translation
>    is done by arch specific method.
> In other words, IA64 needs two-level translation, but ACPI only provides
> one (trans_type, trans_offset) pair for encoding, so it's used for step 1).
> 
> For ARM64, I think currently it only needs step 2).
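
To make the arithmetic of the two steps above concrete, here is a minimal
userspace sketch in C. It is not kernel code: the function names, the linear
port-to-MMIO mapping used for step 2 and the mmio_base parameter are
assumptions made for illustration only (the real IA64 mapping is done by an
arch specific method).

```c
#include <assert.h>
#include <stdint.h>

/* Sizes taken from the description above. */
#define IO_SPACE_PER_DOMAIN  (16ULL << 20)   /* 16M of IO ports per PCI domain */
#define MAX_PCI_DOMAINS      256ULL          /* 16M * 256 = 4G in total */

/*
 * Step 1: bus-local IO port -> system-global IO port.
 * translation_offset plays the role of the ACPI translation offset.
 */
static uint64_t bus_to_global_port(uint64_t bus_port, uint64_t translation_offset)
{
	assert(bus_port < IO_SPACE_PER_DOMAIN);
	return bus_port + translation_offset;
}

/*
 * Step 2: system-global IO port -> MMIO address.
 * Assumption: a plain linear mapping at a hypothetical mmio_base.
 */
static uint64_t global_port_to_mmio(uint64_t global_port, uint64_t mmio_base)
{
	assert(global_port < IO_SPACE_PER_DOMAIN * MAX_PCI_DOMAINS);
	return mmio_base + global_port;
}
```

With this model a platform that only needs step 2 would call
global_port_to_mmio() directly on the bus-local port, which is the ARM64
situation described above.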
> 
>>
>> BTW, why would we add offset to res->start only if TypeTranslation is
>> clear ? Is not that something we would do just to make things "work" ?
>> That flag has no bearing on the offset, only on the resource type AFAIK.
> It's not a hack, but a way to interpret ACPI spec:)
> 
> With current linux resource management framework, we need to allocate
> both MMIO and IO port address space range for an ACPI resource of type
> 'TypeTranslation'. And struct resource could be either IO port or MMIO,
> not both. So the choice is to keep the resource as IO port, and let
> arch code build the special MMIO mapping for it. Otherwise it will
> break too many things if we convert the resource to MMIO.
> 
> That said, we need to add translation_offset to convert bus local
> IO port address into system global IO port address if it's of type
> TypeStatic, because ioport_resource uses system global IO port
> address.
> 
> For an ACPI resource of type TypeTranslation, system global IO port
> address equals bus local IO port address, and the translation_offset
> is used to translate IO port address into MMIO address, so we shouldn't
> add translation_offset to the IO port resource descriptor.
One note for the TypeTranslation case: the arch code needs to reset
resource_win->offset to zero after setting up the MMIO map. Sample
code as below:
	va = ioremap(resource_win->offset + res->start, resource_size(res));
	resource_win->offset = 0;

Otherwise it will break pcibios_resource_to_bus() etc.
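
The TypeStatic/TypeTranslation rule discussed in this mail can be sketched in
plain userspace C as below. This is only a model under stated assumptions:
struct io_window is a simplified stand-in for the kernel's struct
resource_win, io_window_fixup() is a hypothetical helper name, and the value
returned is just the MMIO base that arch code would pass to ioremap().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for struct resource_win (not the real kernel layout). */
struct io_window {
	uint64_t start;      /* first IO port of the window */
	uint64_t end;        /* last IO port of the window, inclusive */
	uint64_t offset;     /* ACPI translation offset */
	bool translation;    /* true: TypeTranslation, false: TypeStatic */
};

/*
 * Turn a freshly parsed window into system-global IO port numbers.
 * Returns the MMIO base the arch code would have to map, or 0 if none.
 */
static uint64_t io_window_fixup(struct io_window *win)
{
	uint64_t mmio_base = 0;

	if (win->translation) {
		/* Ports are already global; the offset points into MMIO. */
		mmio_base = win->offset + win->start;
		/* Bus address == global port from now on. */
		win->offset = 0;
	} else {
		/* Bus-local ports become global ports; the offset is kept. */
		win->start += win->offset;
		win->end += win->offset;
	}
	return mmio_base;
}
```

Keeping the offset in the TypeStatic case and clearing it in the
TypeTranslation case is what lets a later "resource minus offset" conversion
give the right bus address in both cases.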

> 
> Thanks,
> Gerry
> 
>>
>> This without taking into account ARM64 systems shipping wit

Re: [Patch v7 4/7] PCI/ACPI: Add interface acpi_pci_root_create()

2015-11-05 Thread Jiang Liu
On 2015/11/6 2:19, Lorenzo Pieralisi wrote:
> On Thu, Nov 05, 2015 at 03:21:34PM +0100, Tomasz Nowicki wrote:
>> On 14.10.2015 08:29, Jiang Liu wrote:
> 
> [...]
> 
>>> +static void acpi_pci_root_validate_resources(struct device *dev,
>>> +struct list_head *resources,
>>> +unsigned long type)
>>> +{
>>> +   LIST_HEAD(list);
>>> +   struct resource *res1, *res2, *root = NULL;
>>> +   struct resource_entry *tmp, *entry, *entry2;
>>> +
>>> +   BUG_ON((type & (IORESOURCE_MEM | IORESOURCE_IO)) == 0);
>>> +   root = (type & IORESOURCE_MEM) ? &iomem_resource : &ioport_resource;
>>> +
>>> +   list_splice_init(resources, &list);
>>> +   resource_list_for_each_entry_safe(entry, tmp, &list) {
>>> +   bool free = false;
>>> +   resource_size_t end;
>>> +
>>> +   res1 = entry->res;
>>> +   if (!(res1->flags & type))
>>> +   goto next;
>>> +
>>> +   /* Exclude non-addressable range or non-addressable portion */
>>> +   end = min(res1->end, root->end);
>>> +   if (end <= res1->start) {
>>> +   dev_info(dev, "host bridge window %pR (ignored, not CPU 
>>> addressable)\n",
>>> +res1);
>>> +   free = true;
>>> +   goto next;
>>> +   } else if (res1->end != end) {
>>> +   dev_info(dev, "host bridge window %pR ([%#llx-%#llx] 
>>> ignored, not CPU addressable)\n",
>>> +res1, (unsigned long long)end + 1,
>>> +(unsigned long long)res1->end);
>>> +   res1->end = end;
>>> +   }
>>> +
>>> +   resource_list_for_each_entry(entry2, resources) {
>>> +   res2 = entry2->res;
>>> +   if (!(res2->flags & type))
>>> +   continue;
>>> +
>>> +   /*
>>> +* I don't like throwing away windows because then
>>> +* our resources no longer match the ACPI _CRS, but
>>> +* the kernel resource tree doesn't allow overlaps.
>>> +*/
>>> +   if (resource_overlaps(res1, res2)) {
>>> +   res2->start = min(res1->start, res2->start);
>>> +   res2->end = max(res1->end, res2->end);
>>> +   dev_info(dev, "host bridge window expanded to 
>>> %pR; %pR ignored\n",
>>> +res2, res1);
>>> +   free = true;
>>> +   goto next;
>>> +   }
>>> +   }
>>> +
>>> +next:
>>> +   resource_list_del(entry);
>>> +   if (free)
>>> +   resource_list_free_entry(entry);
>>> +   else
>>> +   resource_list_add_tail(entry, resources);
>>> +   }
>>> +}
>>> +
>>> +int acpi_pci_probe_root_resources(struct acpi_pci_root_info *info)
>>> +{
>>> +   int ret;
>>> +   struct list_head *list = &info->resources;
>>> +   struct acpi_device *device = info->bridge;
>>> +   struct resource_entry *entry, *tmp;
>>> +   unsigned long flags;
>>> +
>>> +   flags = IORESOURCE_IO | IORESOURCE_MEM | IORESOURCE_MEM_8AND16BIT;
>>> +   ret = acpi_dev_get_resources(device, list,
>>> +acpi_dev_filter_resource_type_cb,
>>> +(void *)flags);
>>> +   if (ret < 0)
>>> +   dev_warn(&device->dev,
>>> +"failed to parse _CRS method, error code %d\n", ret);
>>> +   else if (ret == 0)
>>> +   dev_dbg(&device->dev,
>>> +   "no IO and memory resources present in _CRS\n");
>>> +   else {
>>> +   resource_list_for_each_entry_safe(entry, tmp, list) {
>>> +   if (entry->res->flags & IORESOURCE_DISABLED)
>>> +   resource_list_destroy_entry(entry);
>>> +   else
>>> +   entry->res->name = info-

Re: [Patch v7 4/7] PCI/ACPI: Add interface acpi_pci_root_create()

2015-11-05 Thread Jiang Liu
On 2015/11/5 22:21, Tomasz Nowicki wrote:
> On 14.10.2015 08:29, Jiang Liu wrote:
>> Introduce common interface acpi_pci_root_create() and related data
>> structures to create PCI root bus for ACPI PCI host bridges. It will
>> be used to kill duplicated arch specific code for IA64 and x86. It may
>> also help ARM64 in future.
>>
>> Reviewed-by: Lorenzo Pieralisi 
>> Tested-by: Tony Luck 
>> Signed-off-by: Jiang Liu 
>> Signed-off-by: Liu Jiang 
>> ---
>>   drivers/acpi/pci_root.c  |  204
>> ++
>>   include/linux/pci-acpi.h |   24 ++
>>   2 files changed, 228 insertions(+)
>>
>> diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
>> index 393706a5261b..850d7bf0c873 100644
>> --- a/drivers/acpi/pci_root.c
>> +++ b/drivers/acpi/pci_root.c
>> @@ -652,6 +652,210 @@ static void acpi_pci_root_remove(struct
>> acpi_device *device)
>>   kfree(root);
>>   }
>>
>> +/*
>> + * Following code to support acpi_pci_root_create() is copied from
>> + * arch/x86/pci/acpi.c and modified so it could be reused by x86, IA64
>> + * and ARM64.
>> + */
>> +static void acpi_pci_root_validate_resources(struct device *dev,
>> + struct list_head *resources,
>> + unsigned long type)
>> +{
>> +LIST_HEAD(list);
>> +struct resource *res1, *res2, *root = NULL;
>> +struct resource_entry *tmp, *entry, *entry2;
>> +
>> +BUG_ON((type & (IORESOURCE_MEM | IORESOURCE_IO)) == 0);
>> +root = (type & IORESOURCE_MEM) ? &iomem_resource : &ioport_resource;
>> +
>> +list_splice_init(resources, &list);
>> +resource_list_for_each_entry_safe(entry, tmp, &list) {
>> +bool free = false;
>> +resource_size_t end;
>> +
>> +res1 = entry->res;
>> +if (!(res1->flags & type))
>> +goto next;
>> +
>> +/* Exclude non-addressable range or non-addressable portion */
>> +end = min(res1->end, root->end);
>> +if (end <= res1->start) {
>> +dev_info(dev, "host bridge window %pR (ignored, not CPU
>> addressable)\n",
>> + res1);
>> +free = true;
>> +goto next;
>> +} else if (res1->end != end) {
>> +dev_info(dev, "host bridge window %pR ([%#llx-%#llx]
>> ignored, not CPU addressable)\n",
>> + res1, (unsigned long long)end + 1,
>> + (unsigned long long)res1->end);
>> +res1->end = end;
>> +}
>> +
>> +resource_list_for_each_entry(entry2, resources) {
>> +res2 = entry2->res;
>> +if (!(res2->flags & type))
>> +continue;
>> +
>> +/*
>> + * I don't like throwing away windows because then
>> + * our resources no longer match the ACPI _CRS, but
>> + * the kernel resource tree doesn't allow overlaps.
>> + */
>> +if (resource_overlaps(res1, res2)) {
>> +res2->start = min(res1->start, res2->start);
>> +res2->end = max(res1->end, res2->end);
>> +dev_info(dev, "host bridge window expanded to %pR;
>> %pR ignored\n",
>> + res2, res1);
>> +free = true;
>> +goto next;
>> +}
>> +}
>> +
>> +next:
>> +resource_list_del(entry);
>> +if (free)
>> +resource_list_free_entry(entry);
>> +else
>> +resource_list_add_tail(entry, resources);
>> +}
>> +}
>> +
>> +int acpi_pci_probe_root_resources(struct acpi_pci_root_info *info)
>> +{
>> +int ret;
>> +struct list_head *list = &info->resources;
>> +struct acpi_device *device = info->bridge;
>> +struct resource_entry *entry, *tmp;
>> +unsigned long flags;
>> +
>> +flags = IORESOURCE_IO | IORESOURCE_MEM | IORESOURCE_MEM_8AND16BIT;
>> +ret = acpi_dev_get_resources(device, list,
>> + acpi_dev_filter_resource_type_cb,
>> + (void *)flags);
>> +if (ret < 0)
>> +dev_warn(&device->dev,
>> + "failed to parse _CRS method, error code %d\n", ret);
>> +else if (ret == 0)
>> +dev_dbg(&device->dev,
>> +"

Re: [PATCH v2 02/11] fsl-mc: msi: Added FSL-MC-specific member to the msi_desc's union

2015-11-05 Thread Jiang Liu


On 2015/10/31 3:43, J. German Rivera wrote:
> FSL-MC is a bus type different from PCI and platform, so it needs
> its own member in the msi_desc's union.
> 
> Signed-off-by: J. German Rivera 
> ---
> Changes in v2:
> - Addressed comment from Jiang Liu
>   * Added a dedicated structure for FSL-MC in struct msi_desc
> 
>  include/linux/msi.h | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/include/linux/msi.h b/include/linux/msi.h
> index f71a25e..152e51a 100644
> --- a/include/linux/msi.h
> +++ b/include/linux/msi.h
> @@ -33,6 +33,14 @@ struct platform_msi_desc {
>  };
> 
>  /**
> + * fsl_mc_msi_desc - FSL-MC device specific msi descriptor data
> + * @msi_index:   The index of the MSI descriptor
> + */
> +struct fsl_mc_msi_desc {
> + u16 msi_index;
> +};
> +
> +/**
>   * struct msi_desc - Descriptor structure for MSI based interrupts
>   * @list:List head for management
>   * @irq: The base interrupt number
> @@ -87,6 +95,7 @@ struct msi_desc {
>* tree wide cleanup.
>*/
>   struct platform_msi_desc platform;
> + struct fsl_mc_msi_desc fsl_mc;
>   };
>  };

Reviewed-by: Jiang Liu 

> 
> --
> 2.3.3
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 01/11] irqdomain: Added domain bus token DOMAIN_BUS_FSL_MC_MSI

2015-11-05 Thread Jiang Liu

On 2015/10/31 3:43, J. German Rivera wrote:
> Since an FSL-MC bus is a new bus type that is neither PCI nor
> PLATFORM, we need a new domain bus token to disambiguate the
> IRQ domain for FSL-MC MSIs.
> 
> Signed-off-by: J. German Rivera 
> ---
> Changes in v2: none
> 
>  include/linux/irqdomain.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
> index d5e5c5b..c0cb5d1 100644
> --- a/include/linux/irqdomain.h
> +++ b/include/linux/irqdomain.h
> @@ -73,6 +73,7 @@ enum irq_domain_bus_token {
>   DOMAIN_BUS_PCI_MSI,
>   DOMAIN_BUS_PLATFORM_MSI,
>   DOMAIN_BUS_NEXUS,
> + DOMAIN_BUS_FSL_MC_MSI,
>  };

Reviewed-by: Jiang Liu 

> 
>  /**
> --
> 2.3.3
> 


Re: [Bugfix v4] PCI, ACPI: Fix regressions caused by resource_size_t overflow with 32-bit kernel

2015-11-05 Thread Jiang Liu
On 2015/11/5 20:53, Tomasz Nowicki wrote:
> On 02.11.2015 16:27, Tomasz Nowicki wrote:
>> On 08.07.2015 09:26, Jiang Liu wrote:
>>> Zoltan Boszormenyi reported this regression:
>>>    "There's a Realtek RTL8111/8168/8411 (PCI ID 10ec:8168, Subsystem ID
>>>     1565:230e) network chip on the mainboard. After the r8169 driver loaded
>>>     the IRQs in the machine went berserk. Keyboard keypresses arrived with
>>>     considerable latency and duplicated, so no real work was possible.
>>>     The machine responded to the power button but didn't actually power
>>>     down. It just stuck at the powering down message. I had to press the
>>>     power button for 4 seconds to power it down.
>>>
>>>     The computer is a POS machine with a big battery inside. Because of
>>>     this, either ACPI or the Realtek chip kept the bad state and after
>>>     rebooting, the network chip didn't even show up in lspci. Not even the
>>>     PXE ROM announced itself during boot. I had to disconnect the battery
>>>     to beat some sense back to the computer.
>>>
>>>     The regression happens with 4.0.5, 4.1.0-rc8 and 4.1.0-final.
>>>     3.18.16 was good."
>>>
>>> The regression is caused by commit 593669c2ac0f ("x86/PCI/ACPI: Use common
>>> ACPI resource interfaces to simplify implementation"). Since commit
>>> 593669c2ac0f, x86 PCI ACPI host bridge driver validates ACPI resources by
>>> first converting an ACPI resource to a 'struct resource' structure and
>>> then applying checks against the converted resource structure. The 'start'
>>> and 'end' fields in 'struct resource' are defined to be type of
>>> resource_size_t, which may be 32 bits or 64 bits depending on
>>> CONFIG_PHYS_ADDR_T_64BIT.
>>>
>>> This may cause incorrect resource validation results with 32-bit kernels
>>> because 64-bit ACPI resource descriptors may get truncated when converting
>>> to 32-bit 'start' and 'end' fields in 'struct resource'. It eventually
>>> affects PCI resource allocation subsystem and makes some PCI devices and
>>> the system behave abnormally due to incorrect resource assignment.
>>>
>>> So enhance the ACPI resource parsing interfaces to ignore ACPI resource
>>> descriptors with address/offset above 4G when running in 32-bit mode.
>>>
>>> With the fix applied, the behavior of the machine was restored to how
>>> 3.18.16 worked, i.e. the memory range that is over 4GB is ignored again,
>>> and lspci -vvxxx shows that everything is at the same memory window as
>>> they were with 3.18.16.
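
The truncation problem and the "ignore what does not fit" rule described in
the commit message above can be illustrated with a small userspace sketch in
C. It is not the code from the patch: res32_t, struct res32 and
decode_window32() are hypothetical names, and res32_t merely stands in for a
32-bit resource_size_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for resource_size_t on a kernel without CONFIG_PHYS_ADDR_T_64BIT. */
typedef uint32_t res32_t;

struct res32 {
	res32_t start;
	res32_t end;
};

/*
 * Convert a 64-bit ACPI address window into a 32-bit resource.
 * Returns false, leaving *res untouched, if any value would be
 * changed by the narrowing, i.e. the window is not addressable.
 */
static bool decode_window32(uint64_t min, uint64_t max, uint64_t offset,
			    struct res32 *res)
{
	uint64_t start = min + offset;
	uint64_t end = max + offset;

	if ((res32_t)start != start || (res32_t)end != end ||
	    (res32_t)offset != offset)
		return false;

	res->start = (res32_t)start;
	res->end = (res32_t)end;
	return true;
}
```

Doing the arithmetic in 64 bits first and only then comparing against the
narrowed values is the point of the sketch: assigning straight into 32-bit
fields would silently turn a window starting at 4G into one starting at 0.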
>>>
>>> Reported-and-Tested-by: Boszormenyi Zoltan 
>>> Fixes: 593669c2ac0f ("x86/PCI/ACPI: Use common ACPI resource
>>> interfaces to simplify implementation")
>>> Signed-off-by: Jiang Liu 
>>> Cc: sta...@vger.kernel.org # 4.0
>>> ---
>>>   drivers/acpi/resource.c |   24 +++-
>>>   1 file changed, 15 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c
>>> index 10561ce16ed1..e8d281739cbc 100644
>>> --- a/drivers/acpi/resource.c
>>> +++ b/drivers/acpi/resource.c
>>> @@ -194,6 +194,7 @@ static bool acpi_decode_space(struct resource_win
>>> *win,
>>>   u8 iodec = attr->granularity == 0xfff ? ACPI_DECODE_10 :
>>> ACPI_DECODE_16;
>>>   bool wp = addr->info.mem.write_protect;
>>>   u64 len = attr->address_length;
>>> +u64 start, end, offset = 0;
>>>   struct resource *res = &win->res;
>>>
>>>   /*
>>> @@ -205,9 +206,6 @@ static bool acpi_decode_space(struct resource_win
>>> *win,
>>>   pr_debug("ACPI: Invalid address space min_addr_fix %d,
>>> max_addr_fix %d, len %llx\n",
>>>addr->min_address_fixed, addr->max_address_fixed, len);
>>>
>>> -res->start = attr->minimum;
>>> -res->end = attr->maximum;
>>> -
>>>   /*
>>>* For bridges that translate addresses across the bridge,
>>>* translation_offset is the offset that must be added to the
>>> @@ -215,12 +213,22 @@ static bool acpi_decode_space(struct
>>> resource_win *win,
>>>* primary side. Non-bridge devices must list 0 for all Address
>>>

Re: [RFC PATCHv3 3/4] x86/pci: Initial commit for new VMD device driver

2015-11-04 Thread Jiang Liu


On 2015/11/4 23:14, Thomas Gleixner wrote:
> On Wed, 4 Nov 2015, Keith Busch wrote:
> 
>> On Tue, Nov 03, 2015 at 12:42:02PM +0100, Thomas Gleixner wrote:
>>> On Tue, 3 Nov 2015, Keith Busch wrote:
>>>>>> +msi_irqdomain = pci_msi_create_irq_domain(NULL, 
>>>>>> _chained_msi_domain_info,
>>>>>> +  vmd_irqdomain);
>>>
>>> But that parent limitation does not matter simply because your
>>> msi_irqdomain does not follow down the hierarchy in the allocation
>>> path.
>>>
>>> So we can avoid the vmd_irqdomain creation completely. It's just
>>> wasting memory and has no value at all. Creating the msi domain with a
>>> NULL parent is possible.
>>
>> I'm having trouble following the hierarchy and didn't understand the
>> connection between the parent and msi domain. It's still new to me,
>> but I don't think a NULL parent is allowable with msi domains:
>>
>>  pci_msi_setup_msi_irqs()
>>   pci_msi_domain_alloc_irqs()
>>msi_domain_alloc_irqs()
>> __irq_domain_alloc_irqs()
>>  irq_domain_alloc_irqs_recursive()
>>   msi_domain_alloc()
>>irq_domain_alloc_irqs_parent()
>>
>> The last call returns -ENOSYS since the parent is NULL. Was the
>> intention to allow no parent, or do I still need to allocate one to
>> achieve the desired chaining?
> 
> Hmm, seems I missed that part. But that's a fixable problem. Jiang?
Hi Keith,
Could you please try the attached patch?
Thanks!
Gerry

> 
> Thanks,
> 
>   tglx
>  
From 6db1568564f8edee626d1c168b22ed5502abc6eb Mon Sep 17 00:00:00 2001
From: Liu Jiang 
Date: Thu, 5 Nov 2015 11:25:07 +0800
Subject: [PATCH] msi: Relax msi_domain_alloc() to support parentless MSI
 irqdomains

Previously msi_domain_alloc() assumed that MSI irqdomains always have
parent irqdomains, but that's not true for the new Intel VMD devices. So
relax msi_domain_alloc() to support parentless MSI irqdomains.

Signed-off-by: Jiang Liu 
Signed-off-by: Liu Jiang 
---
 kernel/irq/msi.c |8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 7e6512b9dc1f..e4d3d707efff 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -109,9 +109,11 @@ static int msi_domain_alloc(struct irq_domain *domain, unsigned int virq,
 	if (irq_find_mapping(domain, hwirq) > 0)
 		return -EEXIST;
 
-	ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, arg);
-	if (ret < 0)
-		return ret;
+	if (domain->parent) {
+		ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, arg);
+		if (ret < 0)
+			return ret;
+	}
 
 	for (i = 0; i < nr_irqs; i++) {
 		ret = ops->msi_init(domain, info, virq + i, hwirq + i, arg);
-- 
1.7.10.4
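
The effect of the patch above can be illustrated outside the kernel. The following is only a minimal userspace sketch, not kernel code: struct mock_domain and the mock_* functions are hypothetical stand-ins for struct irq_domain, irq_domain_alloc_irqs_parent() and msi_domain_alloc(), and the per-IRQ msi_init() loop is reduced to a counter. The only behaviour taken from the thread is that the parent allocation returns -ENOSYS when the parent is NULL.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct irq_domain: a parent link and a counter. */
struct mock_domain {
	struct mock_domain *parent;
	int allocs;		/* number of allocations done in this domain */
};

/* Mimics the reported behaviour: no parent means -ENOSYS. */
static int mock_alloc_parent(struct mock_domain *d)
{
	assert(d != NULL);
	if (!d->parent)
		return -ENOSYS;
	d->parent->allocs++;
	return 0;
}

/* Before the patch: the parent allocation is unconditional. */
static int mock_msi_alloc_strict(struct mock_domain *d)
{
	int ret = mock_alloc_parent(d);

	if (ret < 0)
		return ret;
	d->allocs++;
	return 0;
}

/* After the patch: only descend into the parent when there is one. */
static int mock_msi_alloc_relaxed(struct mock_domain *d)
{
	if (d->parent) {
		int ret = mock_alloc_parent(d);

		if (ret < 0)
			return ret;
	}
	d->allocs++;
	return 0;
}
```

With a parentless domain the strict variant fails with -ENOSYS before doing any work of its own, while the relaxed variant skips the parent step and completes; a domain that does have a parent behaves the same in both variants.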



Re: [PATCH v4 4/7] PCI: Add fwnode_handle to pci_sysdata

2015-10-29 Thread Jiang Liu


On 2015/10/30 7:46, ja...@microsoft.com wrote:
> From: Jake Oshins 
> 
> This patch adds an fwnode_handle to struct pci_sysdata, which is
> used by the next patch in the series when trying to locate an
> IRQ domain associated with a root PCI bus.
> 
> Signed-off-by: Jake Oshins 
> ---
>  arch/x86/include/asm/pci.h | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/arch/x86/include/asm/pci.h b/arch/x86/include/asm/pci.h
> index 4625943..10213a1 100644
> --- a/arch/x86/include/asm/pci.h
> +++ b/arch/x86/include/asm/pci.h
> @@ -20,6 +20,9 @@ struct pci_sysdata {
>  #ifdef CONFIG_X86_64
>  	void		*iommu;		/* IOMMU private data */
>  #endif
> +#ifdef CONFIG_PCI_MSI_IRQ_DOMAIN
> +	void		*fwnode;	/* IRQ domain for MSI assignment */
> +#endif
>  };
>  
>  extern int pci_routeirq;
> @@ -41,6 +44,14 @@ static inline int pci_proc_domain(struct pci_bus *bus)
>  }
>  #endif
>  
> +#ifdef CONFIG_PCI_MSI_IRQ_DOMAIN
> +static inline void *pci_fwnode(struct pci_bus *bus)
> +{
> +	struct pci_sysdata *sd = bus->sysdata;
> +	return sd->fwnode;
> +}
> +#endif
Hi Jakeo,
It would be better if the function name indicated that
we are getting the PCI host bridge (root bus) firmware node.
You also need some magic here to avoid breaking compilation
on other archs:
in arch/x86/include/asm/pci.h
#define pci_fwnode  pci_fwnode

in include/asm-generic/pci.h
#ifndef pci_fwnode
#define pci_fwnode(bus) ((void)(bus),NULL)
#endif

Thanks,
Gerry
> +
>  /* Can be used to override the logic in pci_scan_bus for skipping
> already-configured bus numbers - to be used for buggy BIOSes
> or architectures with incomplete PCI setup by the loader */
> 
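
The "#define name name" plus "#ifndef" fallback suggested above can be tried in a single file. This is only a sketch under stated assumptions: struct pci_bus_mock, struct sysdata_mock, MOCK_ARCH_HAS_FWNODE and bus_has_fwnode() are hypothetical names used for illustration, and the two header files are simulated by two preprocessor sections.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct pci_bus and struct pci_sysdata. */
struct pci_bus_mock { void *sysdata; };
struct sysdata_mock { void *fwnode; };

/* --- what the arch header would provide (only when the arch has it) --- */
#ifdef MOCK_ARCH_HAS_FWNODE
static inline void *pci_fwnode(struct pci_bus_mock *bus)
{
	struct sysdata_mock *sd = bus->sysdata;

	return sd->fwnode;
}
/* Self-referential define: lets generic code detect the arch version. */
#define pci_fwnode pci_fwnode
#endif

/* --- what the generic header would provide as a fallback --- */
#ifndef pci_fwnode
#define pci_fwnode(bus) ((void)(bus), NULL)
#endif

/* Generic caller: compiles unchanged with or without the arch version. */
static int bus_has_fwnode(struct pci_bus_mock *bus)
{
	assert(bus != NULL);
	return pci_fwnode(bus) != NULL;
}
```

Built as-is, the fallback macro is used and bus_has_fwnode() always reports no firmware node; built with -DMOCK_ARCH_HAS_FWNODE, the inline accessor is used instead. The (void)(bus) in the fallback evaluates the argument, so callers do not get unused-variable warnings on architectures without the accessor.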


