[PATCH 1/3] ACPI: EC: Fix an EC event IRQ storming issue

2017-06-13 Thread Lv Zheng
The EC event IRQ (SCI_EVT) can only be handled by submitting QR_EC. As the
EC driver handles SCI_EVT in a workqueue, after SCI_EVT is flagged and
before QR_EC is submitted, there is a period risking IRQ storming. EC IRQ
must be masked during this period, but the Linux EC driver never does so.

No end user noticed the IRQ storming and no developer fixed this known
issue because:
1. the EC IRQ is normally an edge-triggered GPE, and
2. the kernel can execute a no-op EC IRQ handler very fast.
On platforms with an edge-triggered EC GPE, only post-resume EC event
loss has been reported, and there is no IRQ storming. On platforms with
a level-triggered EC GPE, the kernel has so far been fast enough to
execute the no-op EC IRQ handler, so the handlers do not accumulate and
starve the task contexts, which would cause a real IRQ storm.

But the IRQ storming can still happen when:
1. the EC IRQ behaves like a level-triggered GPE, and
2. EC debugging logging is enabled and the console is slow.
More and more platforms use the EC GPE as a wake GPE, where the EC GPE
is likely designed as level triggered. When EC debugging logging is
enabled, the EC IRQ handler is no longer a no-op but dumps the IRQ
status to the consoles. If the consoles are slow enough, EC IRQs can
arrive much faster than the handler executes. Eventually the
accumulated EC event IRQ handlers starve the task contexts, IRQ
storming occurs, and kernel hangs can be observed during boot/resume.

See link #1 for reference; note that the bug link can only be accessed
by privileged Intel users.

This patch fixes the issue by masking the EC IRQ for the period that:
1. begins when an SCI_EVT IRQ is pending, and
2. ends when a QR_EC completes (SCI_EVT acknowledged).

Link: https://jira01.devtools.intel.com/browse/LCK-4004 [#1]
Tested-by: Wang Wendy 
Tested-by: Feng Chenzhou 
Signed-off-by: Lv Zheng 
---
 drivers/acpi/ec.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
index c24235d..30d7f82 100644
--- a/drivers/acpi/ec.c
+++ b/drivers/acpi/ec.c
@@ -459,8 +459,10 @@ static bool acpi_ec_submit_flushable_request(struct acpi_ec *ec)
 
 static void acpi_ec_submit_query(struct acpi_ec *ec)
 {
-   if (acpi_ec_event_enabled(ec) &&
-   !test_and_set_bit(EC_FLAGS_QUERY_PENDING, &ec->flags)) {
+   acpi_ec_set_storm(ec, EC_FLAGS_COMMAND_STORM);
+   if (!acpi_ec_event_enabled(ec))
+   return;
+   if (!test_and_set_bit(EC_FLAGS_QUERY_PENDING, &ec->flags)) {
ec_dbg_evt("Command(%s) submitted/blocked",
   acpi_ec_cmd_string(ACPI_EC_COMMAND_QUERY));
ec->nr_pending_queries++;
@@ -470,11 +472,10 @@ static void acpi_ec_submit_query(struct acpi_ec *ec)
 
 static void acpi_ec_complete_query(struct acpi_ec *ec)
 {
-   if (test_bit(EC_FLAGS_QUERY_PENDING, &ec->flags)) {
-   clear_bit(EC_FLAGS_QUERY_PENDING, &ec->flags);
+   if (test_and_clear_bit(EC_FLAGS_QUERY_PENDING, &ec->flags))
ec_dbg_evt("Command(%s) unblocked",
   acpi_ec_cmd_string(ACPI_EC_COMMAND_QUERY));
-   }
+   acpi_ec_clear_storm(ec, EC_FLAGS_COMMAND_STORM);
 }
 
 static inline void __acpi_ec_enable_event(struct acpi_ec *ec)
-- 
2.7.4






[PATCH 02/18] spi: qup: Setup DMA mode correctly

2017-06-13 Thread Varadarajan Narayanan
To operate in DMA mode, the buffer must be aligned and the transfer
size must be a multiple of the block size (for v1). The number of
words being transferred must be programmed into the count registers
appropriately.

Signed-off-by: Andy Gross 
Signed-off-by: Varadarajan Narayanan 
---
 drivers/spi/spi-qup.c | 118 +++---
 1 file changed, 55 insertions(+), 63 deletions(-)

diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c
index c0d4def..abe799b 100644
--- a/drivers/spi/spi-qup.c
+++ b/drivers/spi/spi-qup.c
@@ -149,11 +149,18 @@ struct spi_qup {
int rx_bytes;
int qup_v1;
 
-   int use_dma;
+   int mode;
struct dma_slave_config rx_conf;
struct dma_slave_config tx_conf;
 };
 
+static inline bool spi_qup_is_dma_xfer(int mode)
+{
+   if (mode == QUP_IO_M_MODE_DMOV || mode == QUP_IO_M_MODE_BAM)
+   return true;
+
+   return false;
+}
 
 static inline bool spi_qup_is_valid_state(struct spi_qup *controller)
 {
@@ -424,7 +431,7 @@ static irqreturn_t spi_qup_qup_irq(int irq, void *dev_id)
error = -EIO;
}
 
-   if (!controller->use_dma) {
+   if (!spi_qup_is_dma_xfer(controller->mode)) {
if (opflags & QUP_OP_IN_SERVICE_FLAG)
spi_qup_fifo_read(controller, xfer);
 
@@ -443,34 +450,11 @@ static irqreturn_t spi_qup_qup_irq(int irq, void *dev_id)
return IRQ_HANDLED;
 }
 
-static u32
-spi_qup_get_mode(struct spi_master *master, struct spi_transfer *xfer)
-{
-   struct spi_qup *qup = spi_master_get_devdata(master);
-   u32 mode;
-
-   qup->w_size = 4;
-
-   if (xfer->bits_per_word <= 8)
-   qup->w_size = 1;
-   else if (xfer->bits_per_word <= 16)
-   qup->w_size = 2;
-
-   qup->n_words = xfer->len / qup->w_size;
-
-   if (qup->n_words <= (qup->in_fifo_sz / sizeof(u32)))
-   mode = QUP_IO_M_MODE_FIFO;
-   else
-   mode = QUP_IO_M_MODE_BLOCK;
-
-   return mode;
-}
-
 /* set clock freq ... bits per word */
 static int spi_qup_io_config(struct spi_device *spi, struct spi_transfer *xfer)
 {
struct spi_qup *controller = spi_master_get_devdata(spi->master);
-   u32 config, iomode, mode, control;
+   u32 config, iomode, control;
int ret, n_words;
 
if (spi->mode & SPI_LOOP && xfer->len > controller->in_fifo_sz) {
@@ -491,25 +475,30 @@ static int spi_qup_io_config(struct spi_device *spi, struct spi_transfer *xfer)
return -EIO;
}
 
-   mode = spi_qup_get_mode(spi->master, xfer);
+   controller->w_size = DIV_ROUND_UP(xfer->bits_per_word, 8);
+   controller->n_words = xfer->len / controller->w_size;
n_words = controller->n_words;
 
-   if (mode == QUP_IO_M_MODE_FIFO) {
+   if (n_words <= (controller->in_fifo_sz / sizeof(u32))) {
+
+   controller->mode = QUP_IO_M_MODE_FIFO;
+
writel_relaxed(n_words, controller->base + QUP_MX_READ_CNT);
writel_relaxed(n_words, controller->base + QUP_MX_WRITE_CNT);
/* must be zero for FIFO */
writel_relaxed(0, controller->base + QUP_MX_INPUT_CNT);
writel_relaxed(0, controller->base + QUP_MX_OUTPUT_CNT);
-   } else if (!controller->use_dma) {
+   } else if (spi->master->can_dma &&
+  spi->master->can_dma(spi->master, spi, xfer) &&
+  spi->master->cur_msg_mapped) {
+
+   controller->mode = QUP_IO_M_MODE_BAM;
+
writel_relaxed(n_words, controller->base + QUP_MX_INPUT_CNT);
writel_relaxed(n_words, controller->base + QUP_MX_OUTPUT_CNT);
/* must be zero for BLOCK and BAM */
writel_relaxed(0, controller->base + QUP_MX_READ_CNT);
writel_relaxed(0, controller->base + QUP_MX_WRITE_CNT);
-   } else {
-   mode = QUP_IO_M_MODE_BAM;
-   writel_relaxed(0, controller->base + QUP_MX_READ_CNT);
-   writel_relaxed(0, controller->base + QUP_MX_WRITE_CNT);
 
if (!controller->qup_v1) {
void __iomem *input_cnt;
@@ -528,19 +517,28 @@ static int spi_qup_io_config(struct spi_device *spi, struct spi_transfer *xfer)
 
writel_relaxed(0, controller->base + QUP_MX_OUTPUT_CNT);
}
+   } else {
+
+   controller->mode = QUP_IO_M_MODE_BLOCK;
+
+   writel_relaxed(n_words, controller->base + QUP_MX_INPUT_CNT);
+   writel_relaxed(n_words, controller->base + QUP_MX_OUTPUT_CNT);
+   /* must be zero for BLOCK and BAM */
+   writel_relaxed(0, controller->base + QUP_MX_READ_CNT);
+   writel_relaxed(0, controller->base + QUP_MX_WRITE_CNT);
}
 

[PATCH 03/18] spi: qup: Add completion timeout for dma mode

2017-06-13 Thread Varadarajan Narayanan
Use separate 'completion' structures to track the completion of
DMA TX/RX and PIO.

Signed-off-by: Andy Gross 
Signed-off-by: Varadarajan Narayanan 
---
 drivers/spi/spi-qup.c | 41 -
 1 file changed, 24 insertions(+), 17 deletions(-)

diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c
index abe799b..272e48e 100644
--- a/drivers/spi/spi-qup.c
+++ b/drivers/spi/spi-qup.c
@@ -142,6 +142,8 @@ struct spi_qup {
 
struct spi_transfer *xfer;
struct completion   done;
+   struct completion   txc;
+   struct completion   rxc;
int error;
int w_size; /* bytes per SPI word */
int n_words;
@@ -283,16 +285,13 @@ static void spi_qup_fifo_write(struct spi_qup *controller,
 
 static void spi_qup_dma_done(void *data)
 {
-   struct spi_qup *qup = data;
-
-   complete(&qup->done);
+   complete(data);
 }
 
static int spi_qup_prep_sg(struct spi_master *master, struct spi_transfer *xfer,
   enum dma_transfer_direction dir,
-  dma_async_tx_callback callback)
+  dma_async_tx_callback callback, void *data)
 {
-   struct spi_qup *qup = spi_master_get_devdata(master);
unsigned long flags = DMA_PREP_INTERRUPT | DMA_PREP_FENCE;
struct dma_async_tx_descriptor *desc;
struct scatterlist *sgl;
@@ -315,7 +314,7 @@ static int spi_qup_prep_sg(struct spi_master *master, struct spi_transfer *xfer,
return -EINVAL;
 
desc->callback = callback;
-   desc->callback_param = qup;
+   desc->callback_param = data;
 
cookie = dmaengine_submit(desc);
 
@@ -333,16 +332,12 @@ static void spi_qup_dma_terminate(struct spi_master *master,
 
 static int spi_qup_do_dma(struct spi_master *master, struct spi_transfer *xfer)
 {
-   dma_async_tx_callback rx_done = NULL, tx_done = NULL;
+   struct spi_qup *qup = spi_master_get_devdata(master);
int ret;
 
-   if (xfer->rx_buf)
-   rx_done = spi_qup_dma_done;
-   else if (xfer->tx_buf)
-   tx_done = spi_qup_dma_done;
-
if (xfer->rx_buf) {
-   ret = spi_qup_prep_sg(master, xfer, DMA_DEV_TO_MEM, rx_done);
+   ret = spi_qup_prep_sg(master, xfer, DMA_DEV_TO_MEM,
+   spi_qup_dma_done, &qup->rxc);
if (ret)
return ret;
 
@@ -350,13 +345,20 @@ static int spi_qup_do_dma(struct spi_master *master, struct spi_transfer *xfer)
}
 
if (xfer->tx_buf) {
-   ret = spi_qup_prep_sg(master, xfer, DMA_MEM_TO_DEV, tx_done);
+   ret = spi_qup_prep_sg(master, xfer, DMA_MEM_TO_DEV,
+   spi_qup_dma_done, &qup->txc);
if (ret)
return ret;
 
dma_async_issue_pending(master->dma_tx);
}
 
+   if (xfer->rx_buf && !wait_for_completion_timeout(&qup->rxc, timeout))
+   return -ETIMEDOUT;
+
+   if (xfer->tx_buf && !wait_for_completion_timeout(&qup->txc, timeout))
+   return -ETIMEDOUT;
+
return 0;
 }
 
@@ -622,7 +624,6 @@ static int spi_qup_transfer_one(struct spi_master *master,
timeout = DIV_ROUND_UP(xfer->len * 8, timeout);
timeout = 100 * msecs_to_jiffies(timeout);
 
-   reinit_completion(&controller->done);
 
spin_lock_irqsave(&controller->lock, flags);
controller->xfer = xfer;
@@ -631,10 +632,14 @@ static int spi_qup_transfer_one(struct spi_master *master,
controller->tx_bytes = 0;
spin_unlock_irqrestore(&controller->lock, flags);
 
-   if (spi_qup_is_dma_xfer(controller->mode))
+   if (spi_qup_is_dma_xfer(controller->mode)) {
+   reinit_completion(&controller->rxc);
+   reinit_completion(&controller->txc);
ret = spi_qup_do_dma(master, xfer);
-   else
+   } else {
+   reinit_completion(&controller->done);
ret = spi_qup_do_pio(master, xfer);
+   }
 
if (ret)
goto exit;
@@ -860,6 +865,8 @@ static int spi_qup_probe(struct platform_device *pdev)
master->set_cs = spi_qup_set_cs;
 
spin_lock_init(&controller->lock);
+   init_completion(&controller->rxc);
+   init_completion(&controller->txc);
init_completion(&controller->done);
 
iomode = readl_relaxed(base + QUP_IO_M_MODES);
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation



[PATCH 00/18] spi: qup: Fixes and add support for >64k transfers

2017-06-13 Thread Varadarajan Narayanan
v1:
This series fixes some existing issues in the code for both
interrupt and dma mode. Patches 1 - 11 are the fixes.
Random failures/timeouts are observed without these fixes.
Also, the current driver does not support block transfers > 64K
and fails quietly. Patches 12 - 18 add support for this
in both interrupt and DMA mode.

The entire series has been tested on ipq4019 with
SPI-NOR flash for block sizes > 64k.

Varadarajan Narayanan (18):
  spi: qup: Enable chip select support
  spi: qup: Setup DMA mode correctly
  spi: qup: Add completion timeout for dma mode
  spi: qup: Add completion timeout for fifo/block mode
  spi: qup: Place the QUP in run mode before DMA transactions
  spi: qup: Fix error handling in spi_qup_prep_sg
  spi: qup: Fix transaction done signaling
  spi: qup: Handle v1 dma completion differently
  spi: qup: Do block sized read/write in block mode
  spi: qup: Fix DMA mode interrupt handling
  spi: qup: properly detect extra interrupts
  spi: qup: refactor spi_qup_io_config into two functions
  spi: qup: call io_config in mode specific function
  spi: qup: allow block mode to generate multiple transactions
  spi: qup: refactor spi_qup_prep_sg
  spi: qup: allow multiple DMA transactions per spi xfer
  spi: qup: Ensure done detection
  spi: qup: support for qup v1 dma

 .../devicetree/bindings/spi/qcom,spi-qup.txt   |   6 +
 drivers/spi/spi-qup.c  | 639 +++--
 2 files changed, 462 insertions(+), 183 deletions(-)

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation






[PATCH 07/18] spi: qup: Fix transaction done signaling

2017-06-13 Thread Varadarajan Narayanan
Wait to signal done until we get all of the interrupts we are expecting
for a transaction. If we don't wait for the input done flag, the done
flag can arrive between transactions and disrupt the next transaction.

Signed-off-by: Andy Gross 
Signed-off-by: Varadarajan Narayanan 
---
 drivers/spi/spi-qup.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c
index 2124815..7c22ee4 100644
--- a/drivers/spi/spi-qup.c
+++ b/drivers/spi/spi-qup.c
@@ -465,7 +465,8 @@ static irqreturn_t spi_qup_qup_irq(int irq, void *dev_id)
controller->xfer = xfer;
spin_unlock_irqrestore(&controller->lock, flags);
 
-   if (controller->rx_bytes == xfer->len || error)
+   if ((controller->rx_bytes == xfer->len &&
+   (opflags & QUP_OP_MAX_INPUT_DONE_FLAG)) || error)
complete(&controller->done);
 
return IRQ_HANDLED;
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation






[PATCH 11/18] spi: qup: properly detect extra interrupts

2017-06-13 Thread Varadarajan Narayanan
It's possible for a SPI transaction to complete, and then for another
interrupt to arrive and be processed against the same spi_transfer,
before transfer_one can set it to NULL.

This masks unexpected interrupts, so set the spi_transfer to NULL in
the interrupt handler once the transaction is done. That way we can
properly detect these spurious interrupts and print warning messages.

Signed-off-by: Matthew McClintock 
Signed-off-by: Varadarajan Narayanan 
---
 drivers/spi/spi-qup.c | 20 +++-
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c
index bd53e82..1a2a9d9 100644
--- a/drivers/spi/spi-qup.c
+++ b/drivers/spi/spi-qup.c
@@ -496,13 +496,13 @@ static irqreturn_t spi_qup_qup_irq(int irq, void *dev_id)
struct spi_qup *controller = dev_id;
struct spi_transfer *xfer;
u32 opflags, qup_err, spi_err;
-   unsigned long flags;
int error = 0;
+   bool done = false;
 
-   spin_lock_irqsave(&controller->lock, flags);
+   spin_lock(&controller->lock);
xfer = controller->xfer;
controller->xfer = NULL;
-   spin_unlock_irqrestore(&controller->lock, flags);
+   spin_unlock(&controller->lock);
 
qup_err = readl_relaxed(controller->base + QUP_ERROR_FLAGS);
spi_err = readl_relaxed(controller->base + SPI_ERROR_FLAGS);
@@ -556,16 +556,19 @@ static irqreturn_t spi_qup_qup_irq(int irq, void *dev_id)
spi_qup_write(controller, xfer);
}
 
-   spin_lock_irqsave(&controller->lock, flags);
-   controller->error = error;
-   controller->xfer = xfer;
-   spin_unlock_irqrestore(&controller->lock, flags);
-
/* re-read opflags as flags may have changed due to actions above */
opflags = readl_relaxed(controller->base + QUP_OPERATIONAL);
 
if ((controller->rx_bytes == xfer->len &&
(opflags & QUP_OP_MAX_INPUT_DONE_FLAG)) || error)
+   done = true;
+
+   spin_lock(&controller->lock);
+   controller->error = error;
+   controller->xfer = done ? NULL : xfer;
+   spin_unlock(&controller->lock);
+
+   if (done)
complete(&controller->done);
 
return IRQ_HANDLED;
@@ -765,7 +768,6 @@ static int spi_qup_transfer_one(struct spi_master *master,
 exit:
spi_qup_set_state(controller, QUP_STATE_RESET);
spin_lock_irqsave(&controller->lock, flags);
-   controller->xfer = NULL;
if (!ret)
ret = controller->error;
spin_unlock_irqrestore(&controller->lock, flags);
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation



[PATCH 11/18] spi: qup: properly detect extra interrupts

2017-06-13 Thread Varadarajan Narayanan
It's possible for a SPI transaction to complete and get another
interrupt and have it processed on the same spi_transfer before the
transfer_one can set it to NULL.

This masks unexpected interrupts, so let's set the spi_transfer to
NULL in the interrupt once the transaction is done. So we can
properly detect these bad interrupts and print warning messages.

Signed-off-by: Matthew McClintock 
Signed-off-by: Varadarajan Narayanan 
---
 drivers/spi/spi-qup.c | 20 +++-
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c
index bd53e82..1a2a9d9 100644
--- a/drivers/spi/spi-qup.c
+++ b/drivers/spi/spi-qup.c
@@ -496,13 +496,13 @@ static irqreturn_t spi_qup_qup_irq(int irq, void *dev_id)
struct spi_qup *controller = dev_id;
struct spi_transfer *xfer;
u32 opflags, qup_err, spi_err;
-   unsigned long flags;
int error = 0;
+   bool done = 0;
 
-   spin_lock_irqsave(>lock, flags);
+   spin_lock(>lock);
xfer = controller->xfer;
controller->xfer = NULL;
-   spin_unlock_irqrestore(>lock, flags);
+   spin_unlock(>lock);
 
qup_err = readl_relaxed(controller->base + QUP_ERROR_FLAGS);
spi_err = readl_relaxed(controller->base + SPI_ERROR_FLAGS);
@@ -556,16 +556,19 @@ static irqreturn_t spi_qup_qup_irq(int irq, void *dev_id)
spi_qup_write(controller, xfer);
}
 
-   spin_lock_irqsave(&controller->lock, flags);
-   controller->error = error;
-   controller->xfer = xfer;
-   spin_unlock_irqrestore(&controller->lock, flags);
-
/* re-read opflags as flags may have changed due to actions above */
opflags = readl_relaxed(controller->base + QUP_OPERATIONAL);
 
if ((controller->rx_bytes == xfer->len &&
(opflags & QUP_OP_MAX_INPUT_DONE_FLAG)) ||  error)
+   done = true;
+
+   spin_lock(&controller->lock);
+   controller->error = error;
+   controller->xfer = done ? NULL : xfer;
+   spin_unlock(&controller->lock);
+
+   if (done)
complete(&controller->done);
 
return IRQ_HANDLED;
@@ -765,7 +768,6 @@ static int spi_qup_transfer_one(struct spi_master *master,
 exit:
spi_qup_set_state(controller, QUP_STATE_RESET);
spin_lock_irqsave(&controller->lock, flags);
-   controller->xfer = NULL;
if (!ret)
ret = controller->error;
spin_unlock_irqrestore(&controller->lock, flags);
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation



Re: [Merge tag 'pci-v4.12-changes' of git] 857f864014: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8

2017-06-13 Thread Logan Gunthorpe


On 13/06/17 11:00 PM, Greg Kroah-Hartman wrote:
> No, don't modify any drivers, do this in the core chardev code.

Ok, well then maybe I misunderstood what you originally asked for
because I don't see how you can change a fixed allocation to a dynamic
one without touching the driver code which needs to know the major that
was assigned in the end.

> At quick glance, looks good to me, care to clean it up and make it behind
> a config option?

Sure. However, is a config option really necessary here? As is, the
extended dynamic region will only be used if there are too many chardev
majors, and it shouldn't have _any_ effect on users that have a small
number. So it seems like not selecting that option is just telling the
kernel to be broken for no obvious trade-off.

Logan



[PATCH 12/18] spi: qup: refactor spi_qup_io_config into two functions

2017-06-13 Thread Varadarajan Narayanan
This is in preparation for handling transactions larger than
64K-1 bytes in block mode, which is currently unsupported and
quietly fails.

We need to break this into two functions: 1) prep, which is
called once per spi_message, and 2) io_config, which is called
once per spi-qup bus transaction.

This is just refactoring; there should be no functional
change.

Signed-off-by: Matthew McClintock 
Signed-off-by: Varadarajan Narayanan 
---
 drivers/spi/spi-qup.c | 109 +++---
 1 file changed, 67 insertions(+), 42 deletions(-)

diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c
index 1a2a9d9..c023dc1 100644
--- a/drivers/spi/spi-qup.c
+++ b/drivers/spi/spi-qup.c
@@ -574,12 +574,11 @@ static irqreturn_t spi_qup_qup_irq(int irq, void *dev_id)
return IRQ_HANDLED;
 }
 
-/* set clock freq ... bits per word */
-static int spi_qup_io_config(struct spi_device *spi, struct spi_transfer *xfer)
+/* set clock freq ... bits per word, determine mode */
+static int spi_qup_io_prep(struct spi_device *spi, struct spi_transfer *xfer)
 {
struct spi_qup *controller = spi_master_get_devdata(spi->master);
-   u32 config, iomode, control;
-   int ret, n_words;
+   int ret;
 
if (spi->mode & SPI_LOOP && xfer->len > controller->in_fifo_sz) {
dev_err(controller->dev, "too big size for loopback %d > %d\n",
@@ -594,32 +593,59 @@ static int spi_qup_io_config(struct spi_device *spi, 
struct spi_transfer *xfer)
return -EIO;
}
 
-   if (spi_qup_set_state(controller, QUP_STATE_RESET)) {
-   dev_err(controller->dev, "cannot set RESET state\n");
-   return -EIO;
-   }
-
controller->w_size = DIV_ROUND_UP(xfer->bits_per_word, 8);
controller->n_words = xfer->len / controller->w_size;
-   n_words = controller->n_words;
-
-   if (n_words <= (controller->in_fifo_sz / sizeof(u32))) {
 
+   if (controller->n_words <= (controller->in_fifo_sz / sizeof(u32)))
controller->mode = QUP_IO_M_MODE_FIFO;
+   else if (spi->master->can_dma &&
+spi->master->can_dma(spi->master, spi, xfer) &&
+spi->master->cur_msg_mapped)
+   controller->mode = QUP_IO_M_MODE_BAM;
+   else
+   controller->mode = QUP_IO_M_MODE_BLOCK;
+
+   return 0;
+}
+
+/* prep qup for another spi transaction of specific type */
+static int spi_qup_io_config(struct spi_device *spi, struct spi_transfer *xfer)
+{
+   struct spi_qup *controller = spi_master_get_devdata(spi->master);
+   u32 config, iomode, control;
+   unsigned long flags;
+
+   spin_lock_irqsave(&controller->lock, flags);
+   controller->xfer = xfer;
+   controller->error = 0;
+   controller->rx_bytes = 0;
+   controller->tx_bytes = 0;
+   spin_unlock_irqrestore(&controller->lock, flags);
+
+
+   if (spi_qup_set_state(controller, QUP_STATE_RESET)) {
+   dev_err(controller->dev, "cannot set RESET state\n");
+   return -EIO;
+   }
 
-   writel_relaxed(n_words, controller->base + QUP_MX_READ_CNT);
-   writel_relaxed(n_words, controller->base + QUP_MX_WRITE_CNT);
+   switch (controller->mode) {
+   case QUP_IO_M_MODE_FIFO:
+   reinit_completion(&controller->done);
+   writel_relaxed(controller->n_words,
+  controller->base + QUP_MX_READ_CNT);
+   writel_relaxed(controller->n_words,
+  controller->base + QUP_MX_WRITE_CNT);
/* must be zero for FIFO */
writel_relaxed(0, controller->base + QUP_MX_INPUT_CNT);
writel_relaxed(0, controller->base + QUP_MX_OUTPUT_CNT);
-   } else if (spi->master->can_dma &&
-  spi->master->can_dma(spi->master, spi, xfer) &&
-  spi->master->cur_msg_mapped) {
-
-   controller->mode = QUP_IO_M_MODE_BAM;
-
-   writel_relaxed(n_words, controller->base + QUP_MX_INPUT_CNT);
-   writel_relaxed(n_words, controller->base + QUP_MX_OUTPUT_CNT);
+   break;
+   case QUP_IO_M_MODE_BAM:
+   reinit_completion(&controller->txc);
+   reinit_completion(&controller->rxc);
+   writel_relaxed(controller->n_words,
+  controller->base + QUP_MX_INPUT_CNT);
+   writel_relaxed(controller->n_words,
+  controller->base + QUP_MX_OUTPUT_CNT);
/* must be zero for BLOCK and BAM */
writel_relaxed(0, controller->base + QUP_MX_READ_CNT);
writel_relaxed(0, controller->base + QUP_MX_WRITE_CNT);
@@ -637,19 +663,25 @@ static int spi_qup_io_config(struct spi_device *spi, 
struct spi_transfer *xfer)
if (xfer->tx_buf)
writel_relaxed(0, input_cnt);
else
-

Re: [PATCH] powerpc: dts: use #include "..." to include local DT

2017-06-13 Thread Masahiro Yamada
Hi.


2017-06-13 19:21 GMT+09:00 Michael Ellerman :
> Masahiro Yamada  writes:
>
>> Hi
>>
>> (+Anatolij Gustschin )
>>
>>
>> Ping.
>> I am not 100% sure who is responsible for this,
>> but somebody, could take a look at this patch, please?
>
> Have you tested it actually works?
>
> It sounds reasonable, and if it behaves as you describe there is no
> change in behaviour, right?


I do not have access to hardware,
but it is pretty easy to test this patch.


$ make O=foo ARCH=powerpc CROSS_COMPILE=powerpc-linux-  dts/ac14xx.dtb

gave me the DTB output.

The binary comparison matched with/without this patch,
so I am sure there is no change in behavior.

Likewise for mpc5121ads and pdm360ng.


A double-check by Anatolij would be much appreciated.




-- 
Best Regards
Masahiro Yamada



[PATCH 17/18] spi: qup: Ensure done detection

2017-06-13 Thread Varadarajan Narayanan
This patch fixes an issue where a SPI transaction has completed, but the
done condition is missed.  This occurs because at the time of interrupt the
MAX_INPUT_DONE_FLAG is not asserted.  However, in the process of reading
blocks of data from the FIFO, the last portion of data comes in.

The opflags read at the beginning of the irq handler no longer matches the
current opflag state.  To get around this condition, the block read
function should update the opflags so that done detection is correct after
the return.

Signed-off-by: Andy Gross 
Signed-off-by: Abhishek Sahu 
Signed-off-by: Varadarajan Narayanan 
---
 drivers/spi/spi-qup.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c
index 35e12ba..4ef9301 100644
--- a/drivers/spi/spi-qup.c
+++ b/drivers/spi/spi-qup.c
@@ -268,7 +268,7 @@ static void spi_qup_read_from_fifo(struct spi_qup 
*controller, u32 num_words)
}
 }
 
-static void spi_qup_read(struct spi_qup *controller)
+static void spi_qup_read(struct spi_qup *controller, u32 *opflags)
 {
u32 remainder, words_per_block, num_words;
bool is_block_mode = controller->mode == QUP_IO_M_MODE_BLOCK;
@@ -307,10 +307,12 @@ static void spi_qup_read(struct spi_qup *controller)
 
/*
 * Due to extra stickiness of the QUP_OP_IN_SERVICE_FLAG during block
-* mode reads, it has to be cleared again at the very end
+* reads, it has to be cleared again at the very end.  However, be sure
+* to refresh opflags value because MAX_INPUT_DONE_FLAG may now be
+* present and this is used to determine if transaction is complete
 */
-   if (is_block_mode && spi_qup_is_flag_set(controller,
-   QUP_OP_MAX_INPUT_DONE_FLAG))
+   *opflags = readl_relaxed(controller->base + QUP_OPERATIONAL);
+   if (is_block_mode && *opflags & QUP_OP_MAX_INPUT_DONE_FLAG)
writel_relaxed(QUP_OP_IN_SERVICE_FLAG,
   controller->base + QUP_OPERATIONAL);
 
@@ -638,7 +640,7 @@ static irqreturn_t spi_qup_qup_irq(int irq, void *dev_id)
complete(&controller->txc);
} else {
if (opflags & QUP_OP_IN_SERVICE_FLAG)
-   spi_qup_read(controller);
+   spi_qup_read(controller, &opflags);
 
if (opflags & QUP_OP_OUT_SERVICE_FLAG)
spi_qup_write(controller);
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation




[PATCH 13/18] spi: qup: call io_config in mode specific function

2017-06-13 Thread Varadarajan Narayanan
DMA transactions only need to call io_config once, but block mode might
call it several times to set up multiple transactions so it can handle
reads/writes larger than the maximum size per transaction; so we move
the call into the do_ functions.

This is just refactoring; there should be no functional change.

Signed-off-by: Matthew McClintock 
Signed-off-by: Varadarajan Narayanan 
---
 drivers/spi/spi-qup.c | 24 
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c
index c023dc1..f085e36 100644
--- a/drivers/spi/spi-qup.c
+++ b/drivers/spi/spi-qup.c
@@ -158,6 +158,8 @@ struct spi_qup {
struct dma_slave_config tx_conf;
 };
 
+static int spi_qup_io_config(struct spi_device *spi, struct spi_transfer 
*xfer);
+
 static inline bool spi_qup_is_flag_set(struct spi_qup *controller, u32 flag)
 {
u32 opflag = readl_relaxed(controller->base + QUP_OPERATIONAL);
@@ -416,13 +418,18 @@ static void spi_qup_dma_terminate(struct spi_master 
*master,
dmaengine_terminate_all(master->dma_rx);
 }
 
-static int spi_qup_do_dma(struct spi_master *master, struct spi_transfer *xfer,
+static int spi_qup_do_dma(struct spi_device *spi, struct spi_transfer *xfer,
  unsigned long timeout)
 {
+   struct spi_master *master = spi->master;
struct spi_qup *qup = spi_master_get_devdata(master);
dma_async_tx_callback done = qup->qup_v1 ? NULL : spi_qup_dma_done;
int ret;
 
+   ret = spi_qup_io_config(spi, xfer);
+   if (ret)
+   return ret;
+
/* before issuing the descriptors, set the QUP to run */
ret = spi_qup_set_state(qup, QUP_STATE_RUN);
if (ret) {
@@ -458,12 +465,17 @@ static int spi_qup_do_dma(struct spi_master *master, 
struct spi_transfer *xfer,
return 0;
 }
 
-static int spi_qup_do_pio(struct spi_master *master, struct spi_transfer *xfer,
+static int spi_qup_do_pio(struct spi_device *spi, struct spi_transfer *xfer,
  unsigned long timeout)
 {
+   struct spi_master *master = spi->master;
struct spi_qup *qup = spi_master_get_devdata(master);
int ret;
 
+   ret = spi_qup_io_config(spi, xfer);
+   if (ret)
+   return ret;
+
ret = spi_qup_set_state(qup, QUP_STATE_RUN);
if (ret) {
dev_warn(qup->dev, "%s(%d): cannot set RUN state\n",
__func__, __LINE__);
@@ -774,18 +786,14 @@ static int spi_qup_transfer_one(struct spi_master *master,
if (ret)
return ret;
 
-   ret = spi_qup_io_config(spi, xfer);
-   if (ret)
-   return ret;
-
timeout = DIV_ROUND_UP(xfer->speed_hz, MSEC_PER_SEC);
timeout = DIV_ROUND_UP(xfer->len * 8, timeout);
timeout = 100 * msecs_to_jiffies(timeout);
 
if (spi_qup_is_dma_xfer(controller->mode))
-   ret = spi_qup_do_dma(master, xfer, timeout);
+   ret = spi_qup_do_dma(spi, xfer, timeout);
else
-   ret = spi_qup_do_pio(master, xfer, timeout);
+   ret = spi_qup_do_pio(spi, xfer, timeout);
 
if (ret)
goto exit;
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation




[PATCH 14/18] spi: qup: allow block mode to generate multiple transactions

2017-06-13 Thread Varadarajan Narayanan
This lets you write more than 64K-1 bytes to the SPI bus, which is
important if the block size of a SPI device is >= 64K or some other
device wants to do something larger.

This has the benefit of completely removing spi_message from the spi-qup
transactions.

Signed-off-by: Matthew McClintock 
Signed-off-by: Varadarajan Narayanan 
---
 drivers/spi/spi-qup.c | 127 +++---
 1 file changed, 80 insertions(+), 47 deletions(-)

diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c
index f085e36..02d4e10 100644
--- a/drivers/spi/spi-qup.c
+++ b/drivers/spi/spi-qup.c
@@ -120,7 +120,7 @@
 
 #define SPI_NUM_CHIPSELECTS4
 
-#define SPI_MAX_DMA_XFER   (SZ_64K - 64)
+#define SPI_MAX_XFER   (SZ_64K - 64)
 
 /* high speed mode is when bus rate is greater then 26MHz */
 #define SPI_HS_MIN_RATE2600
@@ -151,6 +151,8 @@ struct spi_qup {
int n_words;
int tx_bytes;
int rx_bytes;
+   const u8*tx_buf;
+   u8  *rx_buf;
int qup_v1;
 
int mode;
@@ -175,6 +177,12 @@ static inline bool spi_qup_is_dma_xfer(int mode)
return false;
 }
 
+/* get's the transaction size length */
+static inline unsigned int spi_qup_len(struct spi_qup *controller)
+{
+   return controller->n_words * controller->w_size;
+}
+
 static inline bool spi_qup_is_valid_state(struct spi_qup *controller)
 {
u32 opstate = readl_relaxed(controller->base + QUP_STATE);
@@ -227,10 +235,9 @@ static int spi_qup_set_state(struct spi_qup *controller, 
u32 state)
return 0;
 }
 
-static void spi_qup_read_from_fifo(struct spi_qup *controller,
-   struct spi_transfer *xfer, u32 num_words)
+static void spi_qup_read_from_fifo(struct spi_qup *controller, u32 num_words)
 {
-   u8 *rx_buf = xfer->rx_buf;
+   u8 *rx_buf = controller->rx_buf;
int i, shift, num_bytes;
u32 word;
 
@@ -238,8 +245,9 @@ static void spi_qup_read_from_fifo(struct spi_qup 
*controller,
 
word = readl_relaxed(controller->base + QUP_INPUT_FIFO);
 
-   num_bytes = min_t(int, xfer->len - controller->rx_bytes,
-   controller->w_size);
+   num_bytes = min_t(int, spi_qup_len(controller) -
+  controller->rx_bytes,
+  controller->w_size);
 
if (!rx_buf) {
controller->rx_bytes += num_bytes;
@@ -260,13 +268,12 @@ static void spi_qup_read_from_fifo(struct spi_qup 
*controller,
}
 }
 
-static void spi_qup_read(struct spi_qup *controller,
-   struct spi_transfer *xfer)
+static void spi_qup_read(struct spi_qup *controller)
 {
u32 remainder, words_per_block, num_words;
bool is_block_mode = controller->mode == QUP_IO_M_MODE_BLOCK;
 
-   remainder = DIV_ROUND_UP(xfer->len - controller->rx_bytes,
+   remainder = DIV_ROUND_UP(spi_qup_len(controller) - controller->rx_bytes,
 controller->w_size);
words_per_block = controller->in_blk_sz >> 2;
 
@@ -287,7 +294,7 @@ static void spi_qup_read(struct spi_qup *controller,
}
 
/* read up to the maximum transfer size available */
-   spi_qup_read_from_fifo(controller, xfer, num_words);
+   spi_qup_read_from_fifo(controller, num_words);
 
remainder -= num_words;
 
@@ -309,18 +316,18 @@ static void spi_qup_read(struct spi_qup *controller,
 
 }
 
-static void spi_qup_write_to_fifo(struct spi_qup *controller,
-   struct spi_transfer *xfer, u32 num_words)
+static void spi_qup_write_to_fifo(struct spi_qup *controller, u32 num_words)
 {
-   const u8 *tx_buf = xfer->tx_buf;
+   const u8 *tx_buf = controller->tx_buf;
int i, num_bytes;
u32 word, data;
 
for (; num_words; num_words--) {
word = 0;
 
-   num_bytes = min_t(int, xfer->len - controller->tx_bytes,
-   controller->w_size);
+   num_bytes = min_t(int, spi_qup_len(controller) -
+  controller->tx_bytes,
+  controller->w_size);
if (tx_buf)
for (i = 0; i < num_bytes; i++) {
data = tx_buf[controller->tx_bytes + i];
@@ -338,13 +345,12 @@ static void spi_qup_dma_done(void *data)
complete(data);
 }
 
-static void spi_qup_write(struct spi_qup *controller,
-   struct spi_transfer *xfer)
+static void spi_qup_write(struct spi_qup *controller)
 {
bool is_block_mode = controller->mode == QUP_IO_M_MODE_BLOCK;
u32 remainder, 

[PATCH 16/18] spi: qup: allow multiple DMA transactions per spi xfer

2017-06-13 Thread Varadarajan Narayanan
Much like the block mode changes, we are breaking up DMA transactions
into 64K chunks so we can reset the QUP engine.

Signed-off-by: Matthew McClintock 
Signed-off-by: Varadarajan Narayanan 
---
 drivers/spi/spi-qup.c | 105 --
 1 file changed, 77 insertions(+), 28 deletions(-)

diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c
index b65a6a4..35e12ba 100644
--- a/drivers/spi/spi-qup.c
+++ b/drivers/spi/spi-qup.c
@@ -417,51 +417,100 @@ static void spi_qup_dma_terminate(struct spi_master 
*master,
dmaengine_terminate_all(master->dma_rx);
 }
 
+static u32 spi_qup_sgl_get_nents_len(struct scatterlist *sgl, u32 max,
+u32 *nents)
+{
+   struct scatterlist *sg;
+   u32 total = 0;
+
+   *nents = 0;
+
+   for (sg = sgl; sg; sg = sg_next(sg)) {
+   unsigned int len = sg_dma_len(sg);
+
+   /* check for overflow as well as limit */
+   if (((total + len) < total) || ((total + len) > max))
+   break;
+
+   total += len;
+   (*nents)++;
+   }
+
+   return total;
+}
+
 static int spi_qup_do_dma(struct spi_device *spi, struct spi_transfer *xfer,
  unsigned long timeout)
 {
struct spi_master *master = spi->master;
struct spi_qup *qup = spi_master_get_devdata(master);
dma_async_tx_callback done = qup->qup_v1 ? NULL : spi_qup_dma_done;
+   struct scatterlist *tx_sgl, *rx_sgl;
int ret;
 
-   ret = spi_qup_io_config(spi, xfer);
-   if (ret)
-   return ret;
+   rx_sgl = xfer->rx_sg.sgl;
+   tx_sgl = xfer->tx_sg.sgl;
 
-   /* before issuing the descriptors, set the QUP to run */
-   ret = spi_qup_set_state(qup, QUP_STATE_RUN);
-   if (ret) {
-   dev_warn(qup->dev, "%s(%d): cannot set RUN state\n",
-   __func__, __LINE__);
-   return ret;
-   }
+   do {
+   u32 rx_nents, tx_nents;
+
+   if (rx_sgl)
+   qup->n_words = spi_qup_sgl_get_nents_len(rx_sgl,
+   SPI_MAX_XFER, &rx_nents) / qup->w_size;
+   if (tx_sgl)
+   qup->n_words = spi_qup_sgl_get_nents_len(tx_sgl,
+   SPI_MAX_XFER, &tx_nents) / qup->w_size;
+   if (!qup->n_words)
+   return -EIO;
 
-   if (xfer->rx_buf) {
-   ret = spi_qup_prep_sg(master, xfer->rx_sg.sgl,
- xfer->rx_sg.nents, DMA_DEV_TO_MEM,
- done, &qup->rxc);
+   ret = spi_qup_io_config(spi, xfer);
if (ret)
return ret;
 
-   dma_async_issue_pending(master->dma_rx);
-   }
-
-   if (xfer->tx_buf) {
-   ret = spi_qup_prep_sg(master, xfer->tx_sg.sgl,
- xfer->tx_sg.nents, DMA_MEM_TO_DEV,
- done, &qup->txc);
-   if (ret)
+   /* before issuing the descriptors, set the QUP to run */
+   ret = spi_qup_set_state(qup, QUP_STATE_RUN);
+   if (ret) {
+   dev_warn(qup->dev, "cannot set RUN state\n");
return ret;
+   }
 
-   dma_async_issue_pending(master->dma_tx);
-   }
+   if (rx_sgl) {
+   ret = spi_qup_prep_sg(master, rx_sgl, rx_nents,
+ DMA_DEV_TO_MEM, done,
+ &qup->rxc);
+   if (ret)
+   return ret;
+   dma_async_issue_pending(master->dma_rx);
+   }
+
+   if (tx_sgl) {
+   ret = spi_qup_prep_sg(master, tx_sgl, tx_nents,
+ DMA_MEM_TO_DEV, done,
+ &qup->txc);
+   if (ret)
+   return ret;
+
+   dma_async_issue_pending(master->dma_tx);
+   }
+
+   if (rx_sgl &&
+   !wait_for_completion_timeout(&qup->rxc, timeout)) {
+   pr_emerg(" rx timed out\n");
+   return -ETIMEDOUT;
+   }
+
+   if (tx_sgl &&
+   !wait_for_completion_timeout(&qup->txc, timeout)) {
+   pr_emerg(" tx timed out\n");
+   return -ETIMEDOUT;
+   }
 
-   if (xfer->rx_buf && !wait_for_completion_timeout(&qup->rxc, timeout))
-   return -ETIMEDOUT;
+   for (; rx_sgl && rx_nents--; rx_sgl = sg_next(rx_sgl))
+   ;
+   for (; tx_sgl && tx_nents--; tx_sgl 

[PATCH 14/18] spi: qup: allow block mode to generate multiple transactions

2017-06-13 Thread Varadarajan Narayanan
This let's you write more to the SPI bus than 64K-1 which is important
if the block size of a SPI device is >= 64K or some other device wants
to do something larger.

This has the benefit of completely removing spi_message from the spi-qup
transactions

Signed-off-by: Matthew McClintock 
Signed-off-by: Varadarajan Narayanan 
---
 drivers/spi/spi-qup.c | 127 +++---
 1 file changed, 80 insertions(+), 47 deletions(-)

diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c
index f085e36..02d4e10 100644
--- a/drivers/spi/spi-qup.c
+++ b/drivers/spi/spi-qup.c
@@ -120,7 +120,7 @@
 
 #define SPI_NUM_CHIPSELECTS    4
 
-#define SPI_MAX_DMA_XFER   (SZ_64K - 64)
+#define SPI_MAX_XFER   (SZ_64K - 64)
 
 /* high speed mode is when bus rate is greater than 26MHz */
 #define SPI_HS_MIN_RATE        26000000
@@ -151,6 +151,8 @@ struct spi_qup {
int n_words;
int tx_bytes;
int rx_bytes;
+   const u8*tx_buf;
+   u8  *rx_buf;
int qup_v1;
 
int mode;
@@ -175,6 +177,12 @@ static inline bool spi_qup_is_dma_xfer(int mode)
return false;
 }
 
+/* gets the length of the transaction in bytes */
+static inline unsigned int spi_qup_len(struct spi_qup *controller)
+{
+   return controller->n_words * controller->w_size;
+}
+
 static inline bool spi_qup_is_valid_state(struct spi_qup *controller)
 {
u32 opstate = readl_relaxed(controller->base + QUP_STATE);
@@ -227,10 +235,9 @@ static int spi_qup_set_state(struct spi_qup *controller, 
u32 state)
return 0;
 }
 
-static void spi_qup_read_from_fifo(struct spi_qup *controller,
-   struct spi_transfer *xfer, u32 num_words)
+static void spi_qup_read_from_fifo(struct spi_qup *controller, u32 num_words)
 {
-   u8 *rx_buf = xfer->rx_buf;
+   u8 *rx_buf = controller->rx_buf;
int i, shift, num_bytes;
u32 word;
 
@@ -238,8 +245,9 @@ static void spi_qup_read_from_fifo(struct spi_qup 
*controller,
 
word = readl_relaxed(controller->base + QUP_INPUT_FIFO);
 
-   num_bytes = min_t(int, xfer->len - controller->rx_bytes,
-   controller->w_size);
+   num_bytes = min_t(int, spi_qup_len(controller) -
+  controller->rx_bytes,
+  controller->w_size);
 
if (!rx_buf) {
controller->rx_bytes += num_bytes;
@@ -260,13 +268,12 @@ static void spi_qup_read_from_fifo(struct spi_qup 
*controller,
}
 }
 
-static void spi_qup_read(struct spi_qup *controller,
-   struct spi_transfer *xfer)
+static void spi_qup_read(struct spi_qup *controller)
 {
u32 remainder, words_per_block, num_words;
bool is_block_mode = controller->mode == QUP_IO_M_MODE_BLOCK;
 
-   remainder = DIV_ROUND_UP(xfer->len - controller->rx_bytes,
+   remainder = DIV_ROUND_UP(spi_qup_len(controller) - controller->rx_bytes,
 controller->w_size);
words_per_block = controller->in_blk_sz >> 2;
 
@@ -287,7 +294,7 @@ static void spi_qup_read(struct spi_qup *controller,
}
 
/* read up to the maximum transfer size available */
-   spi_qup_read_from_fifo(controller, xfer, num_words);
+   spi_qup_read_from_fifo(controller, num_words);
 
remainder -= num_words;
 
@@ -309,18 +316,18 @@ static void spi_qup_read(struct spi_qup *controller,
 
 }
 
-static void spi_qup_write_to_fifo(struct spi_qup *controller,
-   struct spi_transfer *xfer, u32 num_words)
+static void spi_qup_write_to_fifo(struct spi_qup *controller, u32 num_words)
 {
-   const u8 *tx_buf = xfer->tx_buf;
+   const u8 *tx_buf = controller->tx_buf;
int i, num_bytes;
u32 word, data;
 
for (; num_words; num_words--) {
word = 0;
 
-   num_bytes = min_t(int, xfer->len - controller->tx_bytes,
-   controller->w_size);
+   num_bytes = min_t(int, spi_qup_len(controller) -
+  controller->tx_bytes,
+  controller->w_size);
if (tx_buf)
for (i = 0; i < num_bytes; i++) {
data = tx_buf[controller->tx_bytes + i];
@@ -338,13 +345,12 @@ static void spi_qup_dma_done(void *data)
complete(data);
 }
 
-static void spi_qup_write(struct spi_qup *controller,
-   struct spi_transfer *xfer)
+static void spi_qup_write(struct spi_qup *controller)
 {
bool is_block_mode = controller->mode == QUP_IO_M_MODE_BLOCK;
u32 remainder, words_per_block, num_words;
 
-   remainder = 

[PATCH 16/18] spi: qup: allow multiple DMA transactions per spi xfer

2017-06-13 Thread Varadarajan Narayanan
Much like the block mode changes, we are breaking up DMA transactions
into 64K chunks so we can reset the QUP engine.
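
The per-chunk walk over the DMA scatterlist can be sketched in plain C. `struct sg` and `sgl_get_nents_len` below are hypothetical stand-ins for the kernel's `struct scatterlist` helpers (`sg_next()`, `sg_dma_len()`) and the driver's `spi_qup_sgl_get_nents_len()`:

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct scatterlist. */
struct sg {
    uint32_t len;       /* sg_dma_len() */
    struct sg *next;    /* sg_next() */
};

/*
 * Walk the list, accumulating entries until adding the next one would
 * exceed max (or wrap a u32); mirrors spi_qup_sgl_get_nents_len().
 */
static uint32_t sgl_get_nents_len(const struct sg *sgl, uint32_t max,
                                  uint32_t *nents)
{
    uint32_t total = 0;

    *nents = 0;
    for (const struct sg *s = sgl; s; s = s->next) {
        if (total + s->len < total || total + s->len > max)
            break;
        total += s->len;
        (*nents)++;
    }
    return total;
}
```

The caller then advances the list by the returned nents and repeats, giving one QUP transaction per bounded group of entries.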

Signed-off-by: Matthew McClintock 
Signed-off-by: Varadarajan Narayanan 
---
 drivers/spi/spi-qup.c | 105 --
 1 file changed, 77 insertions(+), 28 deletions(-)

diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c
index b65a6a4..35e12ba 100644
--- a/drivers/spi/spi-qup.c
+++ b/drivers/spi/spi-qup.c
@@ -417,51 +417,100 @@ static void spi_qup_dma_terminate(struct spi_master 
*master,
dmaengine_terminate_all(master->dma_rx);
 }
 
+static u32 spi_qup_sgl_get_nents_len(struct scatterlist *sgl, u32 max,
+u32 *nents)
+{
+   struct scatterlist *sg;
+   u32 total = 0;
+
+   *nents = 0;
+
+   for (sg = sgl; sg; sg = sg_next(sg)) {
+   unsigned int len = sg_dma_len(sg);
+
+   /* check for overflow as well as limit */
+   if (((total + len) < total) || ((total + len) > max))
+   break;
+
+   total += len;
+   (*nents)++;
+   }
+
+   return total;
+}
+
 static int spi_qup_do_dma(struct spi_device *spi, struct spi_transfer *xfer,
  unsigned long timeout)
 {
struct spi_master *master = spi->master;
struct spi_qup *qup = spi_master_get_devdata(master);
dma_async_tx_callback done = qup->qup_v1 ? NULL : spi_qup_dma_done;
+   struct scatterlist *tx_sgl, *rx_sgl;
int ret;
 
-   ret = spi_qup_io_config(spi, xfer);
-   if (ret)
-   return ret;
+   rx_sgl = xfer->rx_sg.sgl;
+   tx_sgl = xfer->tx_sg.sgl;
 
-   /* before issuing the descriptors, set the QUP to run */
-   ret = spi_qup_set_state(qup, QUP_STATE_RUN);
-   if (ret) {
-   dev_warn(qup->dev, "%s(%d): cannot set RUN state\n",
-   __func__, __LINE__);
-   return ret;
-   }
+   do {
+   u32 rx_nents, tx_nents;
+
+   if (rx_sgl)
+   qup->n_words = spi_qup_sgl_get_nents_len(rx_sgl,
+   SPI_MAX_XFER, &rx_nents) / qup->w_size;
+   if (tx_sgl)
+   qup->n_words = spi_qup_sgl_get_nents_len(tx_sgl,
+   SPI_MAX_XFER, &tx_nents) / qup->w_size;
+   if (!qup->n_words)
+   return -EIO;
 
-   if (xfer->rx_buf) {
-   ret = spi_qup_prep_sg(master, xfer->rx_sg.sgl,
- xfer->rx_sg.nents, DMA_DEV_TO_MEM,
- done, &qup->rxc);
+   ret = spi_qup_io_config(spi, xfer);
if (ret)
return ret;
 
-   dma_async_issue_pending(master->dma_rx);
-   }
-
-   if (xfer->tx_buf) {
-   ret = spi_qup_prep_sg(master, xfer->tx_sg.sgl,
- xfer->tx_sg.nents, DMA_MEM_TO_DEV,
- done, &qup->txc);
-   if (ret)
+   /* before issuing the descriptors, set the QUP to run */
+   ret = spi_qup_set_state(qup, QUP_STATE_RUN);
+   if (ret) {
+   dev_warn(qup->dev, "cannot set RUN state\n");
return ret;
+   }
 
-   dma_async_issue_pending(master->dma_tx);
-   }
+   if (rx_sgl) {
+   ret = spi_qup_prep_sg(master, rx_sgl, rx_nents,
+ DMA_DEV_TO_MEM, done,
+ &qup->rxc);
+   if (ret)
+   return ret;
+   dma_async_issue_pending(master->dma_rx);
+   }
+
+   if (tx_sgl) {
+   ret = spi_qup_prep_sg(master, tx_sgl, tx_nents,
+ DMA_MEM_TO_DEV, done,
+ &qup->txc);
+   if (ret)
+   return ret;
+
+   dma_async_issue_pending(master->dma_tx);
+   }
+
+   if (rx_sgl &&
+   !wait_for_completion_timeout(&qup->rxc, timeout)) {
+   pr_emerg(" rx timed out\n");
+   return -ETIMEDOUT;
+   }
+
+   if (tx_sgl &&
+   !wait_for_completion_timeout(&qup->txc, timeout)) {
+   pr_emerg(" tx timed out\n");
+   return -ETIMEDOUT;
+   }
 
-   if (xfer->rx_buf && !wait_for_completion_timeout(&qup->rxc, timeout))
-   return -ETIMEDOUT;
+   for (; rx_sgl && rx_nents--; rx_sgl = sg_next(rx_sgl))
+   ;
+   for (; tx_sgl && tx_nents--; tx_sgl = sg_next(tx_sgl))
+   ;
 

[PATCH 18/18] spi: qup: support for qup v1 dma

2017-06-13 Thread Varadarajan Narayanan
Currently QUP version 1 does not work with DMA, so add
support for it.

1. It uses ADM DMA which requires TX and RX CRCI
2. DMA channel initialization needs to be done after setting the
   block size so that maxburst has valid values
3. QUP mode should be DMOV instead of BAM.
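
The mode choice in point 3 can be sketched in isolation. The I/O mode values mirror the driver's QUP_IO_M_MODE_* defines (BAM = 3 appears in the diff; the others are assumed here for illustration):

```c
#include <assert.h>

/*
 * I/O mode values mirroring the driver's QUP_IO_M_MODE_* defines
 * (BAM = 3 appears in the diff; the others are assumed).
 */
enum {
    QUP_IO_M_MODE_FIFO  = 0,
    QUP_IO_M_MODE_BLOCK = 1,
    QUP_IO_M_MODE_DMOV  = 2,
    QUP_IO_M_MODE_BAM   = 3,
};

/*
 * Pick the I/O mode for a transfer: QUP v1 uses the ADM data mover
 * (DMOV), later revisions use BAM; non-DMA transfers fall back to
 * block mode (mirrors the spi_qup_io_prep() change in this patch).
 */
static int pick_io_mode(int qup_v1, int can_dma)
{
    if (!can_dma)
        return QUP_IO_M_MODE_BLOCK;
    return qup_v1 ? QUP_IO_M_MODE_DMOV : QUP_IO_M_MODE_BAM;
}
```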

Signed-off-by: Abhishek Sahu 
Signed-off-by: Varadarajan Narayanan 
---
 .../devicetree/bindings/spi/qcom,spi-qup.txt   |  6 
 drivers/spi/spi-qup.c  | 35 +-
 2 files changed, 34 insertions(+), 7 deletions(-)

diff --git a/Documentation/devicetree/bindings/spi/qcom,spi-qup.txt 
b/Documentation/devicetree/bindings/spi/qcom,spi-qup.txt
index 5c09077..e754181 100644
--- a/Documentation/devicetree/bindings/spi/qcom,spi-qup.txt
+++ b/Documentation/devicetree/bindings/spi/qcom,spi-qup.txt
@@ -38,6 +38,12 @@ Optional properties:
 - dma-names: Names for the dma channels, if present. There must be at
 least one channel named "tx" for transmit and named "rx" for
 receive.
+- qcom,tx-crci: Identifier for the Client Rate Control Interface (CRCI) to be
+   used with the TX DMA channel. Required when using DMA for
+   transmission with QUP version 1, i.e. qcom,spi-qup-v1.1.1.
+- qcom,rx-crci: Identifier for the Client Rate Control Interface (CRCI) to be
+   used with the RX DMA channel. Required when using DMA for
+   receiving with QUP version 1, i.e. qcom,spi-qup-v1.1.1.
 
 SPI slave nodes must be children of the SPI master node and can contain
 properties described in Documentation/devicetree/bindings/spi/spi-bus.txt
diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c
index 4ef9301..10666e7 100644
--- a/drivers/spi/spi-qup.c
+++ b/drivers/spi/spi-qup.c
@@ -691,7 +691,8 @@ static int spi_qup_io_prep(struct spi_device *spi, struct 
spi_transfer *xfer)
else if (spi->master->can_dma &&
 spi->master->can_dma(spi->master, spi, xfer) &&
 spi->master->cur_msg_mapped)
-   controller->mode = QUP_IO_M_MODE_BAM;
+   controller->mode = controller->qup_v1 ? QUP_IO_M_MODE_DMOV :
+   QUP_IO_M_MODE_BAM;
else
controller->mode = QUP_IO_M_MODE_BLOCK;
 
@@ -730,6 +731,7 @@ static int spi_qup_io_config(struct spi_device *spi, struct 
spi_transfer *xfer)
writel_relaxed(0, controller->base + QUP_MX_OUTPUT_CNT);
break;
case QUP_IO_M_MODE_BAM:
+   case QUP_IO_M_MODE_DMOV:
reinit_completion(&controller->txc);
reinit_completion(&controller->rxc);
writel_relaxed(controller->n_words,
@@ -934,6 +936,7 @@ static int spi_qup_init_dma(struct spi_master *master, 
resource_size_t base)
struct dma_slave_config *rx_conf = &spi->rx_conf,
*tx_conf = &spi->tx_conf;
struct device *dev = spi->dev;
+   u32 tx_crci = 0, rx_crci = 0;
int ret;
 
/* allocate dma resources, if available */
@@ -947,16 +950,34 @@ static int spi_qup_init_dma(struct spi_master *master, 
resource_size_t base)
goto err_tx;
}
 
+   if (spi->qup_v1) {
+   ret = of_property_read_u32(dev->of_node, "qcom,tx-crci",
+  &tx_crci);
+   if (ret) {
+   dev_err(dev, "missing property qcom,tx-crci\n");
+   goto err;
+   }
+
+   ret = of_property_read_u32(dev->of_node, "qcom,rx-crci",
+  &rx_crci);
+   if (ret) {
+   dev_err(dev, "missing property qcom,rx-crci\n");
+   goto err;
+   }
+   }
+
/* set DMA parameters */
rx_conf->direction = DMA_DEV_TO_MEM;
rx_conf->device_fc = 1;
rx_conf->src_addr = base + QUP_INPUT_FIFO;
rx_conf->src_maxburst = spi->in_blk_sz;
+   rx_conf->slave_id = rx_crci;
 
tx_conf->direction = DMA_MEM_TO_DEV;
tx_conf->device_fc = 1;
tx_conf->dst_addr = base + QUP_OUTPUT_FIFO;
tx_conf->dst_maxburst = spi->out_blk_sz;
+   tx_conf->slave_id = tx_crci;
 
ret = dmaengine_slave_config(master->dma_rx, rx_conf);
if (ret) {
@@ -1083,12 +1104,6 @@ static int spi_qup_probe(struct platform_device *pdev)
controller->cclk = cclk;
controller->irq = irq;
 
-   ret = spi_qup_init_dma(master, res->start);
-   if (ret == -EPROBE_DEFER)
-   goto error;
-   else if (!ret)
-   master->can_dma = spi_qup_can_dma;
-
/* set v1 flag if device is version 1 */
if (of_device_is_compatible(dev->of_node, "qcom,spi-qup-v1.1.1"))
controller->qup_v1 = 1;
@@ -1125,6 +1140,12 @@ static int spi_qup_probe(struct platform_device *pdev)
 

[PATCH 15/18] spi: qup: refactor spi_qup_prep_sg

2017-06-13 Thread Varadarajan Narayanan
Take a specific sgl and nents to be prepared.  This is in
preparation for splitting DMA into multiple transactions; it
contains no functional changes, just refactoring.

Signed-off-by: Matthew McClintock 
Signed-off-by: Varadarajan Narayanan 
---
 drivers/spi/spi-qup.c | 25 ++---
 1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c
index 02d4e10..b65a6a4 100644
--- a/drivers/spi/spi-qup.c
+++ b/drivers/spi/spi-qup.c
@@ -382,26 +382,19 @@ static void spi_qup_write(struct spi_qup *controller)
} while (remainder);
 }
 
-static int spi_qup_prep_sg(struct spi_master *master, struct spi_transfer 
*xfer,
-  enum dma_transfer_direction dir,
+static int spi_qup_prep_sg(struct spi_master *master, struct scatterlist *sgl,
+  unsigned int nents, enum dma_transfer_direction dir,
   dma_async_tx_callback callback, void *data)
 {
unsigned long flags = DMA_PREP_INTERRUPT | DMA_PREP_FENCE;
struct dma_async_tx_descriptor *desc;
-   struct scatterlist *sgl;
struct dma_chan *chan;
dma_cookie_t cookie;
-   unsigned int nents;
 
-   if (dir == DMA_MEM_TO_DEV) {
+   if (dir == DMA_MEM_TO_DEV)
chan = master->dma_tx;
-   nents = xfer->tx_sg.nents;
-   sgl = xfer->tx_sg.sgl;
-   } else {
+   else
chan = master->dma_rx;
-   nents = xfer->rx_sg.nents;
-   sgl = xfer->rx_sg.sgl;
-   }
 
desc = dmaengine_prep_slave_sg(chan, sgl, nents, dir, flags);
if (IS_ERR_OR_NULL(desc))
@@ -445,8 +438,9 @@ static int spi_qup_do_dma(struct spi_device *spi, struct 
spi_transfer *xfer,
}
 
if (xfer->rx_buf) {
-   ret = spi_qup_prep_sg(master, xfer, DMA_DEV_TO_MEM,
-   done, &qup->rxc);
+   ret = spi_qup_prep_sg(master, xfer->rx_sg.sgl,
+ xfer->rx_sg.nents, DMA_DEV_TO_MEM,
+ done, &qup->rxc);
if (ret)
return ret;
 
@@ -454,8 +448,9 @@ static int spi_qup_do_dma(struct spi_device *spi, struct 
spi_transfer *xfer,
}
 
if (xfer->tx_buf) {
-   ret = spi_qup_prep_sg(master, xfer, DMA_MEM_TO_DEV,
-   done, &qup->txc);
+   ret = spi_qup_prep_sg(master, xfer->tx_sg.sgl,
+ xfer->tx_sg.nents, DMA_MEM_TO_DEV,
+ done, &qup->txc);
if (ret)
return ret;
 
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation







[PATCH 09/18] spi: qup: Do block sized read/write in block mode

2017-06-13 Thread Varadarajan Narayanan
This patch corrects the behavior of BLOCK transactions.  During
block transactions, the controller must be read from and written
to in block-sized chunks.
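
The size of each service pass can be sketched as a pure function; this is a hypothetical distillation of the num_words selection in the patch's spi_qup_read()/spi_qup_write() loops:

```c
#include <assert.h>
#include <stdint.h>

/*
 * How many words to service on this pass: in block mode, up to one
 * block's worth at a time; in FIFO mode, one word per ready flag.
 * Mirrors the num_words selection in spi_qup_read()/spi_qup_write().
 */
static uint32_t words_this_pass(uint32_t remainder, uint32_t words_per_block,
                                int is_block_mode, int fifo_ready)
{
    if (is_block_mode)
        return remainder > words_per_block ? words_per_block : remainder;
    return fifo_ready ? 1 : 0;
}
```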

Signed-off-by: Andy Gross 
Signed-off-by: Varadarajan Narayanan 
---
 drivers/spi/spi-qup.c | 151 +++---
 1 file changed, 119 insertions(+), 32 deletions(-)

diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c
index 0f6a4c7..872de28 100644
--- a/drivers/spi/spi-qup.c
+++ b/drivers/spi/spi-qup.c
@@ -82,6 +82,8 @@
 #define QUP_IO_M_MODE_BAM  3
 
 /* QUP_OPERATIONAL fields */
+#define QUP_OP_IN_BLOCK_READ_REQ   BIT(13)
+#define QUP_OP_OUT_BLOCK_WRITE_REQ BIT(12)
 #define QUP_OP_MAX_INPUT_DONE_FLAG BIT(11)
 #define QUP_OP_MAX_OUTPUT_DONE_FLAG    BIT(10)
 #define QUP_OP_IN_SERVICE_FLAG BIT(9)
@@ -156,6 +158,13 @@ struct spi_qup {
struct dma_slave_config tx_conf;
 };
 
+static inline bool spi_qup_is_flag_set(struct spi_qup *controller, u32 flag)
+{
+   u32 opflag = readl_relaxed(controller->base + QUP_OPERATIONAL);
+
+   return (opflag & flag) != 0;
+}
+
 static inline bool spi_qup_is_dma_xfer(int mode)
 {
if (mode == QUP_IO_M_MODE_DMOV || mode == QUP_IO_M_MODE_BAM)
@@ -216,29 +225,26 @@ static int spi_qup_set_state(struct spi_qup *controller, 
u32 state)
return 0;
 }
 
-static void spi_qup_fifo_read(struct spi_qup *controller,
-   struct spi_transfer *xfer)
+static void spi_qup_read_from_fifo(struct spi_qup *controller,
+   struct spi_transfer *xfer, u32 num_words)
 {
u8 *rx_buf = xfer->rx_buf;
-   u32 word, state;
-   int idx, shift, w_size;
+   int i, shift, num_bytes;
+   u32 word;
 
-   w_size = controller->w_size;
-
-   while (controller->rx_bytes < xfer->len) {
-
-   state = readl_relaxed(controller->base + QUP_OPERATIONAL);
-   if (0 == (state & QUP_OP_IN_FIFO_NOT_EMPTY))
-   break;
+   for (; num_words; num_words--) {
 
word = readl_relaxed(controller->base + QUP_INPUT_FIFO);
 
+   num_bytes = min_t(int, xfer->len - controller->rx_bytes,
+   controller->w_size);
+
if (!rx_buf) {
-   controller->rx_bytes += w_size;
+   controller->rx_bytes += num_bytes;
continue;
}
 
-   for (idx = 0; idx < w_size; idx++, controller->rx_bytes++) {
+   for (i = 0; i < num_bytes; i++, controller->rx_bytes++) {
/*
 * The data format depends on bytes per SPI word:
 *  4 bytes: 0x12345678
@@ -246,38 +252,80 @@ static void spi_qup_fifo_read(struct spi_qup *controller,
 *  1 byte : 0x0012
 */
shift = BITS_PER_BYTE;
-   shift *= (w_size - idx - 1);
+   shift *= (controller->w_size - i - 1);
rx_buf[controller->rx_bytes] = word >> shift;
}
}
 }
 
-static void spi_qup_fifo_write(struct spi_qup *controller,
+static void spi_qup_read(struct spi_qup *controller,
struct spi_transfer *xfer)
 {
-   const u8 *tx_buf = xfer->tx_buf;
-   u32 word, state, data;
-   int idx, w_size;
+   u32 remainder, words_per_block, num_words;
+   bool is_block_mode = controller->mode == QUP_IO_M_MODE_BLOCK;
+
+   remainder = DIV_ROUND_UP(xfer->len - controller->rx_bytes,
+controller->w_size);
+   words_per_block = controller->in_blk_sz >> 2;
+
+   do {
+   /* ACK by clearing service flag */
+   writel_relaxed(QUP_OP_IN_SERVICE_FLAG,
+  controller->base + QUP_OPERATIONAL);
+
+   if (is_block_mode) {
+   num_words = (remainder > words_per_block) ?
+   words_per_block : remainder;
+   } else {
+   if (!spi_qup_is_flag_set(controller,
+QUP_OP_IN_FIFO_NOT_EMPTY))
+   break;
 
-   w_size = controller->w_size;
+   num_words = 1;
+   }
 
-   while (controller->tx_bytes < xfer->len) {
+   /* read up to the maximum transfer size available */
+   spi_qup_read_from_fifo(controller, xfer, num_words);
 
-   state = readl_relaxed(controller->base + QUP_OPERATIONAL);
-   if (state & QUP_OP_OUT_FIFO_FULL)
+   remainder -= num_words;
+
+   /* if block mode, check to see if next block is available */
+   if (is_block_mode && !spi_qup_is_flag_set(controller,
+   QUP_OP_IN_BLOCK_READ_REQ))
break;
 

[PATCH 10/18] spi: qup: Fix DMA mode interrupt handling

2017-06-13 Thread Varadarajan Narayanan
This is needed for v1, where the I/O completion is not
handled in the DMA driver.
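
The completion condition the handler now checks can be sketched on its own; the bit values come from the QUP_OPERATIONAL defines added earlier in the series:

```c
#include <assert.h>
#include <stdint.h>

/* Bit values taken from the QUP_OPERATIONAL defines in this series. */
#define QUP_OP_MAX_INPUT_DONE_FLAG  (1u << 11)
#define QUP_OP_IN_SERVICE_FLAG      (1u << 9)

/*
 * For QUP v1 DMA, the interrupt handler itself signals rx completion:
 * both the service flag and the max-input-done flag must be set
 * (mirrors the spi_qup_qup_irq() change in this patch).
 */
static int rx_dma_complete(uint32_t opflags)
{
    return (opflags & QUP_OP_IN_SERVICE_FLAG) &&
           (opflags & QUP_OP_MAX_INPUT_DONE_FLAG);
}
```

The tx side is symmetric with the OUT service and max-output-done flags.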

Signed-off-by: Andy Gross 
Signed-off-by: Varadarajan Narayanan 
---
 drivers/spi/spi-qup.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c
index 872de28..bd53e82 100644
--- a/drivers/spi/spi-qup.c
+++ b/drivers/spi/spi-qup.c
@@ -510,9 +510,9 @@ static irqreturn_t spi_qup_qup_irq(int irq, void *dev_id)
 
writel_relaxed(qup_err, controller->base + QUP_ERROR_FLAGS);
writel_relaxed(spi_err, controller->base + SPI_ERROR_FLAGS);
-   writel_relaxed(opflags, controller->base + QUP_OPERATIONAL);
 
if (!xfer) {
+   writel_relaxed(opflags, controller->base + QUP_OPERATIONAL);
dev_err_ratelimited(controller->dev, "unexpected irq %08x %08x 
%08x\n",
qup_err, spi_err, opflags);
return IRQ_HANDLED;
@@ -540,7 +540,15 @@ static irqreturn_t spi_qup_qup_irq(int irq, void *dev_id)
error = -EIO;
}
 
-   if (!spi_qup_is_dma_xfer(controller->mode)) {
+   if (spi_qup_is_dma_xfer(controller->mode)) {
+   writel_relaxed(opflags, controller->base + QUP_OPERATIONAL);
+   if (opflags & QUP_OP_IN_SERVICE_FLAG &&
+   opflags & QUP_OP_MAX_INPUT_DONE_FLAG)
+   complete(&controller->rxc);
+   if (opflags & QUP_OP_OUT_SERVICE_FLAG &&
+   opflags & QUP_OP_MAX_OUTPUT_DONE_FLAG)
+   complete(&controller->txc);
+   } else {
if (opflags & QUP_OP_IN_SERVICE_FLAG)
spi_qup_read(controller, xfer);
 
@@ -553,6 +561,9 @@ static irqreturn_t spi_qup_qup_irq(int irq, void *dev_id)
controller->xfer = xfer;
spin_unlock_irqrestore(&controller->lock, flags);
 
+   /* re-read opflags as flags may have changed due to actions above */
+   opflags = readl_relaxed(controller->base + QUP_OPERATIONAL);
+
if ((controller->rx_bytes == xfer->len &&
(opflags & QUP_OP_MAX_INPUT_DONE_FLAG)) ||  error)
complete(&controller->done);
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation






[PATCH 08/18] spi: qup: Handle v1 dma completion differently

2017-06-13 Thread Varadarajan Narayanan
Do not assign I/O completion callbacks when running
on v1 of QUP.

Signed-off-by: Varadarajan Narayanan 
---
 drivers/spi/spi-qup.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c
index 7c22ee4..0f6a4c7 100644
--- a/drivers/spi/spi-qup.c
+++ b/drivers/spi/spi-qup.c
@@ -334,6 +334,7 @@ static int spi_qup_do_dma(struct spi_master *master, struct spi_transfer *xfer,
  unsigned long timeout)
 {
struct spi_qup *qup = spi_master_get_devdata(master);
+   dma_async_tx_callback done = qup->qup_v1 ? NULL : spi_qup_dma_done;
int ret;
 
/* before issuing the descriptors, set the QUP to run */
@@ -346,7 +347,7 @@ static int spi_qup_do_dma(struct spi_master *master, struct spi_transfer *xfer,
 
if (xfer->rx_buf) {
ret = spi_qup_prep_sg(master, xfer, DMA_DEV_TO_MEM,
-   spi_qup_dma_done, &qup->rxc);
+   done, &qup->rxc);
if (ret)
return ret;
 
@@ -355,7 +356,7 @@ static int spi_qup_do_dma(struct spi_master *master, struct spi_transfer *xfer,
 
if (xfer->tx_buf) {
ret = spi_qup_prep_sg(master, xfer, DMA_MEM_TO_DEV,
-   spi_qup_dma_done, &qup->txc);
+   done, &qup->txc);
if (ret)
return ret;
 
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation



[PATCH 06/18] spi: qup: Fix error handling in spi_qup_prep_sg

2017-06-13 Thread Varadarajan Narayanan
Signed-off-by: Varadarajan Narayanan 
---
 drivers/spi/spi-qup.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c
index 363bd43..2124815 100644
--- a/drivers/spi/spi-qup.c
+++ b/drivers/spi/spi-qup.c
@@ -310,8 +310,8 @@ static int spi_qup_prep_sg(struct spi_master *master, struct spi_transfer *xfer,
}
 
desc = dmaengine_prep_slave_sg(chan, sgl, nents, dir, flags);
-   if (!desc)
-   return -EINVAL;
+   if (IS_ERR_OR_NULL(desc))
+   return desc ? PTR_ERR(desc) : -EINVAL;
 
desc->callback = callback;
desc->callback_param = data;
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation
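The change above distinguishes an ERR_PTR-encoded descriptor from a plain NULL, so a real errno from the dmaengine is propagated instead of being flattened to -EINVAL. A userspace sketch of the same check, with IS_ERR_OR_NULL()/PTR_ERR() re-implemented locally for illustration:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Userspace re-implementations of the kernel's error-pointer helpers,
 * for illustration only. */
#define MAX_ERRNO 4095

static int is_err_or_null(const void *ptr)
{
	/* kernel error pointers live in the top 4095 bytes of the address space */
	return ptr == NULL || (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

static long ptr_err(const void *ptr) { return (long)ptr; }

/* Mirrors the patched check: propagate an encoded errno, keep -EINVAL
 * for a plain NULL descriptor. */
static int check_desc(const void *desc)
{
	if (is_err_or_null(desc))
		return desc ? (int)ptr_err(desc) : -EINVAL;
	return 0;   /* valid descriptor */
}
```

With the original `if (!desc)` test, an ERR_PTR(-ENOMEM) return would have been dereferenced a few lines later; the patched form catches both cases.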



[PATCH 05/18] spi: qup: Place the QUP in run mode before DMA transactions

2017-06-13 Thread Varadarajan Narayanan
Signed-off-by: Varadarajan Narayanan 
---
 drivers/spi/spi-qup.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c
index d3ccf53..363bd43 100644
--- a/drivers/spi/spi-qup.c
+++ b/drivers/spi/spi-qup.c
@@ -336,6 +336,14 @@ static int spi_qup_do_dma(struct spi_master *master, struct spi_transfer *xfer,
struct spi_qup *qup = spi_master_get_devdata(master);
int ret;
 
+   /* before issuing the descriptors, set the QUP to run */
+   ret = spi_qup_set_state(qup, QUP_STATE_RUN);
+   if (ret) {
+   dev_warn(qup->dev, "%s(%d): cannot set RUN state\n",
+   __func__, __LINE__);
+   return ret;
+   }
+
if (xfer->rx_buf) {
ret = spi_qup_prep_sg(master, xfer, DMA_DEV_TO_MEM,
spi_qup_dma_done, &qup->rxc);
@@ -371,18 +379,24 @@ static int spi_qup_do_pio(struct spi_master *master, struct spi_transfer *xfer,
 
ret = spi_qup_set_state(qup, QUP_STATE_RUN);
if (ret) {
-   dev_warn(qup->dev, "cannot set RUN state\n");
+   dev_warn(qup->dev, "%s(%d): cannot set RUN state\n", __func__, __LINE__);
return ret;
}
 
ret = spi_qup_set_state(qup, QUP_STATE_PAUSE);
if (ret) {
-   dev_warn(qup->dev, "cannot set PAUSE state\n");
+   dev_warn(qup->dev, "%s(%d): cannot set PAUSE state\n", __func__, __LINE__);
return ret;
}
 
spi_qup_fifo_write(qup, xfer);
 
+   ret = spi_qup_set_state(qup, QUP_STATE_RUN);
+   if (ret) {
+   dev_warn(qup->dev, "%s(%d): cannot set RUN state\n", __func__, __LINE__);
+   return ret;
+   }
+
if (!wait_for_completion_timeout(&qup->done, timeout))
return -ETIMEDOUT;
 
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation



[PATCH 04/18] spi: qup: Add completion timeout for fifo/block mode

2017-06-13 Thread Varadarajan Narayanan
Signed-off-by: Andy Gross 
Signed-off-by: Varadarajan Narayanan 
---
 drivers/spi/spi-qup.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c
index 272e48e..d3ccf53 100644
--- a/drivers/spi/spi-qup.c
+++ b/drivers/spi/spi-qup.c
@@ -330,7 +330,8 @@ static void spi_qup_dma_terminate(struct spi_master *master,
dmaengine_terminate_all(master->dma_rx);
 }
 
-static int spi_qup_do_dma(struct spi_master *master, struct spi_transfer *xfer)
+static int spi_qup_do_dma(struct spi_master *master, struct spi_transfer *xfer,
+ unsigned long timeout)
 {
struct spi_qup *qup = spi_master_get_devdata(master);
int ret;
@@ -362,7 +363,8 @@ static int spi_qup_do_dma(struct spi_master *master, struct spi_transfer *xfer)
return 0;
 }
 
-static int spi_qup_do_pio(struct spi_master *master, struct spi_transfer *xfer)
+static int spi_qup_do_pio(struct spi_master *master, struct spi_transfer *xfer,
+ unsigned long timeout)
 {
struct spi_qup *qup = spi_master_get_devdata(master);
int ret;
@@ -381,6 +383,9 @@ static int spi_qup_do_pio(struct spi_master *master, struct spi_transfer *xfer)
 
spi_qup_fifo_write(qup, xfer);
 
+   if (!wait_for_completion_timeout(&qup->done, timeout))
+   return -ETIMEDOUT;
+
return 0;
 }
 
@@ -624,7 +629,6 @@ static int spi_qup_transfer_one(struct spi_master *master,
timeout = DIV_ROUND_UP(xfer->len * 8, timeout);
timeout = 100 * msecs_to_jiffies(timeout);
 
-
spin_lock_irqsave(&controller->lock, flags);
controller->xfer = xfer;
controller->error = 0;
@@ -635,10 +639,10 @@ static int spi_qup_transfer_one(struct spi_master *master,
if (spi_qup_is_dma_xfer(controller->mode)) {
reinit_completion(&controller->rxc);
reinit_completion(&controller->txc);
-   ret = spi_qup_do_dma(master, xfer);
+   ret = spi_qup_do_dma(master, xfer, timeout);
} else {
reinit_completion(&controller->done);
-   ret = spi_qup_do_pio(master, xfer);
+   ret = spi_qup_do_pio(master, xfer, timeout);
}
 
if (ret)
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation
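The `timeout` value threaded through both helpers is sized just above the hunk from the transfer length and clock speed. A pure-milliseconds sketch of that arithmetic (the conversion of the clock rate to kHz is inferred from the surrounding code, and the jiffies conversion is omitted):

```c
#include <assert.h>

/* Same rounding the kernel's DIV_ROUND_UP() macro performs. */
static unsigned int div_round_up(unsigned int n, unsigned int d)
{
	return (n + d - 1) / d;
}

/* Bits to transfer divided by the clock rate in kHz (i.e. bits per ms),
 * then scaled 100x for safety margin, mirroring the driver's timeout
 * sizing.  The real code converts the result to jiffies afterwards. */
static unsigned int xfer_timeout_ms(unsigned int len_bytes,
				    unsigned int speed_hz)
{
	unsigned int bits_per_ms = div_round_up(speed_hz, 1000);
	unsigned int ms = div_round_up(len_bytes * 8, bits_per_ms);
	return 100 * ms;
}
```

For example, 1000 bytes at 1 MHz is 8000 bits at 1000 bits/ms, so 8 ms of raw transfer time and an 800 ms timeout.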



[PATCH 01/18] spi: qup: Enable chip select support

2017-06-13 Thread Varadarajan Narayanan
Enable chip select support for QUP versions later than v1.

Signed-off-by: Sham Muthayyan 
Signed-off-by: Varadarajan Narayanan 
---
 drivers/spi/spi-qup.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c
index 1bfa889..c0d4def 100644
--- a/drivers/spi/spi-qup.c
+++ b/drivers/spi/spi-qup.c
@@ -750,6 +750,24 @@ static int spi_qup_init_dma(struct spi_master *master, resource_size_t base)
return ret;
 }
 
+static void spi_qup_set_cs(struct spi_device *spi, bool val)
+{
+   struct spi_qup *controller;
+   u32 spi_ioc;
+   u32 spi_ioc_orig;
+
+   controller = spi_master_get_devdata(spi->master);
+   spi_ioc = readl_relaxed(controller->base + SPI_IO_CONTROL);
+   spi_ioc_orig = spi_ioc;
+   if (!val)
+   spi_ioc |= SPI_IO_C_FORCE_CS;
+   else
+   spi_ioc &= ~SPI_IO_C_FORCE_CS;
+
+   if (spi_ioc != spi_ioc_orig)
+   writel_relaxed(spi_ioc, controller->base + SPI_IO_CONTROL);
+}
+
 static int spi_qup_probe(struct platform_device *pdev)
 {
struct spi_master *master;
@@ -846,6 +864,9 @@ static int spi_qup_probe(struct platform_device *pdev)
if (of_device_is_compatible(dev->of_node, "qcom,spi-qup-v1.1.1"))
controller->qup_v1 = 1;
 
+   if (!controller->qup_v1)
+   master->set_cs = spi_qup_set_cs;
+
spin_lock_init(&controller->lock);
init_completion(&controller->done);
 
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation
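spi_qup_set_cs() above only writes SPI_IO_CONTROL back when the value actually changed, avoiding a pointless MMIO write. The bit manipulation can be sketched on a plain value; the FORCE_CS bit position used here is an assumption for illustration, and `val` follows the spi core's set_cs convention where `!val` means force the chip select:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit position; the real value lives in the driver's register
 * definitions. */
#define SPI_IO_C_FORCE_CS (1u << 11)

/* Read-modify-write step of spi_qup_set_cs(), modeled on a plain value
 * instead of the SPI_IO_CONTROL MMIO register. */
static uint32_t set_cs_bits(uint32_t spi_ioc, int val)
{
	if (!val)
		spi_ioc |= SPI_IO_C_FORCE_CS;   /* assert: force CS active */
	else
		spi_ioc &= ~SPI_IO_C_FORCE_CS;  /* deassert: release CS */
	return spi_ioc;
}
```

A caller comparing the returned value against the original, as the patch does with `spi_ioc_orig`, can skip the write-back when nothing changed.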



Re: [PATCH v2 1/1] PCI: imx6: Add pcie compliance test option

2017-06-13 Thread Schöfegger Stefan
On Tuesday, June 13, 2017 8:58:47 AM CEST Bjorn Helgaas wrote:
> On Tue, Jun 13, 2017 at 05:43:14AM +, Schöfegger Stefan wrote:
> > On Monday, June 12, 2017 6:49:24 PM CEST Bjorn Helgaas wrote:
> > > On Wed, Jun 07, 2017 at 01:36:11PM +0200, Stefan Schoefegger wrote:
> > > > Link speed must not be limited to gen1 during link test for compliance
> > > > tests
> > > > 
> > > > Signed-off-by: Stefan Schoefegger 
> > > > ---
> > > > 
> > > > Changes since v1:
> > > >  - pci-imx6.c moved to dwc directory
> > > >  
> > > >  drivers/pci/dwc/Kconfig| 10 ++
> > > >  drivers/pci/dwc/pci-imx6.c | 21 -
> > > >  2 files changed, 22 insertions(+), 9 deletions(-)
> > > > 
> > > > diff --git a/drivers/pci/dwc/Kconfig b/drivers/pci/dwc/Kconfig
> > > > index b7e15526d676..b6e9ced5a45d 100644
> > > > --- a/drivers/pci/dwc/Kconfig
> > > > +++ b/drivers/pci/dwc/Kconfig
> > > > @@ -77,6 +77,16 @@ config PCI_IMX6
> > > > 
> > > > select PCIEPORTBUS
> > > > select PCIE_DW_HOST
> > > > 
> > > > +config PCI_IMX6_COMPLIANCE_TEST
> > > > +   bool "Enable pcie compliance tests on imx6"
> > > > +   depends on PCI_IMX6
> > > > +   default n
> > > > +   help
> > > > + Enables support for pcie compliance test on FSL iMX SoCs.
> > > > + The link speed wouldn't be limited to gen1 when enabled.
> > > > + Enable only during compliance tests, otherwise
> > > > + link detection will fail on some peripherals.
> > > 
> > > I'm puzzled about why we would want to merge this patch.  It looks
> > > like we're trying to game the system to make the device pass
> > > compliance testing when it isn't really compliant.  Is this config
> > > option useful to users, or is it only useful during internal
> > > development of iMX SoCs?
> > 
> > It's not for passing compliance tests; it is necessary for doing the
> > compliance tests. Without this patch, only gen1 speed can be tested.
> > Also, since i.MX6 is not fully gen2 compliant (without an external
> > clk), we should have the option to do gen2 tests. Switching from gen1
> > to gen2 is done with a 100MHz (1ms) clk pulse on the receiver.
> > Without this patch, the link speed is forced back to gen1 afterwards.
> 
> I don't understand the purpose of this yet, so maybe all we need is a
> better description.

This patch enables the PCIe gen2 compliance tests published by PCI-SIG
(signal level, signal timing, jitter, and rise/fall times on the TX
lines). Special hardware is needed for these tests: a compliance load
board and an oscilloscope with about 10 GHz of bandwidth. The PCIe PHY
switches into a compliance test state in which it outputs a special test
pattern (without having a real link to a device), and the bit rate and
de-emphasis must be switched. Without this patch the driver does not
allow switching to gen2 because it falls back to gen1, so it is
impossible to generate the gen2 test pattern.
The patch removes the forced gen1 start so that the gen2 test pattern
can be generated; the gen1/gen2 switch is driven by signals generated on
the compliance load board (which trigger the switch in the PHY).
> 
> "It's not for passing compliance testing, it is necessary to do the
> compliance tests" doesn't make sense to me -- it seems
> self-contradictory.
> 
> The Kconfig text says "enable only for testing because it makes link
> detection fail."  To me that means this option is not useful for
> users.  We need some justification for why it should be in the
> mainline kernel, where users and distros may enable it.

It is useless for users and distros; only board designers want this
option. If that is not reason enough for applying it, that's OK with me.
> 
> If you can only support gen2 in certain board configurations, maybe
> you should add a config option that can always be enabled for those
> boards.
> 
> > Yes it is only useful for board setup, it is comparable to
> > CONFIG_USB_EHSET_TEST_FIXTURE for usb (ok, this is more general and not
> > host specific).
> 
> USB_EHSET_TEST_FIXTURE (added by 1353aa53851e ("usb: misc: EHSET Test
> Fixture device driver for host compliance")) looks like just another
> driver in the sense that it's always safe to enable it and it doesn't
> hurt anything if you enable it without having the hardware.
> 
> This patch doesn't seem comparable to USB_EHSET_TEST_FIXTURE because
> this new option apparently breaks link detection in some cases.

This depends on the point of view (black box vs. white box) :-) Both are
for doing compliance tests at the signal level. But I think we don't
need to discuss this further.
> 
> > > >  config PCIE_SPEAR13XX
> > > >  
> > > > bool "STMicroelectronics SPEAr PCIe controller"
> > > > depends on PCI
> > > > 
> > > > diff --git a/drivers/pci/dwc/pci-imx6.c b/drivers/pci/dwc/pci-imx6.c
> > > > index 19a289b8cc94..b0fbe52e25b0 100644
> > > > --- a/drivers/pci/dwc/pci-imx6.c
> > > > +++ b/drivers/pci/dwc/pci-imx6.c
> > > > @@ -533,15 +533,18 

Re: [PATCH] kernel/kprobes: Add test to validate pt_regs

2017-06-13 Thread Masami Hiramatsu
On Wed, 14 Jun 2017 11:40:08 +0900
Masami Hiramatsu  wrote:

> On Fri,  9 Jun 2017 00:53:08 +0530
> "Naveen N. Rao"  wrote:
> 
> > Add a test to verify that the registers passed in pt_regs on kprobe
> > (trap), optprobe (jump) and kprobe_on_ftrace (ftrace_caller) are
> > accurate. The tests are exercised if KPROBES_SANITY_TEST is enabled.
> 
> Great!
> 
> > 
> > Implemented for powerpc64. Other architectures will have to implement
> > the relevant arch_* helpers and define HAVE_KPROBES_REGS_SANITY_TEST.
> 
> Hmm, why don't you define that in arch/powerpc/Kconfig ?
> Also, could you split this into 3 patches for each case ?
> 
> > 
> > Signed-off-by: Naveen N. Rao 
> > ---
> >  arch/powerpc/include/asm/kprobes.h  |   4 +
> >  arch/powerpc/lib/Makefile   |   3 +-
> >  arch/powerpc/lib/test_kprobe_regs.S |  62 
> >  arch/powerpc/lib/test_kprobes.c | 115 ++
> >  include/linux/kprobes.h |  11 +++
> >  kernel/test_kprobes.c   | 183 
> > 
> >  6 files changed, 377 insertions(+), 1 deletion(-)
> >  create mode 100644 arch/powerpc/lib/test_kprobe_regs.S
> >  create mode 100644 arch/powerpc/lib/test_kprobes.c
> > 
> > diff --git a/arch/powerpc/include/asm/kprobes.h b/arch/powerpc/include/asm/kprobes.h
> > index 566da372e02b..10c91d3132a1 100644
> > --- a/arch/powerpc/include/asm/kprobes.h
> > +++ b/arch/powerpc/include/asm/kprobes.h
> > @@ -124,6 +124,10 @@ static inline int skip_singlestep(struct kprobe *p, struct pt_regs *regs,
> > return 0;
> >  }
> >  #endif
> > +#if defined(CONFIG_KPROBES_SANITY_TEST) && defined(CONFIG_PPC64)
> > +#define HAVE_KPROBES_REGS_SANITY_TEST
> > +void arch_kprobe_regs_set_ptregs(struct pt_regs *regs);
> > +#endif
> >  #else
> >  static inline int kprobe_handler(struct pt_regs *regs) { return 0; }
> >  static inline int kprobe_post_handler(struct pt_regs *regs) { return 0; }
> > diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
> > index 3c3146ba62da..8a0bb8e20179 100644
> > --- a/arch/powerpc/lib/Makefile
> > +++ b/arch/powerpc/lib/Makefile
> > @@ -27,7 +27,8 @@ obj64-y   += copypage_64.o copyuser_64.o mem_64.o hweight_64.o \
> >  
> >  obj64-$(CONFIG_SMP)+= locks.o
> >  obj64-$(CONFIG_ALTIVEC)+= vmx-helper.o
> > -obj64-$(CONFIG_KPROBES_SANITY_TEST) += test_emulate_step.o
> > -obj64-$(CONFIG_KPROBES_SANITY_TEST) += test_emulate_step.o
> > +obj64-$(CONFIG_KPROBES_SANITY_TEST) += test_emulate_step.o test_kprobe_regs.o \
> > +  test_kprobes.o
> >  
> >  obj-y  += checksum_$(BITS).o checksum_wrappers.o
> >  
> > diff --git a/arch/powerpc/lib/test_kprobe_regs.S b/arch/powerpc/lib/test_kprobe_regs.S
> > new file mode 100644
> > index ..4e95eca6dcd3
> > --- /dev/null
> > +++ b/arch/powerpc/lib/test_kprobe_regs.S
> > @@ -0,0 +1,62 @@
> > +/*
> > + * test_kprobe_regs: architectural helpers for validating pt_regs
> > + *  received on a kprobe.
> > + *
> > + * Copyright 2017 Naveen N. Rao 
> > + *   IBM Corporation
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License
> > + * as published by the Free Software Foundation; version 2
> > + * of the License.
> > + */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +
> > +_GLOBAL(arch_kprobe_regs_function)
> > +   mflr    r0
> > +   std r0, LRSAVE(r1)
> > +   stdu    r1, -SWITCH_FRAME_SIZE(r1)
> > +
> > +   /* Tell pre handler about our pt_regs location */
> > +   addi    r3, r1, STACK_FRAME_OVERHEAD
> > +   bl  arch_kprobe_regs_set_ptregs
> > +
> > +   /* Load back our true LR */
> > +   ld  r0, (SWITCH_FRAME_SIZE + LRSAVE)(r1)
> > +   mtlr    r0
> > +
> > +   /* Save all SPRs that we care about */
> > +   mfctr   r0
> > +   std r0, _CTR(r1)
> > +   mflr    r0
> > +   std r0, _LINK(r1)
> > +   mfspr   r0, SPRN_XER
> > +   std r0, _XER(r1)
> > +   mfcr    r0
> > +   std r0, _CCR(r1)
> > +
> > +   /* Now, save all GPRs */
> > +   SAVE_2GPRS(0, r1)
> > +   SAVE_10GPRS(2, r1)
> > +   SAVE_10GPRS(12, r1)
> > +   SAVE_10GPRS(22, r1)
> > +
> > +   /* We're now ready to be probed */
> > +.global arch_kprobe_regs_probepoint
> > +arch_kprobe_regs_probepoint:
> > +   nop
> > +
> > +#ifdef CONFIG_KPROBES_ON_FTRACE
> > +   /* Let's also test KPROBES_ON_FTRACE */
> > +   bl  kprobe_regs_kp_on_ftrace_target
> > +   nop
> > +#endif
> > +
> > +   /* All done */
> > +   addi    r1, r1, SWITCH_FRAME_SIZE
> > +   ld  r0, LRSAVE(r1)
> > +   mtlr    r0
> > +   blr
> > diff --git a/arch/powerpc/lib/test_kprobes.c b/arch/powerpc/lib/test_kprobes.c
> > new file mode 100644
> > index ..23f7a7ffcdd6
> > --- /dev/null
> > +++ b/arch/powerpc/lib/test_kprobes.c
> > @@ -0,0 +1,115 @@
> > +/*
> > + * test_kprobes: architectural 

> > +   nop
> > +#endif
> > +
> > +   /* All done */
> > +   addir1, r1, SWITCH_FRAME_SIZE
> > +   ld  r0, LRSAVE(r1)
> > +   mtlrr0
> > +   blr
> > diff --git a/arch/powerpc/lib/test_kprobes.c 
> > b/arch/powerpc/lib/test_kprobes.c
> > new file mode 100644
> > index ..23f7a7ffcdd6
> > --- /dev/null
> > +++ b/arch/powerpc/lib/test_kprobes.c
> > @@ -0,0 +1,115 @@
> > +/*
> > + * test_kprobes: architectural helpers for validating pt_regs
> > + *  received on a kprobe.
> > + *
> > + * Copyright 2017 Naveen N. Rao 
> > + *

Re: [PATCHv2] ARM: OMAP: PM: stop early on systems without twl

2017-06-13 Thread Tony Lindgren
* Sebastian Reichel  [170613 02:50]:
> From: Sebastian Reichel 
> diff --git a/arch/arm/mach-omap2/pm.c b/arch/arm/mach-omap2/pm.c
> index 63027e60cc20..92e335decc61 100644
> --- a/arch/arm/mach-omap2/pm.c
> +++ b/arch/arm/mach-omap2/pm.c
> @@ -240,6 +240,10 @@ omap_postcore_initcall(omap2_common_pm_init);
>  
>  int __init omap2_common_pm_late_init(void)
>  {
> +#if IS_BUILTIN(CONFIG_TWL6040_CORE) || IS_BUILTIN(CONFIG_TWL4030_CORE)
> + if (!twl_rev())
> + goto no_twl;
> +
>   /* Init the voltage layer */
>   omap3_twl_init();
>   omap4_twl_init();
> @@ -253,4 +257,9 @@ int __init omap2_common_pm_late_init(void)
>   omap_devinit_smartreflex();
>  
>   return 0;
> +
> +no_twl:
> +#endif
> + pr_err("OMAP4 PM not supported!\n");
> + return -ENODEV;

This should probably just say "OMAP4 SmartReflex not supported".
We already have omap4 CPUs hit off mode during idle. What's
missing is save and restore of registers for various domains.

SmartReflex is also doable without twl PMICs, at least ti81xx
are doing SmartReflex over a group of GPIO pins without twl.

So maybe just put the twl specific parts into IS_BUILTIN so
we can optionally add other PM init here if needed?

Regards,

Tony


[RESEND PATCH] base/memory: pass the base_section in add_memory_block

2017-06-13 Thread Wei Yang
Based on Greg's comment, cc'ing it to the mm list.
The original thread can be found at https://lkml.org/lkml/2017/6/7/202

The second parameter of init_memory_block() is used to calculate the
start_section_nr of this block, which means any section in the same block
would get the same start_section_nr.

This patch passes the base section to init_memory_block() directly, which
removes a local variable and a check in every loop iteration.

Signed-off-by: Wei Yang 
---
 drivers/base/memory.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index cc4f1d0cbffe..1e903aba2aa1 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -664,21 +664,20 @@ static int init_memory_block(struct memory_block **memory,
 static int add_memory_block(int base_section_nr)
 {
struct memory_block *mem;
-   int i, ret, section_count = 0, section_nr;
+   int i, ret, section_count = 0;
 
for (i = base_section_nr;
 (i < base_section_nr + sections_per_block) && i < NR_MEM_SECTIONS;
 i++) {
if (!present_section_nr(i))
continue;
-   if (section_count == 0)
-   section_nr = i;
section_count++;
}
 
if (section_count == 0)
return 0;
-   ret = init_memory_block(, __nr_to_section(section_nr), MEM_ONLINE);
+   ret = init_memory_block(, __nr_to_section(base_section_nr),
+   MEM_ONLINE);
if (ret)
return ret;
mem->section_count = section_count;
-- 
2.11.0



Re: [PATCH 1/2] perf evsel: Fix probing of precise_ip level for default cycles event

2017-06-13 Thread Ingo Molnar

* Arnaldo Carvalho de Melo  wrote:

>  tools/perf/tests/task-exit.c | 2 +-
>  tools/perf/util/evsel.c  | 5 +
>  2 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/perf/tests/task-exit.c b/tools/perf/tests/task-exit.c
> index 32873ec91a4e..cf00ebad2ef5 100644
> --- a/tools/perf/tests/task-exit.c
> +++ b/tools/perf/tests/task-exit.c
> @@ -83,7 +83,7 @@ int test__task_exit(int subtest __maybe_unused)
>  
>   evsel = perf_evlist__first(evlist);
>   evsel->attr.task = 1;
> - evsel->attr.sample_freq = 0;
> + evsel->attr.sample_freq = 1;
>   evsel->attr.inherit = 0;
>   evsel->attr.watermark = 0;
>   evsel->attr.wakeup_events = 1;
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index e4f7902d5afa..a7ce529ca36c 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -273,6 +273,11 @@ struct perf_evsel *perf_evsel__new_cycles(void)
>   struct perf_evsel *evsel;
>  
>   event_attr_init();
> + /*
> +  * Unnamed union member, not supported as struct member named
> +  * initializer in older compilers such as gcc 4.4.7
> +  */
> + attr.sample_period = 1;
>  
>   perf_event_attr__set_max_precise_ip();

Hm, so this really broke perf for me on my main system - 'perf top' and 'perf 
report' only shows:

 triton:~/tip> perf report --stdio
 unwind: target platform=x86 is not supported
 unwind: target platform=x86 is not supported
 unwind: target platform=x86 is not supported
 unwind: target platform=x86 is not supported
 unwind: target platform=x86 is not supported
 unwind: target platform=x86 is not supported
 unwind: target platform=x86 is not supported
 unwind: target platform=x86 is not supported
 # To display the perf.data header info, please use --header/--header-only 
options.
 #
 #
 # Total Lost Samples: 0
 #
 # Samples: 921K of event 'cycles:ppp'
 # Event count (approx.): 921045
 #
 # Overhead  CommandShared Object Symbol  
 #   .    
 #
99.93%  hackbench  [kernel.vmlinux]  [k] native_write_msr
 0.07%  perf   [kernel.vmlinux]  [k] native_write_msr

the bisection result is unambiguous:

   7fd1d092b4337831d7ccbf3a74c07cb0b2089023 is the first bad commit

proper output would be:

 ...
 #
 # Total Lost Samples: 0
 #
 # Samples: 9K of event 'cycles'
 # Event count (approx.): 4378583062
 #
 # Overhead  CommandShared Object Symbol
 
 #   .    
...
 #
 4.32%  hackbench  [kernel.vmlinux]  [k] copy_user_enhanced_fast_string
 4.02%  hackbench  [kernel.vmlinux]  [k] unix_stream_read_generic
 3.75%  hackbench  [kernel.vmlinux]  [k] filemap_map_pages
 3.06%  hackbench  [kernel.vmlinux]  [k] __check_object_size
 2.44%  hackbench  [kernel.vmlinux]  [k] _raw_spin_lock_irqsave
 2.32%  hackbench  [kernel.vmlinux]  [k] native_queued_spin_lock_slowpath
 2.22%  hackbench  [kernel.vmlinux]  [k] entry_SYSENTER_compat
 1.90%  hackbench  [vdso][.] __vdso_gettimeofday
 1.80%  hackbench  [kernel.vmlinux]  [k] _raw_spin_lock
 1.80%  hackbench  [kernel.vmlinux]  [k] skb_set_owner_w
 1.67%  hackbench  [kernel.vmlinux]  [k] kmem_cache_free
 1.52%  hackbench  [kernel.vmlinux]  [k] skb_release_data
 1.48%  hackbench  [kernel.vmlinux]  [k] common_file_perm
 1.45%  hackbench  [kernel.vmlinux]  [k] page_fault
 1.45%  hackbench  [kernel.vmlinux]  [k] cmpxchg_double_slab.isra.62
 1.42%  hackbench  [kernel.vmlinux]  [k] new_sync_read
 1.36%  hackbench  [kernel.vmlinux]  [k] __check_heap_object

Here's the hardware details:

# 
# captured on: Wed Jun 14 07:34:42 2017
# hostname : triton
# os release : 4.10.0-23-generic
# perf version : 4.12.rc5.g9688eb
# arch : x86_64
# nrcpus online : 12
# nrcpus avail : 12
# cpudesc : Intel(R) Core(TM) i7-4960X CPU @ 3.60GHz
# cpuid : GenuineIntel,6,62,4
# total memory : 65917012 kB
# cmdline : /home/mingo/bin/perf record /home/mingo/hackbench 10 
# event : name = cycles:ppp, , size = 112, { sample_period, sample_freq } = 1, 
sample_type = IP|TID|TIME, disabled = 1, inherit = 1, mmap = 1, comm = 1, 
enable_on_exec = 1, task = 1,
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# pmu mappings: intel_bts = 6, uncore_cbox_5 = 21, uncore_ha_0 = 8, 
uncore_imc_2 = 9, uncore_cbox_3 = 19, cstate_pkg = 25, breakpoint = 5, 
uncore_imc_0 = 11, uncore_ubox = 15, uncore
# HEADER_CACHE info available, use -I to display
# missing features: HEADER_TRACING_DATA HEADER_BRANCH_STACK HEADER_GROUP_DESC 
HEADER_AUXTRACE HEADER_STAT 
# 

let me know if you need more info.

Btw., note that there's also this warning:

  unwind: target platform=x86 is not supported

(but that's unrelated to this commit.)

Thanks,

Ingo


Re: [RFC/RFT PATCH 4/4] [debug] ARM: am335x: illustrate hwstamp

2017-06-13 Thread Tony Lindgren
* Grygorii Strashko  [170613 16:20]:
> This patch allows testing the CPTS HW_TS_PUSH functionality on am335x boards.
> 
> The below sequence of commands enables Timer7 to trigger 1sec
> periodic pulses on the CPTS HW4_TS_PUSH input pin:
> 
>  # echo 10 > /sys/class/pwm/pwmchip0/pwm0/period
>  # echo 5 > /sys/class/pwm/pwmchip0/pwm0/duty_cycle
>  # echo 1 > /sys/class/pwm/pwmchip0/pwm0/enable
>  # ./ptp/testptp -e 10 -i 3
> external time stamp request okay
> event index 3 at 1493259028.376600798
> event index 3 at 1493259029.377170898
> event index 3 at 1493259030.377741039
> event index 3 at 1493259031.378311139
> event index 3 at 1493259032.378881279
> event index 3 at 1493259033.379451424
> event index 3 at 1493259034.380021520
> event index 3 at 1493259035.380591660
> event index 3 at 1493259036.381161765
> event index 3 at 1493259037.381731909

Cool :)

Acked-by: Tony Lindgren 


Re: [GIT PULL 0/2] perf/urgent fixes for 4.12

2017-06-13 Thread Ingo Molnar

* Arnaldo Carvalho de Melo <a...@kernel.org> wrote:

> Hi Ingo,
> 
>   Please consider pulling,
> 
> - Arnaldo
> 
> Test results at the end of this message, as usual.
> 
> The following changes since commit 47c1ded7fef108c730b803cd386241beffcdd15c:
> 
>   Merge tag 'perf-urgent-for-mingo-4.12-20170608' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent 
> (2017-06-09 00:41:33 +0200)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git 
> tags/perf-urgent-for-mingo-4.12-20170613
> 
> for you to fetch changes up to 9e0c6fd15fcaea39784d1fb3e9fc573f1cf0ae60:
> 
>   perf tools: Fix build with ARCH=x86_64 (2017-06-13 16:20:37 -0300)
> 
> 
> perf/urgent fixes:
> 
> - Fix probing of precise_ip level for default cycles event, that
>   got broken recently on x86_64 when its arch code started
>   considering invalid requesting precise samples when not sampling
>   (i.e. when attr.sample_period == 0).
> 
>   This also fixes another problem in s/390 where the precision
>   probing with sample_period == 0 returned precise_ip > 0, that
>   then, when setting up the real cycles event (not probing) would
>   return EOPNOTSUPP for precise_ip > 0 (as determined previously
>   by probing) and sample_period > 0.
> 
>   These problems resulted in attr.precise_ip not being set to the
>   highest precision available on x86_64 when no event was specified,
>   i.e. the canonical:
> 
>   perf record ./workload
> 
>   would end up using attr.precise_ip = 0. As a workaround this would
>   need to be done:
> 
>   perf record -e cycles:P ./workload
> 
>   And on s/390 it would plain not work, requiring using:
> 
> perf record -e cycles ./workload
> 
>   as a workaround.  (Arnaldo Carvalho de Melo)
> 
> - Fix perf build with ARCH=x86_64, when ARCH should be transformed
>   into ARCH=x86, just like with the main kernel Makefile and
>   tools/objtool's, i.e. use SRCARCH. (Jiada Wang)
> 
> Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com>
> 
> 
> Arnaldo Carvalho de Melo (1):
>   perf evsel: Fix probing of precise_ip level for default cycles event
> 
> Jiada Wang (1):
>   perf tools: Fix build with ARCH=x86_64
> 
>  tools/perf/Makefile.config   | 38 +++---
>  tools/perf/Makefile.perf |  2 +-
>  tools/perf/arch/Build|  2 +-
>  tools/perf/pmu-events/Build  |  4 ++--
>  tools/perf/tests/Build   |  2 +-
>  tools/perf/tests/task-exit.c |  2 +-
>  tools/perf/util/evsel.c  |  5 +
>  tools/perf/util/header.c |  2 +-
>  8 files changed, 31 insertions(+), 26 deletions(-)

Pulled, thanks a lot Arnaldo!

Ingo


Re: [PATCH v2 09/10] x86/mm: Enable CR4.PCIDE on supported systems

2017-06-13 Thread Juergen Gross
On 14/06/17 06:56, Andy Lutomirski wrote:
> We can use PCID if the CPU has PCID and PGE and we're not on Xen.
> 
> By itself, this has no effect.  The next patch will start using
> PCID.
> 
> Cc: Juergen Gross 
> Cc: Boris Ostrovsky 
> Signed-off-by: Andy Lutomirski 

Reviewed-by: Juergen Gross 


Thanks,

Juergen


RE: [PATCH v8 4/6] vfio: Define vfio based vgpu's dma-buf operations

2017-06-13 Thread Chen, Xiaoguang


>-Original Message-
>From: Alex Williamson [mailto:alex.william...@redhat.com]
>Sent: Wednesday, June 14, 2017 11:46 AM
>To: Chen, Xiaoguang 
>Cc: Tian, Kevin ; intel-...@lists.freedesktop.org; linux-
>ker...@vger.kernel.org; zhen...@linux.intel.com; ch...@chris-wilson.co.uk; Lv,
>Zhiyuan ; intel-gvt-...@lists.freedesktop.org; Wang, Zhi
>A ; kra...@redhat.com
>Subject: Re: [PATCH v8 4/6] vfio: Define vfio based vgpu's dma-buf operations
>
>On Wed, 14 Jun 2017 03:18:31 +
>"Chen, Xiaoguang"  wrote:
>
>> >-Original Message-
>> >From: intel-gvt-dev
>> >[mailto:intel-gvt-dev-boun...@lists.freedesktop.org] On Behalf Of
>> >Alex Williamson
>> >Sent: Wednesday, June 14, 2017 11:06 AM
>> >To: Chen, Xiaoguang 
>> >Cc: Tian, Kevin ;
>> >intel-...@lists.freedesktop.org; linux- ker...@vger.kernel.org;
>> >zhen...@linux.intel.com; ch...@chris-wilson.co.uk; Lv, Zhiyuan
>> >; intel-gvt-...@lists.freedesktop.org; Wang,
>> >Zhi A ; kra...@redhat.com
>> >Subject: Re: [PATCH v8 4/6] vfio: Define vfio based vgpu's dma-buf
>> >operations
>> >
>> >On Wed, 14 Jun 2017 02:53:24 +
>> >"Chen, Xiaoguang"  wrote:
>> >
>> >> >-Original Message-
>> >> >From: Alex Williamson [mailto:alex.william...@redhat.com]
>> >> >Sent: Wednesday, June 14, 2017 5:25 AM
>> >> >To: Chen, Xiaoguang 
>> >> >Cc: kra...@redhat.com; ch...@chris-wilson.co.uk; intel-
>> >> >g...@lists.freedesktop.org; linux-kernel@vger.kernel.org;
>> >> >zhen...@linux.intel.com; Lv, Zhiyuan ;
>> >> >intel-gvt- d...@lists.freedesktop.org; Wang, Zhi A
>> >> >; Tian, Kevin 
>> >> >Subject: Re: [PATCH v8 4/6] vfio: Define vfio based vgpu's dma-buf
>> >> >operations
>> >> >
>> >> >On Fri,  9 Jun 2017 14:50:40 +0800 Xiaoguang Chen
>> >> > wrote:
>> >> >
>> >> >> Here we defined a new ioctl to create a fd for a vfio device
>> >> >> based on the input type. Now only one type is supported that is
>> >> >> a dma-buf management fd.
>> >> >> Two ioctls are defined for the dma-buf management fd: query the
>> >> >> vfio vgpu's plane information and create a dma-buf for a plane.
>> >> >>
>> >> >> Signed-off-by: Xiaoguang Chen 
>> >> >> ---
>> >> >>  include/uapi/linux/vfio.h | 58
>> >> >> +++
>> >> >>  1 file changed, 58 insertions(+)
>> >> >>
>> >> >> diff --git a/include/uapi/linux/vfio.h
>> >> >> b/include/uapi/linux/vfio.h index ae46105..24427b7 100644
>> >> >> --- a/include/uapi/linux/vfio.h
>> >> >> +++ b/include/uapi/linux/vfio.h
>> >> >> @@ -502,6 +502,64 @@ struct vfio_pci_hot_reset {
>> >> >>
>> >> >>  #define VFIO_DEVICE_PCI_HOT_RESET _IO(VFIO_TYPE, VFIO_BASE + 13)
>> >> >>
>> >> >> +/**
>> >> >> + * VFIO_DEVICE_GET_FD - _IO(VFIO_TYPE, VFIO_BASE + 14, __u32)
>> >> >> + *
>> >> >> + * Create a fd for a vfio device based on the input type
>> >> >> + * Vendor driver should handle this ioctl to create a fd and
>> >> >> +manage the
>> >> >> + * life cycle of this fd.
>> >> >> + *
>> >> >> + * Return: a fd if vendor support that type, -errno if not
>> >> >> +supported */
>> >> >> +
>> >> >> +#define VFIO_DEVICE_GET_FD_IO(VFIO_TYPE, VFIO_BASE + 14)
>> >> >> +
>> >> >> +struct vfio_vgpu_plane_info {
>> >> >> +  __u64 start;
>> >> >> +  __u64 drm_format_mod;
>> >> >> +  __u32 drm_format;
>> >> >> +  __u32 width;
>> >> >> +  __u32 height;
>> >> >> +  __u32 stride;
>> >> >> +  __u32 size;
>> >> >> +  __u32 x_pos;
>> >> >> +  __u32 y_pos;
>> >> >> +  __u32 padding;
>> >> >> +};
>> >> >> +
>> >> >> +#define VFIO_DEVICE_DMABUF_MGR_FD 0 /* Supported fd types
>*/
>> >> >
>> >> >Move this #define up above vfio_vgpu_plane_info to associate it
>> >> >with the VFIO_DEVICE_GET_FD ioctl.
>> >> OK.
>> >>
>> >> >
>> >> >> +
>> >> >> +/*
>> >> >> + * VFIO_DEVICE_QUERY_PLANE - _IO(VFIO_TYPE, VFIO_BASE + 15,
>> >> >> + *struct
>vfio_vgpu_query_plane)
>> >> >> + * Query plane information
>> >> >> + */
>> >> >> +struct vfio_vgpu_query_plane {
>> >> >> +  __u32 argsz;
>> >> >> +  __u32 flags;
>> >> >> +  struct vfio_vgpu_plane_info plane_info;
>> >> >> +  __u32 plane_id;
>> >> >> +  __u32 padding;
>> >> >
>> >> >This padding doesn't make sense.
>> >> This padding is still needed if we do not move the plane_id into
>> >vfio_vgpu_plane_info. Right?
>> >
>> >I don't see why this padding is ever needed, can you explain?
>> I thought we add the padding to make sure the structure layout is the same in
>> both 32bit and 64bit systems.
>> Am I right?
>
>Isn't it already the same without any of the padding here?  Without the 
>padding in

RE: [PATCH v8 4/6] vfio: Define vfio based vgpu's dma-buf operations

2017-06-13 Thread Chen, Xiaoguang


>-Original Message-
>From: Alex Williamson [mailto:alex.william...@redhat.com]
>Sent: Wednesday, June 14, 2017 11:46 AM
>To: Chen, Xiaoguang 
>Cc: Tian, Kevin ; intel-...@lists.freedesktop.org; linux-
>ker...@vger.kernel.org; zhen...@linux.intel.com; ch...@chris-wilson.co.uk; Lv,
>Zhiyuan ; intel-gvt-...@lists.freedesktop.org; Wang, Zhi
>A ; kra...@redhat.com
>Subject: Re: [PATCH v8 4/6] vfio: Define vfio based vgpu's dma-buf operations
>
>On Wed, 14 Jun 2017 03:18:31 +
>"Chen, Xiaoguang"  wrote:
>
>> >-Original Message-
>> >From: intel-gvt-dev
>> >[mailto:intel-gvt-dev-boun...@lists.freedesktop.org] On Behalf Of
>> >Alex Williamson
>> >Sent: Wednesday, June 14, 2017 11:06 AM
>> >To: Chen, Xiaoguang 
>> >Cc: Tian, Kevin ;
>> >intel-...@lists.freedesktop.org; linux- ker...@vger.kernel.org;
>> >zhen...@linux.intel.com; ch...@chris-wilson.co.uk; Lv, Zhiyuan
>> >; intel-gvt-...@lists.freedesktop.org; Wang,
>> >Zhi A ; kra...@redhat.com
>> >Subject: Re: [PATCH v8 4/6] vfio: Define vfio based vgpu's dma-buf
>> >operations
>> >
>> >On Wed, 14 Jun 2017 02:53:24 +
>> >"Chen, Xiaoguang"  wrote:
>> >
>> >> >-Original Message-
>> >> >From: Alex Williamson [mailto:alex.william...@redhat.com]
>> >> >Sent: Wednesday, June 14, 2017 5:25 AM
>> >> >To: Chen, Xiaoguang 
>> >> >Cc: kra...@redhat.com; ch...@chris-wilson.co.uk; intel-
>> >> >g...@lists.freedesktop.org; linux-kernel@vger.kernel.org;
>> >> >zhen...@linux.intel.com; Lv, Zhiyuan ;
>> >> >intel-gvt- d...@lists.freedesktop.org; Wang, Zhi A
>> >> >; Tian, Kevin 
>> >> >Subject: Re: [PATCH v8 4/6] vfio: Define vfio based vgpu's dma-buf
>> >> >operations
>> >> >
>> >> >On Fri,  9 Jun 2017 14:50:40 +0800 Xiaoguang Chen
>> >> > wrote:
>> >> >
>> >> >> Here we defined a new ioctl to create a fd for a vfio device
>> >> >> based on the input type. Now only one type is supported that is
>> >> >> a dma-buf management fd.
>> >> >> Two ioctls are defined for the dma-buf management fd: query the
>> >> >> vfio vgpu's plane information and create a dma-buf for a plane.
>> >> >>
>> >> >> Signed-off-by: Xiaoguang Chen 
>> >> >> ---
>> >> >>  include/uapi/linux/vfio.h | 58
>> >> >> +++
>> >> >>  1 file changed, 58 insertions(+)
>> >> >>
>> >> >> diff --git a/include/uapi/linux/vfio.h
>> >> >> b/include/uapi/linux/vfio.h index ae46105..24427b7 100644
>> >> >> --- a/include/uapi/linux/vfio.h
>> >> >> +++ b/include/uapi/linux/vfio.h
>> >> >> @@ -502,6 +502,64 @@ struct vfio_pci_hot_reset {
>> >> >>
>> >> >>  #define VFIO_DEVICE_PCI_HOT_RESET _IO(VFIO_TYPE, VFIO_BASE + 13)
>> >> >>
>> >> >> +/**
>> >> >> + * VFIO_DEVICE_GET_FD - _IO(VFIO_TYPE, VFIO_BASE + 14, __u32)
>> >> >> + *
>> >> >> + * Create a fd for a vfio device based on the input type
>> >> >> + * Vendor driver should handle this ioctl to create a fd and
>> >> >> +manage the
>> >> >> + * life cycle of this fd.
>> >> >> + *
>> >> >> + * Return: a fd if vendor support that type, -errno if not
>> >> >> +supported */
>> >> >> +
>> >> >> +#define VFIO_DEVICE_GET_FD _IO(VFIO_TYPE, VFIO_BASE + 14)
>> >> >> +
>> >> >> +struct vfio_vgpu_plane_info {
>> >> >> +  __u64 start;
>> >> >> +  __u64 drm_format_mod;
>> >> >> +  __u32 drm_format;
>> >> >> +  __u32 width;
>> >> >> +  __u32 height;
>> >> >> +  __u32 stride;
>> >> >> +  __u32 size;
>> >> >> +  __u32 x_pos;
>> >> >> +  __u32 y_pos;
>> >> >> +  __u32 padding;
>> >> >> +};
>> >> >> +
>> >> >> +#define VFIO_DEVICE_DMABUF_MGR_FD 0 /* Supported fd types
>*/
>> >> >
>> >> >Move this #define up above vfio_vgpu_plane_info to associate it
>> >> >with the VFIO_DEVICE_GET_FD ioctl.
>> >> OK.
>> >>
>> >> >
>> >> >> +
>> >> >> +/*
>> >> >> + * VFIO_DEVICE_QUERY_PLANE - _IO(VFIO_TYPE, VFIO_BASE + 15,
>> >> >> + *struct
>vfio_vgpu_query_plane)
>> >> >> + * Query plane information
>> >> >> + */
>> >> >> +struct vfio_vgpu_query_plane {
>> >> >> +  __u32 argsz;
>> >> >> +  __u32 flags;
>> >> >> +  struct vfio_vgpu_plane_info plane_info;
>> >> >> +  __u32 plane_id;
>> >> >> +  __u32 padding;
>> >> >
>> >> >This padding doesn't make sense.
>> >> This padding is still needed if we do not move the plane_id into
>> >vfio_vgpu_plane_info. Right?
>> >
>> >I don't see why this padding is ever needed, can you explain?
>> I thought we added the padding to make sure the structure layout is the same
>> in both 32-bit and 64-bit systems.
>> Am I right?
>
>Isn't it already the same without any of the padding here?  Without the 
>padding in
>vfio_vgpu_plane_info it's 4-byte aligned and we're following it with a 4-byte 
>field,
>that works the same on 32 and 64bit.  Padding the outer structure here makes no
>sense to me.  Generally padding at the end of the structure is to allow 
>flexibility in
>expanding it within that padding without breaking the ioctl.  Here we use _IO 
>and
>argsz/flags to do that.  Thanks,
I got your point. Yes 

Re: [PATCH] ip6_tunnel: Correct tos value in collect_md mode

2017-06-13 Thread Peter Dawson
On Wed, 14 Jun 2017 10:54:31 +0800
严海双  wrote:


> > Changes since v2:
> >  * mask key->tos with RT_TOS() suggested by Daniel

Can you help me understand the rationale for this change? Is there a bug
introduced by dsfield = ip6_tclass(key->label); ?

RT_TOS() masks out 4 bits of the 8-bit TOS field in accordance with RFC 1349
(obsoleted by RFC 2474). IPv6 does not have a TOS field, so it doesn't make
sense to apply a TOS value to the outer header of an IPv6 tunnel.




Re: [RESEND PATCH 1/2] arm:omap2+: put omap_uart_phys/virt/lsr in .text section when ZIMAGE is true

2017-06-13 Thread Tony Lindgren
Hi,

* Hoeun Ryu  [170612 18:18]:
> --- a/arch/arm/include/debug/omap2plus.S
> +++ b/arch/arm/include/debug/omap2plus.S
> @@ -58,11 +58,22 @@
>  
>  #define UART_OFFSET(addr)	((addr) & 0x00ffffff)
>  
> +/*
> + * Definition of ZIMAGE is in arch/arm/boot/compressed/Makefile.
> + * Place the following block in .text section only when this file is
> + * included by arch/arm/boot/compressed/* to make it possible to
> + * enable CONFIG_DEBUG_UNCOMPRESS and DEBUG in 
> arch/arm/boot/compressed/head.S
> + * on OMAP2+ SoCs.
> + */
> +#ifndef ZIMAGE
>   .pushsection .data
> +#endif
>  omap_uart_phys:  .word   0
>  omap_uart_virt:  .word   0
>  omap_uart_lsr:   .word   0
> +#ifndef ZIMAGE
>   .popsection
> +#endif

So I converted all these to use the 8250 debug_ll yesterday
which should solve the DEBUG_UNCOMPRESS issue for you and
allows us to remove this file. Will post the series shortly
for testing with you in Cc after I've done a bit more testing
here.

Regards,

Tony




Re: RESEND Re: [Patch 2/2]: powerpc/hotplug/mm: Fix hot-add memory node assoc

2017-06-13 Thread Balbir Singh
On Wed, Jun 14, 2017 at 3:25 PM, Balbir Singh  wrote:
>
>
> On Wed, Jun 14, 2017 at 8:21 AM, Michael Bringmann 
> wrote:
>>
>> On a related note, we are discussing the addition of 2 new device-tree
>> properties
>> with Pete Heyrman and his fellows that should simplify the determination
>> of the
>> set of required nodes.
>>
>> * One property would provide the total/max number of nodes needed by the
>> kernel
>>   on the current hardware.
>
>

Yes, that would be nice to have

>
>>
>> * A second property would provide the total/max number of nodes that the
>> kernel
>>   could use on any system to which it could be migrated.
>>
>

Not sure about this one, are you suggesting more memory can be added
depending on the migration target?

>
>
>>
>> These properties aren't available, yet, and it takes time to define new
>> properties
>> in the PAPR and have them implemented in pHyp and the kernel.  As an
>> intermediary
>> step, the systems which are doing a lot of dynamic hot-add/hot-remove
>> configuration
>> could provide equivalent information to the PowerPC kernel with a command
>> line
>> parameter.  The 'numa.c' code would then read this value and fill in the
>> necessary
>> entries in the 'node_possible_map'.
>>
>> Would you foresee any problems with using such a feature?
>
>

Sorry my mailer goofed up, resending

Balbir Singh




Re: [PATCH v2] HID: Replace semaphore driver_lock with mutex

2017-06-13 Thread Binoy Jayan
Hi,

On 14 June 2017 at 01:55, Arnd Bergmann  wrote:

>> The mutex code clearly states mutex_trylock() must not be used in
>> interrupt context (see kernel/locking/mutex.c), hence we used a
>> semaphore here. Unless the mutex code is changed to allow this, we
>> cannot switch away from semaphores.
>
> Right, that makes a lot of sense. I don't think changing the mutex
> code is an option here, but I wonder if we can replace the semaphore
> with something simpler anyway.
>
> From what I can tell, it currently does two things:
>
> 1. it acts as a simple flag to prevent hid_input_report from dereferencing
> the hid->driver pointer during initialization and exit. I think this could
> be done equally well using a simple atomic set_bit()/test_bit() or 
> similar.
>
> 2. it prevents the hid->driver pointer from becoming invalid while an
> asynchronous hid_input_report() is in progress. This actually seems to
> be a reference counting problem rather than a locking problem.
> I don't immediately see how to better address it, or how exactly this
> could go wrong in practice, but I would naively expect that either
> hdev->driver->remove() needs to wait for the last user of hdev->driver
> to complete, or we need kref_get/kref_put in hid_input_report()
> to trigger the actual release function.

Thank you everyone for the comments. I'll resend the patch with Benjamin's
comments incorporated and address the changes in the second semaphore later.

Regards,
Binoy




Re: [PATCH] Staging: vc04_services: bcm2835-audio: bcm2835-ctl.c: Fixed alignment to match open parenthesis.

2017-06-13 Thread Greg KH
On Wed, Jun 14, 2017 at 03:15:11AM +0530, srishti sharma wrote:
> On Tue, Jun 13, 2017 at 8:17 PM, Dan Carpenter  
> wrote:
> > On Tue, Jun 13, 2017 at 08:07:14PM +0530, srishti sharma wrote:
> >> On Tue, Jun 13, 2017 at 6:30 PM, Greg KH  
> >> wrote:
> >> > On Sat, Jun 10, 2017 at 02:37:22AM +0530, srishti sharma wrote:
> >> >> Fixed alignment so that it matched open parenthesis.
> >> >>
> >> >> Signed-off-by: srishti sharma 
> >> >> ---
> >> >>  drivers/staging/vc04_services/bcm2835-audio/bcm2835-ctl.c | 2 +-
> >> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >> >>
> >> >> diff --git a/drivers/staging/vc04_services/bcm2835-audio/bcm2835-ctl.c 
> >> >> b/drivers/staging/vc04_services/bcm2835-audio/bcm2835-ctl.c
> >> >> index f484bb0..2148ed0 100644
> >> >> --- a/drivers/staging/vc04_services/bcm2835-audio/bcm2835-ctl.c
> >> >> +++ b/drivers/staging/vc04_services/bcm2835-audio/bcm2835-ctl.c
> >> >> @@ -105,7 +105,7 @@ static int snd_bcm2835_ctl_get(struct snd_kcontrol 
> >> >> *kcontrol,
> >> >>  }
> >> >>
> >> >>  static int snd_bcm2835_ctl_put(struct snd_kcontrol *kcontrol,
> >> >> - struct snd_ctl_elem_value *ucontrol)
> >> >> +struct snd_ctl_elem_value *ucontrol)
> >> >>  {
> >> >>   struct bcm2835_chip *chip = snd_kcontrol_chip(kcontrol);
> >> >>   int changed = 0;
> >> >
> >> > This patch is corrupted and can not be applied :(
> >>
> >>
> >> Why is this corrupted ?
> >
> > Try applying it with `git am`.  There should be space characters at the
> > start of these lines but your email client deleted them:
> >
> > struct bcm2835_chip *chip = snd_kcontrol_chip(kcontrol);
> > int changed = 0;
> >
> > Read Documentation/process/email-clients.rst
> >
> > regards,
> > dan carpenter
> 
> 
> Hello,
> 
> I tried applying it with ' git am ' and it was giving me this error:
> 
> fatal: corrupt patch at line XX

Exactly.

> I think this was produced because of running ./scripts/cleanfile on the patch.

Why would you do that?  By doing that you corrupted the patch file :(

Just use the proper output of 'git format-patch' and all will be fine.

thanks,

greg k-h




Re: [PATCH] staging: android: uapi: drop definitions of removed ION_IOC_{FREE,SHARE} ioctls

2017-06-13 Thread Greg Kroah-Hartman
On Tue, Jun 13, 2017 at 09:17:05PM +0300, Gleb Fotengauer-Malinovskiy wrote:
> On Tue, May 30, 2017 at 04:33:57PM -0700, Laura Abbott wrote:
> > On 05/30/2017 07:11 AM, Gleb Fotengauer-Malinovskiy wrote:
> > > This problem was found by strace ioctl list generator.
> > > 
> > > Fixes: 15c6098cfec5 ("staging: android: ion: Remove ion_handle and 
> > > ion_client")
> 
> As this commit fixes a regression, please apply it to the tree which will
> be merged before 4.12 release, too.

What "regression" is there?  The fact that a staging driver has a few
spare ioctls floating around in a header file?  How is that bad?

thanks,

greg k-h




Re: [PATCH 1/1] uio: Fix uio_device memory leak

2017-06-13 Thread Greg KH
On Tue, Jun 13, 2017 at 07:35:51PM -0500, Mike Christie wrote:
> On 06/13/2017 07:16 PM, Mike Christie wrote:
> > On 06/13/2017 09:01 AM, Greg KH wrote:
> >> > On Wed, Jun 07, 2017 at 03:06:44PM -0500, Mike Christie wrote:
> >>> >> It looks like there might be 2 issues with the uio_device allocation, 
> >>> >> or it
> >>> >> looks like we are leaking the device for possibly a specific type of 
> >>> >> device
> >>> >> case that I could not find but one of you may know about.
> >>> >>
> >>> >> Issues:
> >>> >> 1. We use devm_kzalloc to allocate the uio_device, but the release
> >>> >> function, devm_kmalloc_release, is just a noop, so the memory is never 
> >>> >> freed.
> >> > 
> >> > What do you mean by this?  If the release function is a noop, lots of
> >> > memory in the kernel is leaking.  UIO shouldn't have to do anything
> >> > special here, is the devm api somehow broken?
> > Sorry. I misdiagnosed the problem. It's a noop, but we did kfree on the
> > entire devres and its data later.
> > 
> > The problem I was hitting is that memory is not freed until the parent
> > is removed. __uio_register_device does:
> > 
> > idev = devm_kzalloc(parent, sizeof(*idev), GFP_KERNEL);
> > if (!idev) {
> > return -ENOMEM;
> > }
> > 
> > so the devres's memory is associated with the parent. Is that intentional?
> > 
> 
> What I meant is that I can send a patch to just fix up the
> devm_kzalloc use in uio.c, so it gets the device struct for the uio
> device instead of the parent.
> 
> However, it looks like the existing code using the parent prevents a
> crash. If the child is hot unplugged/removed, and uio_unregister_device
> ends up freeing the idev, then later when the userspace application does
> a close on the uio device we would try to access the freed idev in
> uio_release.
> 
> If the devm_kzalloc parent use was meant for that hot unplug case, then
> I can also look into how to fix the drivers too.

Yeah, I don't know why it is tied to the parent, I'll take a patch to
fix that and let's see what breaks :)

thanks,

greg k-h




Re: [Merge tag 'pci-v4.12-changes' of git] 857f864014: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8

2017-06-13 Thread Greg Kroah-Hartman
On Tue, Jun 13, 2017 at 09:13:32PM +0200, Sven-Haegar Koch wrote:
> On Tue, 13 Jun 2017, Greg Kroah-Hartman wrote:
> 
> > On Tue, Jun 13, 2017 at 10:25:40AM -0600, Logan Gunthorpe wrote:
> > > 
> > > On 12/06/17 10:35 PM, Greg Kroah-Hartman wrote:
> > > > Or better yet, just turn all char major allocations into dynamic, which
> > > > would be really good for test systems.  I thought someone proposed
> > > > patches for that a long time ago, but I can't find them anymore.  That
> > > > would be the simplest solution here.
> > > 
> > > Would people not complain about that? I would not be surprised if some
> > > crazy application is using hard coded major numbers in userspace. So
> > > such a change could potentially break userspace...
> > 
> > For char devices, I doubt it, but we can't take the chance, which is why
> > you make it an option.  Then, it's enabled for 'allmodconfig' builds,
> > which helps testers out.
> 
> At least for /dev/null, /dev/zero, and perhaps /dev/tty it would 
> definitely break things if the major+minor number is not static. I have 
> multiple chroot environments having only some minimal needed static 
> /dev subdir, with naturally no daemons or filesystem creating those 
> on-demand. For the main /dev I use whatever the system sets up, so 
> devtmpfs with udev.

No, it wouldn't be required, it would be an option for those people
using devtmpfs.

thanks,

greg k-h




Re: [Merge tag 'pci-v4.12-changes' of git] 857f864014: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8

2017-06-13 Thread Greg Kroah-Hartman
On Tue, Jun 13, 2017 at 11:47:30AM -0600, Logan Gunthorpe wrote:
> 
> 
> On 13/06/17 10:35 AM, Greg Kroah-Hartman wrote:
> > For char devices, I doubt it, but we can't take the chance, which is why
> > you make it an option.  Then, it's enabled for 'allmodconfig' builds,
> > which helps testers out.
> 
> Well I took a look at this and it looks like a lot of work to modify all
> the drivers to support a possible dynamic allocation and I'm not really
> able to do all that right now.

No, don't modify any drivers, do this in the core chardev code.

> However, correct me if I'm missing something, but it looks fairly
> straightforward to extend the dynamic region above 256 in cases like
> this. There are already fixed major numbers above 255 and the
> infrastructure appears to support it. So what are your thoughts on the
> change below? I'd be happy to clean it up into a proper patch if you
> agree it's a workable option.
> 
> Thanks,
> 
> Logan
> 
> 
> 
> diff --git a/fs/char_dev.c b/fs/char_dev.c
> index fb8507f521b2..539352425d95 100644
> --- a/fs/char_dev.c
> +++ b/fs/char_dev.c
> @@ -59,6 +59,29 @@ void chrdev_show(struct seq_file *f, off_t offset)
> 
>  #endif /* CONFIG_PROC_FS */
> 
> +static int find_dynamic_major(void)
> +{
> + int i;
> + struct char_device_struct **cp;
> +
> + for (i = ARRAY_SIZE(chrdevs)-1; i > CHRDEV_MAJOR_DYN_END; i--) {
> + if (chrdevs[i] == NULL)
> + return i;
> + }
> +
> + for (i = CHRDEV_MAJOR_DYN_EXT_START;
> +  i > CHRDEV_MAJOR_DYN_EXT_END; i--) {
> + for (cp = &chrdevs[major_to_index(i)]; *cp; cp = &(*cp)->next)
> + if ((*cp)->major == i)
> + break;
> +
> + if (*cp == NULL || (*cp)->major != i)
> + return i;
> + }
> +
> + return -EBUSY;
> +}
> +
>  /*
>   * Register a single major with a specified minor range.
>   *
> @@ -84,22 +107,11 @@ __register_chrdev_region(unsigned int major,
> unsigned int baseminor,
> 
>   mutex_lock(&chrdevs_lock);
> 
> - /* temporary */
>   if (major == 0) {
> - for (i = ARRAY_SIZE(chrdevs)-1; i > 0; i--) {
> - if (chrdevs[i] == NULL)
> - break;
> - }
> -
> - if (i < CHRDEV_MAJOR_DYN_END)
> - pr_warn("CHRDEV \"%s\" major number %d goes below the 
> dynamic
> allocation range\n",
> - name, i);
> -
> - if (i == 0) {
> - ret = -EBUSY;
> + ret = find_dynamic_major();
> + if (ret < 0)
>   goto out;
> - }
> - major = i;
> + major = ret;
>   }
> 
>   cd->major = major;
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 249dad4e8d26..5780d69034ca 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2448,6 +2448,9 @@ static inline void bd_unlink_disk_holder(struct
> block_device *bdev,
>  #define CHRDEV_MAJOR_HASH_SIZE   255
>  /* Marks the bottom of the first segment of free char majors */
>  #define CHRDEV_MAJOR_DYN_END 234
> +#define CHRDEV_MAJOR_DYN_EXT_START 511
> +#define CHRDEV_MAJOR_DYN_EXT_END 384
> +
>  extern int alloc_chrdev_region(dev_t *, unsigned, unsigned, const char *);
>  extern int register_chrdev_region(dev_t, unsigned, const char *);
>  extern int __register_chrdev(unsigned int major, unsigned int baseminor,
> 
> 
> 
> 
> 
> This results in char devices like this (another patch might be prudent
> to fix the ordering):
> 
> Character devices:
> 510 ttySLM
>   1 mem
> 511 noz
>   4 /dev/vc/0
>   4 tty
>   4 ttyS
>   5 /dev/tty
>   5 /dev/console
>   5 /dev/ptmx
>   7 vcs
>   9 st
>  10 misc
>  13 input
>  21 sg
>  29 fb
>  43 ttyI
>  45 isdn
>  68 capi20
>  86 ch
>  90 mtd
>  99 ppdev
> 108 ppp
> 128 ptm
> 136 pts
> 152 aoechr
> 153 spi
> 161 ircomm
> 172 ttyMX
> 174 ttyMI
> 202 cpu/msr
> 203 cpu/cpuid
> 204 ttyJ
> 204 ttyMAX
> 206 osst
> 226 drm
> 235 ttyS
> 236 ttyRP
> 237 ttyARC
> 238 ttyPS
> 239 ttyIFX
> 494 rio_cm
> 240 ttySC
> 495 cros_ec
> 241 ipmidev
> 496 hidraw
> 242 rio_mport
> 497 ttySDIO
> 243 xz_dec_test
> 498 uio
> 244 bsg
> 499 firewire
> 245 pvfs2-req
> 500 nvme
> 246 watchdog
> 501 aac
> 247 iio
> 502 mei
> 248 ptp
> 503 phantom
> 249 pps
> 504 aux
> 250 dax
> 505 cmx
> 251 dimmctl
> 506 cmm
> 252 ndctl
> 507 ttySLP
> 253 tpm
> 508 ttyIPWp
> 254 gpiochip
> 509 ttySL
> 

At a quick glance, looks good to me, care to clean it up and put it behind
a config option?

thanks,

greg k-h


Re: [Merge tag 'pci-v4.12-changes' of git] 857f864014: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8

2017-06-13 Thread Greg Kroah-Hartman
On Tue, Jun 13, 2017 at 11:47:30AM -0600, Logan Gunthorpe wrote:
> 
> 
> On 13/06/17 10:35 AM, Greg Kroah-Hartman wrote:
> > For char devices, I doubt it, but we can't take the chance, which is why
> > you make it an option.  Then, it's enabled for 'allmodconfig' builds,
> > which helps testers out.
> 
> Well I took a look at this and it looks like a lot of work to modify all
> the drivers to support a possible dynamic allocation and I'm not really
> able to do all that right now.

No, don't modify any drivers, do this in the core chardev code.

> However, correct me if I'm missing something, but it looks fairly
> straightforward to extend the dynamic region above 256 in cases like
> this. There are already fixed major numbers above 255 and the
> infrastructure appears to support it. So what are your thoughts on the
> change below? I'd be happy to clean it up into a proper patch if you
> agree it's a workable option.
> 
> Thanks,
> 
> Logan
> 
> 
> 
> diff --git a/fs/char_dev.c b/fs/char_dev.c
> index fb8507f521b2..539352425d95 100644
> --- a/fs/char_dev.c
> +++ b/fs/char_dev.c
> @@ -59,6 +59,29 @@ void chrdev_show(struct seq_file *f, off_t offset)
> 
>  #endif /* CONFIG_PROC_FS */
> 
> +static int find_dynamic_major(void)
> +{
> + int i;
> + struct char_device_struct **cp;
> +
> + for (i = ARRAY_SIZE(chrdevs)-1; i > CHRDEV_MAJOR_DYN_END; i--) {
> + if (chrdevs[i] == NULL)
> + return i;
> + }
> +
> + for (i = CHRDEV_MAJOR_DYN_EXT_START;
> +  i > CHRDEV_MAJOR_DYN_EXT_END; i--) {
> + for (cp = [major_to_index(i)]; *cp; cp = &(*cp)->next)
> + if ((*cp)->major == i)
> + break;
> +
> + if (*cp == NULL || (*cp)->major != i)
> + return i;
> + }
> +
> + return -EBUSY;
> +}
> +
>  /*
>   * Register a single major with a specified minor range.
>   *
> @@ -84,22 +107,11 @@ __register_chrdev_region(unsigned int major,
> unsigned int baseminor,
> 
>   mutex_lock(&chrdevs_lock);
> 
> - /* temporary */
>   if (major == 0) {
> - for (i = ARRAY_SIZE(chrdevs)-1; i > 0; i--) {
> - if (chrdevs[i] == NULL)
> - break;
> - }
> -
> - if (i < CHRDEV_MAJOR_DYN_END)
> - pr_warn("CHRDEV \"%s\" major number %d goes below the dynamic allocation range\n",
> - name, i);
> -
> - if (i == 0) {
> - ret = -EBUSY;
> + ret = find_dynamic_major();
> + if (ret < 0)
>   goto out;
> - }
> - major = i;
> + major = ret;
>   }
> 
>   cd->major = major;
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 249dad4e8d26..5780d69034ca 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2448,6 +2448,9 @@ static inline void bd_unlink_disk_holder(struct
> block_device *bdev,
>  #define CHRDEV_MAJOR_HASH_SIZE   255
>  /* Marks the bottom of the first segment of free char majors */
>  #define CHRDEV_MAJOR_DYN_END 234
> +#define CHRDEV_MAJOR_DYN_EXT_START 511
> +#define CHRDEV_MAJOR_DYN_EXT_END 384
> +
>  extern int alloc_chrdev_region(dev_t *, unsigned, unsigned, const char *);
>  extern int register_chrdev_region(dev_t, unsigned, const char *);
>  extern int __register_chrdev(unsigned int major, unsigned int baseminor,
> 
> 
> 
> 
> 
> This results in char devices like this (another patch might be prudent
> to fix the ordering):
> 
> Character devices:
> 510 ttySLM
>   1 mem
> 511 noz
>   4 /dev/vc/0
>   4 tty
>   4 ttyS
>   5 /dev/tty
>   5 /dev/console
>   5 /dev/ptmx
>   7 vcs
>   9 st
>  10 misc
>  13 input
>  21 sg
>  29 fb
>  43 ttyI
>  45 isdn
>  68 capi20
>  86 ch
>  90 mtd
>  99 ppdev
> 108 ppp
> 128 ptm
> 136 pts
> 152 aoechr
> 153 spi
> 161 ircomm
> 172 ttyMX
> 174 ttyMI
> 202 cpu/msr
> 203 cpu/cpuid
> 204 ttyJ
> 204 ttyMAX
> 206 osst
> 226 drm
> 235 ttyS
> 236 ttyRP
> 237 ttyARC
> 238 ttyPS
> 239 ttyIFX
> 494 rio_cm
> 240 ttySC
> 495 cros_ec
> 241 ipmidev
> 496 hidraw
> 242 rio_mport
> 497 ttySDIO
> 243 xz_dec_test
> 498 uio
> 244 bsg
> 499 firewire
> 245 pvfs2-req
> 500 nvme
> 246 watchdog
> 501 aac
> 247 iio
> 502 mei
> 248 ptp
> 503 phantom
> 249 pps
> 504 aux
> 250 dax
> 505 cmx
> 251 dimmctl
> 506 cmm
> 252 ndctl
> 507 ttySLP
> 253 tpm
> 508 ttyIPWp
> 254 gpiochip
> 509 ttySL
> 

At quick glance, looks good to me, care to clean it up and make it behind
a config option?

thanks,

greg k-h


[PATCH v2 01/10] x86/ldt: Simplify LDT switching logic

2017-06-13 Thread Andy Lutomirski
Originally, Linux reloaded the LDT whenever the prev mm or the next
mm had an LDT.  It was changed in 0bbed3beb4f2 ("[PATCH]
Thread-Local Storage (TLS) support") (from the historical tree) like
this:

-   /* load_LDT, if either the previous or next thread
-* has a non-default LDT.
+   /*
+* load the LDT, if the LDT is different:
 */
-   if (next->context.size+prev->context.size)
+   if (unlikely(prev->context.ldt != next->context.ldt))
load_LDT(&next->context);

The current code is unlikely to avoid any LDT reloads, since different
mms won't share an LDT.

When we redo lazy mode to stop flush IPIs without switching to
init_mm, though, the current logic would become incorrect: it will
be possible to have real_prev == next but nonetheless have a stale
LDT descriptor.

Simplify the code to update LDTR if either the previous or the next
mm has an LDT, i.e. effectively restore the historical logic.
While we're at it, clean up the code by moving all the ifdeffery to
a header where it belongs.

Signed-off-by: Andy Lutomirski 
---
 arch/x86/include/asm/mmu_context.h | 26 ++
 arch/x86/mm/tlb.c  | 20 ++--
 2 files changed, 28 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/mmu_context.h 
b/arch/x86/include/asm/mmu_context.h
index 1458f530948b..ecfcb6643c9b 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -93,6 +93,32 @@ static inline void load_mm_ldt(struct mm_struct *mm)
 #else
clear_LDT();
 #endif
+}
+
+static inline void switch_ldt(struct mm_struct *prev, struct mm_struct *next)
+{
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
+   /*
+* Load the LDT if either the old or new mm had an LDT.
+*
+* An mm will never go from having an LDT to not having an LDT.  Two
+* mms never share an LDT, so we don't gain anything by checking to
+* see whether the LDT changed.  There's also no guarantee that
+* prev->context.ldt actually matches LDTR, but, if LDTR is non-NULL,
+* then prev->context.ldt will also be non-NULL.
+*
+* If we really cared, we could optimize the case where prev == next
+* and we're exiting lazy mode.  Most of the time, if this happens,
+* we don't actually need to reload LDTR, but modify_ldt() is mostly
+* used by legacy code and emulators where we don't need this level of
+* performance.
+*
+* This uses | instead of || because it generates better code.
+*/
+   if (unlikely((unsigned long)prev->context.ldt |
+(unsigned long)next->context.ldt))
+   load_mm_ldt(next);
+#endif
 
DEBUG_LOCKS_WARN_ON(preemptible());
 }
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 2a5e851f2035..b2485d69f7c2 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -148,25 +148,9 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct 
mm_struct *next,
 real_prev != &init_mm);
cpumask_clear_cpu(cpu, mm_cpumask(real_prev));
 
-   /* Load per-mm CR4 state */
+   /* Load per-mm CR4 and LDTR state */
load_mm_cr4(next);
-
-#ifdef CONFIG_MODIFY_LDT_SYSCALL
-   /*
-* Load the LDT, if the LDT is different.
-*
-* It's possible that prev->context.ldt doesn't match
-* the LDT register.  This can happen if leave_mm(prev)
-* was called and then modify_ldt changed
-* prev->context.ldt but suppressed an IPI to this CPU.
-* In this case, prev->context.ldt != NULL, because we
-* never set context.ldt to NULL while the mm still
-* exists.  That means that next->context.ldt !=
-* prev->context.ldt, because mms never share an LDT.
-*/
-   if (unlikely(real_prev->context.ldt != next->context.ldt))
-   load_mm_ldt(next);
-#endif
+   switch_ldt(real_prev, next);
 }
 
 /*
-- 
2.9.4





[PATCH v2 00/10] PCID and improved laziness

2017-06-13 Thread Andy Lutomirski
There are three performance benefits here:

1. TLB flushing is slow.  (I.e. the flush itself takes a while.)
   This avoids many of them when switching tasks by using PCID.  In
   a stupid little benchmark I did, it saves about 100ns on my laptop
   per context switch.  I'll try to improve that benchmark.

2. Mms that have been used recently on a given CPU might get to keep
   their TLB entries alive across process switches with this patch
   set.  TLB fills are pretty fast on modern CPUs, but they're even
   faster when they don't happen.

3. Lazy TLB is way better.  We used to do two stupid things when we
   ran kernel threads: we'd send IPIs to flush user contexts on their
   CPUs and then we'd write to CR3 for no particular reason as an excuse
   to stop further IPIs.  With this patch, we do neither.

This will, in general, perform suboptimally if paravirt TLB flushing
is in use (currently just Xen, I think, but Hyper-V is in the works).
The code is structured so we could fix it in one of two ways: we
could take a spinlock when touching the percpu state so we can update
it remotely after a paravirt flush, or we could be more careful about
exactly how we access the state and use cmpxchg16b to do atomic
remote updates.  (On SMP systems without cmpxchg16b, we'd just skip
the optimization entirely.)

This is based on tip:x86/mm.  The branch is here if you want to play:
https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/log/?h=x86/pcid

Changes from RFC:
 - flush_tlb_func_common() no longer gets reentered (Nadav)
 - Fix ASID corruption on unlazying (kbuild bot)
 - Move Xen init to the right place
 - Misc cleanups

Andy Lutomirski (10):
  x86/ldt: Simplify LDT switching logic
  x86/mm: Remove reset_lazy_tlbstate()
  x86/mm: Give each mm TLB flush generation a unique ID
  x86/mm: Track the TLB's tlb_gen and update the flushing algorithm
  x86/mm: Rework lazy TLB mode and TLB freshness tracking
  x86/mm: Stop calling leave_mm() in idle code
  x86/mm: Disable PCID on 32-bit kernels
  x86/mm: Add nopcid to turn off PCID
  x86/mm: Enable CR4.PCIDE on supported systems
  x86/mm: Try to preserve old TLB entries using PCID

 Documentation/admin-guide/kernel-parameters.txt |   2 +
 arch/ia64/include/asm/acpi.h|   2 -
 arch/x86/include/asm/acpi.h |   2 -
 arch/x86/include/asm/disabled-features.h|   4 +-
 arch/x86/include/asm/mmu.h  |  25 +-
 arch/x86/include/asm/mmu_context.h  |  40 ++-
 arch/x86/include/asm/processor-flags.h  |   2 +
 arch/x86/include/asm/tlbflush.h |  89 +-
 arch/x86/kernel/cpu/bugs.c  |   8 +
 arch/x86/kernel/cpu/common.c|  33 +++
 arch/x86/kernel/smpboot.c   |   1 -
 arch/x86/mm/init.c  |   2 +-
 arch/x86/mm/tlb.c   | 368 +++-
 arch/x86/xen/enlighten_pv.c |   6 +
 drivers/acpi/processor_idle.c   |   2 -
 drivers/idle/intel_idle.c   |   9 +-
 16 files changed, 429 insertions(+), 166 deletions(-)

-- 
2.9.4





[PATCH v2 03/10] x86/mm: Give each mm TLB flush generation a unique ID

2017-06-13 Thread Andy Lutomirski
This adds two new variables to mmu_context_t: ctx_id and tlb_gen.
ctx_id uniquely identifies the mm_struct and will never be reused.
For a given mm_struct (and hence ctx_id), tlb_gen is a monotonic
count of the number of times that a TLB flush has been requested.
The pair (ctx_id, tlb_gen) can be used as an identifier for TLB
flush actions and will be used in subsequent patches to reliably
determine whether all needed TLB flushes have occurred on a given
CPU.

This patch is split out for ease of review.  By itself, it has no
real effect other than creating and updating the new variables.

Signed-off-by: Andy Lutomirski 
---
 arch/x86/include/asm/mmu.h | 25 +++--
 arch/x86/include/asm/mmu_context.h |  5 +
 arch/x86/include/asm/tlbflush.h| 18 ++
 arch/x86/mm/tlb.c  |  6 --
 4 files changed, 50 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 79b647a7ebd0..bb8c597c2248 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -3,12 +3,28 @@
 
 #include 
 #include 
+#include 
 
 /*
- * The x86 doesn't have a mmu context, but
- * we put the segment information here.
+ * x86 has arch-specific MMU state beyond what lives in mm_struct.
  */
 typedef struct {
+   /*
+* ctx_id uniquely identifies this mm_struct.  A ctx_id will never
+* be reused, and zero is not a valid ctx_id.
+*/
+   u64 ctx_id;
+
+   /*
+* Any code that needs to do any sort of TLB flushing for this
+* mm will first make its changes to the page tables, then
+* increment tlb_gen, then flush.  This lets the low-level
+* flushing code keep track of what needs flushing.
+*
+* This is not used on Xen PV.
+*/
+   atomic64_t tlb_gen;
+
 #ifdef CONFIG_MODIFY_LDT_SYSCALL
struct ldt_struct *ldt;
 #endif
@@ -37,6 +53,11 @@ typedef struct {
 #endif
 } mm_context_t;
 
+#define INIT_MM_CONTEXT(mm)\
+   .context = {\
+   .ctx_id = 1,\
+   }
+
 void leave_mm(int cpu);
 
 #endif /* _ASM_X86_MMU_H */
diff --git a/arch/x86/include/asm/mmu_context.h 
b/arch/x86/include/asm/mmu_context.h
index ecfcb6643c9b..e5295d485899 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -129,9 +129,14 @@ static inline void enter_lazy_tlb(struct mm_struct *mm, 
struct task_struct *tsk)
this_cpu_write(cpu_tlbstate.state, TLBSTATE_LAZY);
 }
 
+extern atomic64_t last_mm_ctx_id;
+
 static inline int init_new_context(struct task_struct *tsk,
   struct mm_struct *mm)
 {
+   mm->context.ctx_id = atomic64_inc_return(&last_mm_ctx_id);
+   atomic64_set(&mm->context.tlb_gen, 0);
+
#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
if (cpu_feature_enabled(X86_FEATURE_OSPKE)) {
/* pkey 0 is the default and always allocated */
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 50ea3482e1d1..1eb946c0507e 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -57,6 +57,23 @@ static inline void invpcid_flush_all_nonglobals(void)
__invpcid(0, 0, INVPCID_TYPE_ALL_NON_GLOBAL);
 }
 
+static inline u64 bump_mm_tlb_gen(struct mm_struct *mm)
+{
+   u64 new_tlb_gen;
+
+   /*
+* Bump the generation count.  This also serves as a full barrier
+* that synchronizes with switch_mm: callers are required to order
+* their read of mm_cpumask after their writes to the paging
+* structures.
+*/
+   smp_mb__before_atomic();
+   new_tlb_gen = atomic64_inc_return(&mm->context.tlb_gen);
+   smp_mb__after_atomic();
+
+   return new_tlb_gen;
+}
+
 #ifdef CONFIG_PARAVIRT
 #include 
 #else
@@ -262,6 +279,7 @@ void native_flush_tlb_others(const struct cpumask *cpumask,
 static inline void arch_tlbbatch_add_mm(struct arch_tlbflush_unmap_batch 
*batch,
struct mm_struct *mm)
 {
+   bump_mm_tlb_gen(mm);
cpumask_or(&batch->cpumask, &batch->cpumask, mm_cpumask(mm));
 }
 
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index b2485d69f7c2..7c99c50e8bc9 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -28,6 +28,8 @@
  * Implement flush IPI by CALL_FUNCTION_VECTOR, Alex Shi
  */
 
+atomic64_t last_mm_ctx_id = ATOMIC64_INIT(1);
+
 void leave_mm(int cpu)
 {
struct mm_struct *loaded_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
@@ -283,8 +285,8 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long 
start,
 
cpu = get_cpu();
 
-   /* Synchronize with switch_mm. */
-   smp_mb();
+   /* This is also a barrier that synchronizes with switch_mm(). */
+   bump_mm_tlb_gen(mm);
 
/* Should we flush just 


[PATCH v2 04/10] x86/mm: Track the TLB's tlb_gen and update the flushing algorithm

2017-06-13 Thread Andy Lutomirski
There are two kernel features that would benefit from tracking
how up-to-date each CPU's TLB is in the case where IPIs aren't keeping
it up to date in real time:

 - Lazy mm switching currently works by switching to init_mm when
   it would otherwise flush.  This is wasteful: there isn't fundamentally
   any need to update CR3 at all when going lazy or when returning from
   lazy mode, nor is there any need to receive flush IPIs at all.  Instead,
   we should just stop trying to keep the TLB coherent when we go lazy and,
   when unlazying, check whether we missed any flushes.

 - PCID will let us keep recent user contexts alive in the TLB.  If we
   start doing this, we need a way to decide whether those contexts are
   up to date.

On some paravirt systems, remote TLBs can be flushed without IPIs.
This won't update the target CPUs' tlb_gens, which may cause
unnecessary local flushes later on.  We can address this if it becomes
a problem by carefully updating the target CPU's tlb_gen directly.

By itself, this patch is a very minor optimization that avoids
unnecessary flushes when multiple TLB flushes targeting the same CPU
race.

Signed-off-by: Andy Lutomirski 
---
 arch/x86/include/asm/tlbflush.h | 37 +++
 arch/x86/mm/tlb.c   | 79 +
 2 files changed, 109 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 1eb946c0507e..4f6c30d6ec39 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -82,6 +82,11 @@ static inline u64 bump_mm_tlb_gen(struct mm_struct *mm)
 #define __flush_tlb_single(addr) __native_flush_tlb_single(addr)
 #endif
 
+struct tlb_context {
+   u64 ctx_id;
+   u64 tlb_gen;
+};
+
 struct tlb_state {
/*
 * cpu_tlbstate.loaded_mm should match CR3 whenever interrupts
@@ -97,6 +102,21 @@ struct tlb_state {
 * disabling interrupts when modifying either one.
 */
unsigned long cr4;
+
+   /*
+* This is a list of all contexts that might exist in the TLB.
+* Since we don't yet use PCID, there is only one context.
+*
+* For each context, ctx_id indicates which mm the TLB's user
+* entries came from.  As an invariant, the TLB will never
+* contain entries that are out-of-date as when that mm reached
+* the tlb_gen in the list.
+*
+* To be clear, this means that it's legal for the TLB code to
+* flush the TLB without updating tlb_gen.  This can happen
+* (for now, at least) due to paravirt remote flushes.
+*/
+   struct tlb_context ctxs[1];
 };
 DECLARE_PER_CPU_SHARED_ALIGNED(struct tlb_state, cpu_tlbstate);
 
@@ -248,9 +268,26 @@ static inline void __flush_tlb_one(unsigned long addr)
  * and page-granular flushes are available only on i486 and up.
  */
 struct flush_tlb_info {
+   /*
+* We support several kinds of flushes.
+*
+* - Fully flush a single mm.  flush_mm will be set, flush_end will be
+*   TLB_FLUSH_ALL, and new_tlb_gen will be the tlb_gen to which the
+*   IPI sender is trying to catch us up.
+*
+* - Partially flush a single mm.  flush_mm will be set, flush_start
+*   and flush_end will indicate the range, and new_tlb_gen will be
+*   set such that the changes between generation new_tlb_gen-1 and
+*   new_tlb_gen are entirely contained in the indicated range.
+*
+* - Fully flush all mms whose tlb_gens have been updated.  flush_mm
+*   will be NULL, flush_end will be TLB_FLUSH_ALL, and new_tlb_gen
+*   will be zero.
+*/
struct mm_struct *mm;
unsigned long start;
unsigned long end;
+   u64 new_tlb_gen;
 };
 
 #define local_flush_tlb() __flush_tlb()
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 7c99c50e8bc9..3b19ba748e92 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -105,6 +105,9 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct 
mm_struct *next,
}
 
this_cpu_write(cpu_tlbstate.loaded_mm, next);
+   this_cpu_write(cpu_tlbstate.ctxs[0].ctx_id, next->context.ctx_id);
+   this_cpu_write(cpu_tlbstate.ctxs[0].tlb_gen,
+  atomic64_read(&next->context.tlb_gen));
 
WARN_ON_ONCE(cpumask_test_cpu(cpu, mm_cpumask(next)));
cpumask_set_cpu(cpu, mm_cpumask(next));
@@ -194,17 +197,70 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct 
mm_struct *next,
 static void flush_tlb_func_common(const struct flush_tlb_info *f,
  bool local, enum tlb_flush_reason reason)
 {
+   struct mm_struct *loaded_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
+
+   /*
+* Our memory ordering requirement is that any TLB fills that
+* happen after we flush the TLB are ordered after we read
+* active_mm's 

[PATCH v2 05/10] x86/mm: Rework lazy TLB mode and TLB freshness tracking

2017-06-13 Thread Andy Lutomirski
x86's lazy TLB mode used to be fairly weak -- it would switch to
init_mm the first time it tried to flush a lazy TLB.  This meant an
unnecessary CR3 write and, if the flush was remote, an unnecessary
IPI.

Rewrite it entirely.  When we enter lazy mode, we simply remove the
cpu from mm_cpumask.  This means that we need a way to figure out
whether we've missed a flush when we switch back out of lazy mode.
I use the tlb_gen machinery to track whether a context is up to
date.

Note to reviewers: this patch, by itself, looks a bit odd.  I'm
using an array of length 1 containing (ctx_id, tlb_gen) rather than
just storing tlb_gen, and making it an array isn't necessary yet.
I'm doing this because the next few patches add PCID support, and,
with PCID, we need ctx_id, and the array will end up with a length
greater than 1.  Making it an array now means that there will be
less churn and therefore less stress on your eyeballs.

NB: This is dubious but, AFAICT, still correct on Xen and UV.
xen_exit_mmap() uses mm_cpumask() for nefarious purposes and this
patch changes the way that mm_cpumask() works.  This should be okay,
since Xen *also* iterates all online CPUs to find all the CPUs it
needs to twiddle.

The UV tlbflush code is rather dated and should be changed.

Cc: Andrew Banman 
Cc: Mike Travis 
Cc: Dimitri Sivanich 
Cc: Juergen Gross 
Cc: Boris Ostrovsky 
Signed-off-by: Andy Lutomirski 
---
 arch/x86/include/asm/mmu_context.h |   6 +-
 arch/x86/include/asm/tlbflush.h|   4 -
 arch/x86/mm/init.c |   1 -
 arch/x86/mm/tlb.c  | 242 +++--
 4 files changed, 131 insertions(+), 122 deletions(-)

diff --git a/arch/x86/include/asm/mmu_context.h 
b/arch/x86/include/asm/mmu_context.h
index e5295d485899..69a4f1ee86ac 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -125,8 +125,10 @@ static inline void switch_ldt(struct mm_struct *prev, 
struct mm_struct *next)
 
 static inline void enter_lazy_tlb(struct mm_struct *mm, struct task_struct 
*tsk)
 {
-   if (this_cpu_read(cpu_tlbstate.state) == TLBSTATE_OK)
-   this_cpu_write(cpu_tlbstate.state, TLBSTATE_LAZY);
+   int cpu = smp_processor_id();
+
+   if (cpumask_test_cpu(cpu, mm_cpumask(mm)))
+   cpumask_clear_cpu(cpu, mm_cpumask(mm));
 }
 
 extern atomic64_t last_mm_ctx_id;
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 4f6c30d6ec39..87b13e51e867 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -95,7 +95,6 @@ struct tlb_state {
 * mode even if we've already switched back to swapper_pg_dir.
 */
struct mm_struct *loaded_mm;
-   int state;
 
/*
 * Access to this CR4 shadow and to H/W CR4 is protected by
@@ -310,9 +309,6 @@ static inline void flush_tlb_page(struct vm_area_struct 
*vma, unsigned long a)
 void native_flush_tlb_others(const struct cpumask *cpumask,
 const struct flush_tlb_info *info);
 
-#define TLBSTATE_OK1
-#define TLBSTATE_LAZY  2
-
 static inline void arch_tlbbatch_add_mm(struct arch_tlbflush_unmap_batch 
*batch,
struct mm_struct *mm)
 {
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 88ee942cb47d..7d6fa4676af9 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -812,7 +812,6 @@ void __init zone_sizes_init(void)
 
 DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb_state, cpu_tlbstate) = {
.loaded_mm = _mm,
-   .state = 0,
.cr4 = ~0UL,/* fail hard if we screw up cr4 shadow initialization */
 };
 EXPORT_SYMBOL_GPL(cpu_tlbstate);
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 3b19ba748e92..fea2b07ac7d8 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -45,8 +45,8 @@ void leave_mm(int cpu)
if (loaded_mm == _mm)
return;
 
-   if (this_cpu_read(cpu_tlbstate.state) == TLBSTATE_OK)
-   BUG();
+   /* Warn if we're not lazy. */
+   WARN_ON(cpumask_test_cpu(smp_processor_id(), mm_cpumask(loaded_mm)));
 
switch_mm(NULL, _mm, NULL);
 }
@@ -67,133 +67,118 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct 
mm_struct *next,
 {
unsigned cpu = smp_processor_id();
struct mm_struct *real_prev = this_cpu_read(cpu_tlbstate.loaded_mm);
+   u64 next_tlb_gen;
 
/*
-* NB: The scheduler will call us with prev == next when
-* switching from lazy TLB mode to normal mode if active_mm
-* isn't changing.  When this happens, there is no guarantee
-* that CR3 (and hence cpu_tlbstate.loaded_mm) matches next.
+* NB: The scheduler will call us with prev == next when switching
+* from lazy TLB mode to normal mode if active_mm isn't changing.
+* When this happens, 

[PATCH v2 02/10] x86/mm: Remove reset_lazy_tlbstate()

2017-06-13 Thread Andy Lutomirski
The only call site also calls idle_task_exit(), and idle_task_exit()
puts us into a clean state by explicitly switching to init_mm.

Signed-off-by: Andy Lutomirski 
---
 arch/x86/include/asm/tlbflush.h | 8 
 arch/x86/kernel/smpboot.c   | 1 -
 2 files changed, 9 deletions(-)

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 5f78c6a77578..50ea3482e1d1 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -259,14 +259,6 @@ void native_flush_tlb_others(const struct cpumask *cpumask,
 #define TLBSTATE_OK   1
 #define TLBSTATE_LAZY 2
 
-static inline void reset_lazy_tlbstate(void)
-{
-   this_cpu_write(cpu_tlbstate.state, 0);
-   this_cpu_write(cpu_tlbstate.loaded_mm, &init_mm);
-
-   WARN_ON(read_cr3_pa() != __pa_symbol(swapper_pg_dir));
-}
-
 static inline void arch_tlbbatch_add_mm(struct arch_tlbflush_unmap_batch 
*batch,
struct mm_struct *mm)
 {
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index f04479a8f74f..6169a56aab49 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1589,7 +1589,6 @@ void native_cpu_die(unsigned int cpu)
 void play_dead_common(void)
 {
idle_task_exit();
-   reset_lazy_tlbstate();
 
/* Ack it */
(void)cpu_report_death();
-- 
2.9.4



[PATCH v2 07/10] x86/mm: Disable PCID on 32-bit kernels

2017-06-13 Thread Andy Lutomirski
32-bit kernels on new hardware will see PCID in CPUID, but PCID can
only be used in 64-bit mode.  Rather than making all PCID code
conditional, just disable the feature on 32-bit builds.

Signed-off-by: Andy Lutomirski 
---
 arch/x86/include/asm/disabled-features.h | 4 +++-
 arch/x86/kernel/cpu/bugs.c   | 8 
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/disabled-features.h 
b/arch/x86/include/asm/disabled-features.h
index 5dff775af7cd..c10c9128f54e 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -21,11 +21,13 @@
 # define DISABLE_K6_MTRR   (1<<(X86_FEATURE_K6_MTRR & 31))
 # define DISABLE_CYRIX_ARR (1<<(X86_FEATURE_CYRIX_ARR & 31))
 # define DISABLE_CENTAUR_MCR   (1<<(X86_FEATURE_CENTAUR_MCR & 31))
+# define DISABLE_PCID  0
 #else
 # define DISABLE_VME   0
 # define DISABLE_K6_MTRR   0
 # define DISABLE_CYRIX_ARR 0
 # define DISABLE_CENTAUR_MCR   0
+# define DISABLE_PCID  (1<<(X86_FEATURE_PCID & 31))
 #endif /* CONFIG_X86_64 */
 
 #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
@@ -49,7 +51,7 @@
 #define DISABLED_MASK1 0
 #define DISABLED_MASK2 0
 #define DISABLED_MASK3 (DISABLE_CYRIX_ARR|DISABLE_CENTAUR_MCR|DISABLE_K6_MTRR)
-#define DISABLED_MASK4 0
+#define DISABLED_MASK4 (DISABLE_PCID)
 #define DISABLED_MASK5 0
 #define DISABLED_MASK6 0
 #define DISABLED_MASK7 0
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 0af86d9242da..db684880d74a 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -21,6 +21,14 @@
 
 void __init check_bugs(void)
 {
+#ifdef CONFIG_X86_32
+   /*
+* Regardless of whether PCID is enumerated, the SDM says
+* that it can't be enabled in 32-bit mode.
+*/
+   setup_clear_cpu_cap(X86_FEATURE_PCID);
+#endif
+
identify_boot_cpu();
 
if (!IS_ENABLED(CONFIG_SMP)) {
-- 
2.9.4



[PATCH v2 04/10] x86/mm: Track the TLB's tlb_gen and update the flushing algorithm

2017-06-13 Thread Andy Lutomirski
There are two kernel features that would benefit from tracking
how up-to-date each CPU's TLB is in the case where IPIs aren't keeping
it up to date in real time:

 - Lazy mm switching currently works by switching to init_mm when
   it would otherwise flush.  This is wasteful: there isn't fundamentally
   any need to update CR3 at all when going lazy or when returning from
   lazy mode, nor is there any need to receive flush IPIs at all.  Instead,
   we should just stop trying to keep the TLB coherent when we go lazy and,
   when unlazying, check whether we missed any flushes.

 - PCID will let us keep recent user contexts alive in the TLB.  If we
   start doing this, we need a way to decide whether those contexts are
   up to date.

On some paravirt systems, remote TLBs can be flushed without IPIs.
This won't update the target CPUs' tlb_gens, which may cause
unnecessary local flushes later on.  We can address this if it becomes
a problem by carefully updating the target CPU's tlb_gen directly.

By itself, this patch is a very minor optimization that avoids
unnecessary flushes when multiple TLB flushes targeting the same CPU
race.

Signed-off-by: Andy Lutomirski 
---
 arch/x86/include/asm/tlbflush.h | 37 +++
 arch/x86/mm/tlb.c   | 79 +
 2 files changed, 109 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 1eb946c0507e..4f6c30d6ec39 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -82,6 +82,11 @@ static inline u64 bump_mm_tlb_gen(struct mm_struct *mm)
 #define __flush_tlb_single(addr) __native_flush_tlb_single(addr)
 #endif
 
+struct tlb_context {
+   u64 ctx_id;
+   u64 tlb_gen;
+};
+
 struct tlb_state {
/*
 * cpu_tlbstate.loaded_mm should match CR3 whenever interrupts
@@ -97,6 +102,21 @@ struct tlb_state {
 * disabling interrupts when modifying either one.
 */
unsigned long cr4;
+
+   /*
+* This is a list of all contexts that might exist in the TLB.
+* Since we don't yet use PCID, there is only one context.
+*
+* For each context, ctx_id indicates which mm the TLB's user
+* entries came from.  As an invariant, the TLB will never
+* contain entries that are out-of-date as when that mm reached
+* the tlb_gen in the list.
+*
+* To be clear, this means that it's legal for the TLB code to
+* flush the TLB without updating tlb_gen.  This can happen
+* (for now, at least) due to paravirt remote flushes.
+*/
+   struct tlb_context ctxs[1];
 };
 DECLARE_PER_CPU_SHARED_ALIGNED(struct tlb_state, cpu_tlbstate);
 
@@ -248,9 +268,26 @@ static inline void __flush_tlb_one(unsigned long addr)
  * and page-granular flushes are available only on i486 and up.
  */
 struct flush_tlb_info {
+   /*
+* We support several kinds of flushes.
+*
+* - Fully flush a single mm.  flush_mm will be set, flush_end will be
+*   TLB_FLUSH_ALL, and new_tlb_gen will be the tlb_gen to which the
+*   IPI sender is trying to catch us up.
+*
+* - Partially flush a single mm.  flush_mm will be set, flush_start
+*   and flush_end will indicate the range, and new_tlb_gen will be
+*   set such that the changes between generation new_tlb_gen-1 and
+*   new_tlb_gen are entirely contained in the indicated range.
+*
+* - Fully flush all mms whose tlb_gens have been updated.  flush_mm
+*   will be NULL, flush_end will be TLB_FLUSH_ALL, and new_tlb_gen
+*   will be zero.
+*/
struct mm_struct *mm;
unsigned long start;
unsigned long end;
+   u64 new_tlb_gen;
 };
 
 #define local_flush_tlb() __flush_tlb()
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 7c99c50e8bc9..3b19ba748e92 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -105,6 +105,9 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct 
mm_struct *next,
}
 
this_cpu_write(cpu_tlbstate.loaded_mm, next);
+   this_cpu_write(cpu_tlbstate.ctxs[0].ctx_id, next->context.ctx_id);
+   this_cpu_write(cpu_tlbstate.ctxs[0].tlb_gen,
+  atomic64_read(&next->context.tlb_gen));
 
WARN_ON_ONCE(cpumask_test_cpu(cpu, mm_cpumask(next)));
cpumask_set_cpu(cpu, mm_cpumask(next));
@@ -194,17 +197,70 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct 
mm_struct *next,
 static void flush_tlb_func_common(const struct flush_tlb_info *f,
  bool local, enum tlb_flush_reason reason)
 {
+   struct mm_struct *loaded_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
+
+   /*
+* Our memory ordering requirement is that any TLB fills that
+* happen after we flush the TLB are ordered after we read
+* active_mm's tlb_gen.  We don't 

[PATCH v2 05/10] x86/mm: Rework lazy TLB mode and TLB freshness tracking

2017-06-13 Thread Andy Lutomirski
x86's lazy TLB mode used to be fairly weak -- it would switch to
init_mm the first time it tried to flush a lazy TLB.  This meant an
unnecessary CR3 write and, if the flush was remote, an unnecessary
IPI.

Rewrite it entirely.  When we enter lazy mode, we simply remove the
cpu from mm_cpumask.  This means that we need a way to figure out
whether we've missed a flush when we switch back out of lazy mode.
I use the tlb_gen machinery to track whether a context is up to
date.

Note to reviewers: this patch, by itself, looks a bit odd.  I'm
using an array of length 1 containing (ctx_id, tlb_gen) rather than
just storing tlb_gen, and making it an array isn't necessary yet.
I'm doing this because the next few patches add PCID support, and,
with PCID, we need ctx_id, and the array will end up with a length
greater than 1.  Making it an array now means that there will be
less churn and therefore less stress on your eyeballs.

NB: This is dubious but, AFAICT, still correct on Xen and UV.
xen_exit_mmap() uses mm_cpumask() for nefarious purposes and this
patch changes the way that mm_cpumask() works.  This should be okay,
since Xen *also* iterates all online CPUs to find all the CPUs it
needs to twiddle.

The UV tlbflush code is rather dated and should be changed.

Cc: Andrew Banman 
Cc: Mike Travis 
Cc: Dimitri Sivanich 
Cc: Juergen Gross 
Cc: Boris Ostrovsky 
Signed-off-by: Andy Lutomirski 
---
 arch/x86/include/asm/mmu_context.h |   6 +-
 arch/x86/include/asm/tlbflush.h|   4 -
 arch/x86/mm/init.c |   1 -
 arch/x86/mm/tlb.c  | 242 +++--
 4 files changed, 131 insertions(+), 122 deletions(-)

diff --git a/arch/x86/include/asm/mmu_context.h 
b/arch/x86/include/asm/mmu_context.h
index e5295d485899..69a4f1ee86ac 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -125,8 +125,10 @@ static inline void switch_ldt(struct mm_struct *prev, 
struct mm_struct *next)
 
 static inline void enter_lazy_tlb(struct mm_struct *mm, struct task_struct 
*tsk)
 {
-   if (this_cpu_read(cpu_tlbstate.state) == TLBSTATE_OK)
-   this_cpu_write(cpu_tlbstate.state, TLBSTATE_LAZY);
+   int cpu = smp_processor_id();
+
+   if (cpumask_test_cpu(cpu, mm_cpumask(mm)))
+   cpumask_clear_cpu(cpu, mm_cpumask(mm));
 }
 
 extern atomic64_t last_mm_ctx_id;
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 4f6c30d6ec39..87b13e51e867 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -95,7 +95,6 @@ struct tlb_state {
 * mode even if we've already switched back to swapper_pg_dir.
 */
struct mm_struct *loaded_mm;
-   int state;
 
/*
 * Access to this CR4 shadow and to H/W CR4 is protected by
@@ -310,9 +309,6 @@ static inline void flush_tlb_page(struct vm_area_struct 
*vma, unsigned long a)
 void native_flush_tlb_others(const struct cpumask *cpumask,
 const struct flush_tlb_info *info);
 
-#define TLBSTATE_OK   1
-#define TLBSTATE_LAZY 2
-
 static inline void arch_tlbbatch_add_mm(struct arch_tlbflush_unmap_batch 
*batch,
struct mm_struct *mm)
 {
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 88ee942cb47d..7d6fa4676af9 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -812,7 +812,6 @@ void __init zone_sizes_init(void)
 
 DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb_state, cpu_tlbstate) = {
 .loaded_mm = &init_mm,
-   .state = 0,
 .cr4 = ~0UL,   /* fail hard if we screw up cr4 shadow initialization */
 };
 EXPORT_SYMBOL_GPL(cpu_tlbstate);
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 3b19ba748e92..fea2b07ac7d8 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -45,8 +45,8 @@ void leave_mm(int cpu)
 if (loaded_mm == &init_mm)
return;
 
-   if (this_cpu_read(cpu_tlbstate.state) == TLBSTATE_OK)
-   BUG();
+   /* Warn if we're not lazy. */
+   WARN_ON(cpumask_test_cpu(smp_processor_id(), mm_cpumask(loaded_mm)));
 
 switch_mm(NULL, &init_mm, NULL);
 }
@@ -67,133 +67,118 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct 
mm_struct *next,
 {
unsigned cpu = smp_processor_id();
struct mm_struct *real_prev = this_cpu_read(cpu_tlbstate.loaded_mm);
+   u64 next_tlb_gen;
 
/*
-* NB: The scheduler will call us with prev == next when
-* switching from lazy TLB mode to normal mode if active_mm
-* isn't changing.  When this happens, there is no guarantee
-* that CR3 (and hence cpu_tlbstate.loaded_mm) matches next.
+* NB: The scheduler will call us with prev == next when switching
+* from lazy TLB mode to normal mode if active_mm isn't changing.
+* When this happens, we don't assume that CR3 (and hence
+* cpu_tlbstate.loaded_mm) matches next.
 *
 * NB: 




Re: [PATCH] pci: iov: use device lock to protect IOV sysfs accesses

2017-06-13 Thread Jakub Kicinski
On Tue, 30 May 2017 16:34:29 -0700, Jakub Kicinski wrote:
> On Tue, 30 May 2017 18:07:18 -0500, Bjorn Helgaas wrote:
> > On Fri, May 26, 2017 at 04:58:20PM -0700, Jakub Kicinski wrote:  
> > > On Fri, 26 May 2017 18:47:26 -0500, Bjorn Helgaas wrote:
> > > > On Mon, May 22, 2017 at 03:50:23PM -0700, Jakub Kicinski wrote:
> > > > > PCI core sets the driver pointer before calling ->probe() and only
> > > > > clears it after ->remove().  This means driver's ->sriov_configure()
> > > > > callback will happily race with probe() and remove(), most likely
> > > > > leading to BUGs, since drivers don't expect this.  
> > > > 
> > > > I guess you're referring to the pci_dev->driver pointer set by
> > > > local_pci_probe(), and this is important because sriov_numvfs_store()
> > > > checks that pointer, right?
> > > 
> > > Yes, exactly.  I initially thought this is how the safety of sriov
> > > callback may have been ensured, but since the order of
> > > local_pci_probe() and the assignment is what it is, it can't.
> > 
> > Right.  I was hoping other subsystems would establish a convention
> > about whether we set the ->driver pointer before or after calling the
> > driver probe() method, but if there is one, I don't see it.
> > local_pci_probe() and really_probe() set ->driver first, but
> > pnp_device_probe() calls the probe() method first.  
> 
> I didn't dig into reordering the pointer setting, to be honest.  I
> thought establishing that driver callbacks should generally hold device
> lock, whenever possible, would be even better than pointer setting
> conventions.
> 
> If we order the assignments better, wouldn't we still need appropriate
> memory barriers to rely on the order? (:
> 
> > Can you expand on how you reproduce this problem?  The only real way I
> > see to call ->sriov_configure() is via the sysfs entry point, and I
> > would think user-space code would typically not touch that until after
> > it knows the driver has claimed a device.  But I can certainly imagine
> > targeted test code that could hit this problem.  
> 
> Correct.  It's not something that users should be triggering often in
> normal use.  I also found it by code inspection rather than by getting
> an oops.
> 
> OTOH if the driver performs FW load or other time-consuming operations
> in ->probe() the time window when this can be triggered may be counted
> in seconds.

Hi Bjorn, 

is this patch still considered for 4.13, or should I change it somehow?




[PATCH v2 06/10] x86/mm: Stop calling leave_mm() in idle code

2017-06-13 Thread Andy Lutomirski
Now that lazy TLB suppresses all flush IPIs (as opposed to all but
the first), there's no need to leave_mm() when going idle.

This means we can get rid of the rcuidle hack in
switch_mm_irqs_off() and we can unexport leave_mm().

This also removes acpi_unlazy_tlb() from the x86 and ia64 headers,
since it has no callers any more.

Signed-off-by: Andy Lutomirski 
---
 arch/ia64/include/asm/acpi.h  |  2 --
 arch/x86/include/asm/acpi.h   |  2 --
 arch/x86/mm/tlb.c | 19 +++
 drivers/acpi/processor_idle.c |  2 --
 drivers/idle/intel_idle.c |  9 -
 5 files changed, 7 insertions(+), 27 deletions(-)

diff --git a/arch/ia64/include/asm/acpi.h b/arch/ia64/include/asm/acpi.h
index a3d0211970e9..c86a947f5368 100644
--- a/arch/ia64/include/asm/acpi.h
+++ b/arch/ia64/include/asm/acpi.h
@@ -112,8 +112,6 @@ static inline void arch_acpi_set_pdc_bits(u32 *buf)
buf[2] |= ACPI_PDC_EST_CAPABILITY_SMP;
 }
 
-#define acpi_unlazy_tlb(x)
-
 #ifdef CONFIG_ACPI_NUMA
 extern cpumask_t early_cpu_possible_map;
 #define for_each_possible_early_cpu(cpu)  \
diff --git a/arch/x86/include/asm/acpi.h b/arch/x86/include/asm/acpi.h
index 2efc768e4362..562286fa151f 100644
--- a/arch/x86/include/asm/acpi.h
+++ b/arch/x86/include/asm/acpi.h
@@ -150,8 +150,6 @@ static inline void disable_acpi(void) { }
 extern int x86_acpi_numa_init(void);
 #endif /* CONFIG_ACPI_NUMA */
 
-#define acpi_unlazy_tlb(x) leave_mm(x)
-
 #ifdef CONFIG_ACPI_APEI
 static inline pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr)
 {
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index fea2b07ac7d8..5f932fd80881 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -50,7 +50,6 @@ void leave_mm(int cpu)
 
 switch_mm(NULL, &init_mm, NULL);
 }
-EXPORT_SYMBOL_GPL(leave_mm);
 
 void switch_mm(struct mm_struct *prev, struct mm_struct *next,
   struct task_struct *tsk)
@@ -113,14 +112,8 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct 
mm_struct *next,
this_cpu_write(cpu_tlbstate.ctxs[0].tlb_gen,
   next_tlb_gen);
write_cr3(__pa(next->pgd));
-   /*
-* This gets called via leave_mm() in the idle path
-* where RCU functions differently.  Tracing normally
-* uses RCU, so we have to call the tracepoint
-* specially here.
-*/
-   trace_tlb_flush_rcuidle(TLB_FLUSH_ON_TASK_SWITCH,
-   TLB_FLUSH_ALL);
+   trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH,
+   TLB_FLUSH_ALL);
}
 
/*
@@ -166,13 +159,7 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct 
mm_struct *next,
this_cpu_write(cpu_tlbstate.loaded_mm, next);
write_cr3(__pa(next->pgd));
 
-   /*
-* This gets called via leave_mm() in the idle path where RCU
-* functions differently.  Tracing normally uses RCU, so we
-* have to call the tracepoint specially here.
-*/
-   trace_tlb_flush_rcuidle(TLB_FLUSH_ON_TASK_SWITCH,
-   TLB_FLUSH_ALL);
+   trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
}
 
load_mm_cr4(next);
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index 5c8aa9cf62d7..fe3d2a40f311 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -708,8 +708,6 @@ static DEFINE_RAW_SPINLOCK(c3_lock);
 static void acpi_idle_enter_bm(struct acpi_processor *pr,
   struct acpi_processor_cx *cx, bool timer_bc)
 {
-   acpi_unlazy_tlb(smp_processor_id());
-
/*
 * Must be done before busmaster disable as we might need to
 * access HPET !
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 216d7ec88c0c..2ae43f59091d 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -912,16 +912,15 @@ static __cpuidle int intel_idle(struct cpuidle_device 
*dev,
 struct cpuidle_state *state = &drv->states[index];
unsigned long eax = flg2MWAIT(state->flags);
unsigned int cstate;
-   int cpu = smp_processor_id();
 
cstate = (((eax) >> MWAIT_SUBSTATE_SIZE) & MWAIT_CSTATE_MASK) + 1;
 
/*
-* leave_mm() to avoid costly and often unnecessary wakeups
-* for flushing the user TLB's associated with the active mm.
+* NB: if CPUIDLE_FLAG_TLB_FLUSHED is set, this idle transition
+* will probably flush the TLB.  It's not guaranteed to flush
+* the TLB, though, so it's not clear that we can do anything
+* useful with this knowledge.
 */
-   if (state->flags & 

[PATCH v2 10/10] x86/mm: Try to preserve old TLB entries using PCID

2017-06-13 Thread Andy Lutomirski
PCID is a "process context ID" -- it's what other architectures call
an address space ID.  Every non-global TLB entry is tagged with a
PCID, only TLB entries that match the currently selected PCID are
used, and we can switch PGDs without flushing the TLB.  x86's
PCID is 12 bits.

This is an unorthodox approach to using PCID.  x86's PCID is far too
short to uniquely identify a process, and we can't even really
uniquely identify a running process because there are monster
systems with over 4096 CPUs.  To make matters worse, past attempts
to use all 12 PCID bits have resulted in slowdowns instead of
speedups.

This patch uses PCID differently.  We use a PCID to identify a
recently-used mm on a per-cpu basis.  An mm has no fixed PCID
binding at all; instead, we give it a fresh PCID each time it's
loaded except in cases where we want to preserve the TLB, in which
case we reuse a recent value.

In particular, we use PCIDs 1-3 for recently-used mms and we reserve
PCID 0 for swapper_pg_dir and for PCID-unaware CR3 users (e.g. EFI).
Nothing ever switches to PCID 0 without flushing PCID 0 non-global
pages, so PCID 0 conflicts won't cause problems.

This seems to save about 100ns on context switches between mms.

Signed-off-by: Andy Lutomirski 
---
 arch/x86/include/asm/mmu_context.h |  3 ++
 arch/x86/include/asm/processor-flags.h |  2 +
 arch/x86/include/asm/tlbflush.h| 18 +++-
 arch/x86/mm/init.c |  1 +
 arch/x86/mm/tlb.c  | 82 ++
 5 files changed, 86 insertions(+), 20 deletions(-)

diff --git a/arch/x86/include/asm/mmu_context.h 
b/arch/x86/include/asm/mmu_context.h
index 69a4f1ee86ac..2537ec03c9b7 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -299,6 +299,9 @@ static inline unsigned long __get_current_cr3_fast(void)
 {
unsigned long cr3 = __pa(this_cpu_read(cpu_tlbstate.loaded_mm)->pgd);
 
+   if (static_cpu_has(X86_FEATURE_PCID))
+   cr3 |= this_cpu_read(cpu_tlbstate.loaded_mm_asid);
+
/* For now, be very restrictive about when this can be called. */
VM_WARN_ON(in_nmi() || !in_atomic());
 
diff --git a/arch/x86/include/asm/processor-flags.h 
b/arch/x86/include/asm/processor-flags.h
index 79aa2f98398d..791b60199aa4 100644
--- a/arch/x86/include/asm/processor-flags.h
+++ b/arch/x86/include/asm/processor-flags.h
@@ -35,6 +35,7 @@
 /* Mask off the address space ID bits. */
 #define CR3_ADDR_MASK 0x7FFFFFFFFFFFF000ull
 #define CR3_PCID_MASK 0xFFFull
+#define CR3_NOFLUSH (1UL << 63)
 #else
 /*
  * CR3_ADDR_MASK needs at least bits 31:5 set on PAE systems, and we save
@@ -42,6 +43,7 @@
  */
 #define CR3_ADDR_MASK 0xFFFFFFFFull
 #define CR3_PCID_MASK 0ull
+#define CR3_NOFLUSH 0
 #endif
 
 #endif /* _ASM_X86_PROCESSOR_FLAGS_H */
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 57b305e13c4c..a9a5aa6f45f7 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -82,6 +82,12 @@ static inline u64 bump_mm_tlb_gen(struct mm_struct *mm)
 #define __flush_tlb_single(addr) __native_flush_tlb_single(addr)
 #endif
 
+/*
+ * 6 because 6 should be plenty and struct tlb_state will fit in
+ * two cache lines.
+ */
+#define NR_DYNAMIC_ASIDS 6
+
 struct tlb_context {
u64 ctx_id;
u64 tlb_gen;
@@ -95,6 +101,8 @@ struct tlb_state {
 * mode even if we've already switched back to swapper_pg_dir.
 */
struct mm_struct *loaded_mm;
+   u16 loaded_mm_asid;
+   u16 next_asid;
 
/*
 * Access to this CR4 shadow and to H/W CR4 is protected by
@@ -104,7 +112,8 @@ struct tlb_state {
 
/*
 * This is a list of all contexts that might exist in the TLB.
-* Since we don't yet use PCID, there is only one context.
+* There is one per ASID that we use, and the ASID (what the
+* CPU calls PCID) is the index into ctxts.
 *
 * For each context, ctx_id indicates which mm the TLB's user
 * entries came from.  As an invariant, the TLB will never
@@ -114,8 +123,13 @@ struct tlb_state {
 * To be clear, this means that it's legal for the TLB code to
 * flush the TLB without updating tlb_gen.  This can happen
 * (for now, at least) due to paravirt remote flushes.
+*
+* NB: context 0 is a bit special, since it's also used by
+* various bits of init code.  This is fine -- code that
+* isn't aware of PCID will end up harmlessly flushing
+* context 0.
 */
-   struct tlb_context ctxs[1];
+   struct tlb_context ctxs[NR_DYNAMIC_ASIDS];
 };
 DECLARE_PER_CPU_SHARED_ALIGNED(struct tlb_state, cpu_tlbstate);
 
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 7d6fa4676af9..9c9570d300ba 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -812,6 +812,7 @@ void __init zone_sizes_init(void)
 
 

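The `ctxs[NR_DYNAMIC_ASIDS]` array in the patch above turns the TLB state into a small per-CPU cache of address-space contexts, indexed by ASID (what the CPU calls PCID). A minimal userspace sketch of that idea, with an invented `pick_asid()` helper (the real kernel logic lives in per-CPU state and is more involved): reuse a slot whose `ctx_id` matches and skip the flush, otherwise evict a slot round-robin and flush.

```c
#include <stdint.h>

#define NR_DYNAMIC_ASIDS 6

struct tlb_context {
	uint64_t ctx_id;	/* which mm the TLB's entries came from */
	uint64_t tlb_gen;	/* flush generation for that mm */
};

static struct tlb_context ctxs[NR_DYNAMIC_ASIDS];
static uint16_t next_asid;

/*
 * Return an ASID for ctx_id.  If a slot already holds this context,
 * reuse it without a flush; otherwise evict a slot round-robin and
 * report that a TLB flush is needed for the reused ASID.
 */
static int pick_asid(uint64_t ctx_id, int *need_flush)
{
	for (int i = 0; i < NR_DYNAMIC_ASIDS; i++) {
		if (ctxs[i].ctx_id == ctx_id) {
			*need_flush = 0;
			return i;
		}
	}
	int asid = next_asid;
	next_asid = (next_asid + 1) % NR_DYNAMIC_ASIDS;
	ctxs[asid].ctx_id = ctx_id;
	ctxs[asid].tlb_gen = 0;
	*need_flush = 1;
	return asid;
}
```

This is only a model of the data structure's intent; names and the eviction policy are assumptions, not the kernel's implementation.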
[PATCH v2 08/10] x86/mm: Add nopcid to turn off PCID

2017-06-13 Thread Andy Lutomirski
The parameter is only present on x86_64 systems to save a few bytes,
as PCID is always disabled on x86_32.

Signed-off-by: Andy Lutomirski 
---
 Documentation/admin-guide/kernel-parameters.txt |  2 ++
 arch/x86/kernel/cpu/common.c| 18 ++
 2 files changed, 20 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 0f5c3b4347c6..aa385109ae58 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2648,6 +2648,8 @@
nopat   [X86] Disable PAT (page attribute table extension of
pagetables) support.
 
+   nopcid  [X86-64] Disable the PCID cpu feature.
+
norandmaps  Don't use address space randomization.  Equivalent to
echo 0 > /proc/sys/kernel/randomize_va_space
 
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index c8b39870f33e..904485e7b230 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -168,6 +168,24 @@ static int __init x86_mpx_setup(char *s)
 }
 __setup("nompx", x86_mpx_setup);
 
+#ifdef CONFIG_X86_64
+static int __init x86_pcid_setup(char *s)
+{
+   /* require an exact match without trailing characters */
+   if (strlen(s))
+   return 0;
+
+   /* do not emit a message if the feature is not present */
+   if (!boot_cpu_has(X86_FEATURE_PCID))
+   return 1;
+
+   setup_clear_cpu_cap(X86_FEATURE_PCID);
+   pr_info("nopcid: PCID feature disabled\n");
+   return 1;
+}
+__setup("nopcid", x86_pcid_setup);
+#endif
+
 static int __init x86_noinvpcid_setup(char *s)
 {
/* noinvpcid doesn't accept parameters */
-- 
2.9.4
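The "exact match without trailing characters" check in x86_pcid_setup() works because __setup() hands the handler only the text that follows the registered parameter name, so plain "nopcid" yields an empty string while "nopcid=1" yields "=1". A small userspace model of that check (the `x86_pcid_setup_model()` name and the `pcid_enabled` flag are invented for illustration; return values follow the kernel convention of 1 for handled, 0 to reject):

```c
#include <string.h>

static int pcid_enabled = 1;

/*
 * Model of x86_pcid_setup(): s is whatever followed "nopcid" on the
 * command line.  Anything non-empty means the option was malformed.
 */
static int x86_pcid_setup_model(const char *s)
{
	/* require an exact match without trailing characters */
	if (strlen(s))
		return 0;

	pcid_enabled = 0;
	return 1;
}
```

So "nopcid=1" is rejected and leaves PCID enabled, while a bare "nopcid" disables it.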



[PATCH v2 06/10] x86/mm: Stop calling leave_mm() in idle code

2017-06-13 Thread Andy Lutomirski
Now that lazy TLB suppresses all flush IPIs (as opposed to all but
the first), there's no need to leave_mm() when going idle.

This means we can get rid of the rcuidle hack in
switch_mm_irqs_off() and we can unexport leave_mm().

This also removes acpi_unlazy_tlb() from the x86 and ia64 headers,
since it has no callers any more.

Signed-off-by: Andy Lutomirski 
---
 arch/ia64/include/asm/acpi.h  |  2 --
 arch/x86/include/asm/acpi.h   |  2 --
 arch/x86/mm/tlb.c | 19 +++
 drivers/acpi/processor_idle.c |  2 --
 drivers/idle/intel_idle.c |  9 -
 5 files changed, 7 insertions(+), 27 deletions(-)

diff --git a/arch/ia64/include/asm/acpi.h b/arch/ia64/include/asm/acpi.h
index a3d0211970e9..c86a947f5368 100644
--- a/arch/ia64/include/asm/acpi.h
+++ b/arch/ia64/include/asm/acpi.h
@@ -112,8 +112,6 @@ static inline void arch_acpi_set_pdc_bits(u32 *buf)
buf[2] |= ACPI_PDC_EST_CAPABILITY_SMP;
 }
 
-#define acpi_unlazy_tlb(x)
-
 #ifdef CONFIG_ACPI_NUMA
 extern cpumask_t early_cpu_possible_map;
 #define for_each_possible_early_cpu(cpu)  \
diff --git a/arch/x86/include/asm/acpi.h b/arch/x86/include/asm/acpi.h
index 2efc768e4362..562286fa151f 100644
--- a/arch/x86/include/asm/acpi.h
+++ b/arch/x86/include/asm/acpi.h
@@ -150,8 +150,6 @@ static inline void disable_acpi(void) { }
 extern int x86_acpi_numa_init(void);
 #endif /* CONFIG_ACPI_NUMA */
 
-#define acpi_unlazy_tlb(x) leave_mm(x)
-
 #ifdef CONFIG_ACPI_APEI
 static inline pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr)
 {
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index fea2b07ac7d8..5f932fd80881 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -50,7 +50,6 @@ void leave_mm(int cpu)
 
	switch_mm(NULL, &init_mm, NULL);
 }
-EXPORT_SYMBOL_GPL(leave_mm);
 
 void switch_mm(struct mm_struct *prev, struct mm_struct *next,
   struct task_struct *tsk)
@@ -113,14 +112,8 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
this_cpu_write(cpu_tlbstate.ctxs[0].tlb_gen,
   next_tlb_gen);
write_cr3(__pa(next->pgd));
-   /*
-* This gets called via leave_mm() in the idle path
-* where RCU functions differently.  Tracing normally
-* uses RCU, so we have to call the tracepoint
-* specially here.
-*/
-   trace_tlb_flush_rcuidle(TLB_FLUSH_ON_TASK_SWITCH,
-   TLB_FLUSH_ALL);
+   trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH,
+   TLB_FLUSH_ALL);
}
 
/*
@@ -166,13 +159,7 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
this_cpu_write(cpu_tlbstate.loaded_mm, next);
write_cr3(__pa(next->pgd));
 
-   /*
-* This gets called via leave_mm() in the idle path where RCU
-* functions differently.  Tracing normally uses RCU, so we
-* have to call the tracepoint specially here.
-*/
-   trace_tlb_flush_rcuidle(TLB_FLUSH_ON_TASK_SWITCH,
-   TLB_FLUSH_ALL);
+   trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
}
 
load_mm_cr4(next);
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index 5c8aa9cf62d7..fe3d2a40f311 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -708,8 +708,6 @@ static DEFINE_RAW_SPINLOCK(c3_lock);
 static void acpi_idle_enter_bm(struct acpi_processor *pr,
   struct acpi_processor_cx *cx, bool timer_bc)
 {
-   acpi_unlazy_tlb(smp_processor_id());
-
/*
 * Must be done before busmaster disable as we might need to
 * access HPET !
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 216d7ec88c0c..2ae43f59091d 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -912,16 +912,15 @@ static __cpuidle int intel_idle(struct cpuidle_device *dev,
	struct cpuidle_state *state = &drv->states[index];
unsigned long eax = flg2MWAIT(state->flags);
unsigned int cstate;
-   int cpu = smp_processor_id();
 
cstate = (((eax) >> MWAIT_SUBSTATE_SIZE) & MWAIT_CSTATE_MASK) + 1;
 
/*
-* leave_mm() to avoid costly and often unnecessary wakeups
-* for flushing the user TLB's associated with the active mm.
+* NB: if CPUIDLE_FLAG_TLB_FLUSHED is set, this idle transition
+* will probably flush the TLB.  It's not guaranteed to flush
+* the TLB, though, so it's not clear that we can do anything
+* useful with this knowledge.
 */
-   if (state->flags & CPUIDLE_FLAG_TLB_FLUSHED)
-   
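The rationale of patch 06/10 — lazy TLB now suppresses *all* flush IPIs to idle CPUs, not all but the first — can be sketched as a toy model. All names here (`is_lazy`, `flush_request()`, `ipis_sent`) are invented for illustration; the kernel's real bookkeeping is per-CPU state plus cpumask filtering in the flush path. The point is that a CPU entering idle only marks itself lazy and never needs leave_mm():

```c
#define NR_CPUS 4

static int is_lazy[NR_CPUS];	/* CPU is idle with a lazy mm */
static int ipis_sent[NR_CPUS];	/* stand-in for flush IPIs received */

/* Going idle: just mark lazy; no leave_mm(), no CR3 switch. */
static void enter_idle(int cpu)
{
	is_lazy[cpu] = 1;
}

/* Leaving idle: clear lazy (the real code revalidates the TLB here). */
static void exit_idle(int cpu)
{
	is_lazy[cpu] = 0;
}

/* A remote flush skips lazy CPUs entirely -- not just after the first. */
static void flush_request(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (is_lazy[cpu])
			continue;
		ipis_sent[cpu]++;
	}
}
```

Since an idle CPU never receives a flush IPI under this scheme, dropping acpi_unlazy_tlb()/leave_mm() from the idle entry paths costs nothing.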
