[PATCH 08/30] spi: dw: Discard DW SSI chip type storages
Keeping SPI peripheral devices type is pointless since first it hasn't been functionally utilized by any of the client drivers/code and second it won't work for Microwire type at the very least. Moreover there is no point in setting up the type by means of the chip-data in the modern kernel. The peripheral devices with specific interface type need to be detected in order to activate the corresponding frame format. It most likely will require some peripheral device specific DT property or whatever to find out the interface protocol. So let's remove the serial interface type fields from the DW APB SSI controller and the SPI peripheral device private data. Note we'll preserve the explicit SSI_MOTO_SPI interface type setting up to signify the only currently supported interface protocol. Signed-off-by: Serge Semin --- drivers/spi/spi-dw-core.c | 6 ++ drivers/spi/spi-dw.h | 1 - 2 files changed, 2 insertions(+), 5 deletions(-) diff --git a/drivers/spi/spi-dw-core.c b/drivers/spi/spi-dw-core.c index be94ed5bb896..dd9574f9fafc 100644 --- a/drivers/spi/spi-dw-core.c +++ b/drivers/spi/spi-dw-core.c @@ -23,7 +23,6 @@ /* Slave spi_dev related */ struct chip_data { u8 tmode; /* TR/TO/RO/EEPROM */ - u8 type;/* SPI/SSP/MicroWire */ u16 clk_div;/* baud rate divider */ u32 speed_hz; /* baud rate */ @@ -238,7 +237,7 @@ u32 dw_spi_update_cr0(struct spi_controller *master, struct spi_device *spi, /* Default SPI mode is SCPOL = 0, SCPH = 0 */ cr0 = (transfer->bits_per_word - 1) - | (chip->type << SPI_FRF_OFFSET) + | (SSI_MOTO_SPI << SPI_FRF_OFFSET) | spi->mode & SPI_CPOL) ? 1 : 0) << SPI_SCOL_OFFSET) | (((spi->mode & SPI_CPHA) ? 1 : 0) << SPI_SCPH_OFFSET) | (((spi->mode & SPI_LOOP) ? 1 : 0) << SPI_SRL_OFFSET)) @@ -260,7 +259,7 @@ u32 dw_spi_update_cr0_v1_01a(struct spi_controller *master, cr0 = (transfer->bits_per_word - 1); /* CTRLR0[ 7: 6] Frame Format */ - cr0 |= chip->type << DWC_SSI_CTRLR0_FRF_OFFSET; + cr0 |= SSI_MOTO_SPI << DWC_SSI_CTRLR0_FRF_OFFSET; /* * SPI mode (SCPOL|SCPH) @@ -453,7 +452,6 @@ int dw_spi_add_host(struct device *dev, struct dw_spi *dws) return -ENOMEM; dws->master = master; - dws->type = SSI_MOTO_SPI; dws->dma_addr = (dma_addr_t)(dws->paddr + DW_SPI_DR); spi_controller_set_devdata(master, dws); diff --git a/drivers/spi/spi-dw.h b/drivers/spi/spi-dw.h index 063fa1b1352d..4420b4d29bac 100644 --- a/drivers/spi/spi-dw.h +++ b/drivers/spi/spi-dw.h @@ -111,7 +111,6 @@ struct dw_spi_dma_ops { struct dw_spi { struct spi_controller *master; - enum dw_ssi_typetype; void __iomem*regs; unsigned long paddr; -- 2.27.0
[PATCH 29/30] dt-bindings: spi: dw: Add Baikal-T1 SPI Controllers
These controllers are based on the DW APB SSI IP-core and embedded into the SoC, so two of them are equipped with IRQ, DMA, 64 words FIFOs and 4 native CS, while another one as being utilized by the Baikal-T1 System Boot Controller has got a very limited resources: no IRQ, no DMA, only a single native chip-select and just 8 bytes Tx/Rx FIFOs available. That's why we have to mark the IRQ to be optional for the later interface. The SPI controller embedded into the Baikal-T1 System Boot Controller can be also used to directly access an external SPI flash by means of a dedicated FSM. The corresponding MMIO region availability is switchable by the embedded multiplexor, which phandle can be specified in the dts node. * We added a new example to test out the non-standard Baikal-T1 System Boot SPI Controller DT binding. Co-developed-by: Ramil Zaripov Signed-off-by: Ramil Zaripov Signed-off-by: Serge Semin --- .../bindings/spi/snps,dw-apb-ssi.yaml | 33 +-- 1 file changed, 31 insertions(+), 2 deletions(-) diff --git a/Documentation/devicetree/bindings/spi/snps,dw-apb-ssi.yaml b/Documentation/devicetree/bindings/spi/snps,dw-apb-ssi.yaml index c62cbe79f00d..d6ae35777dac 100644 --- a/Documentation/devicetree/bindings/spi/snps,dw-apb-ssi.yaml +++ b/Documentation/devicetree/bindings/spi/snps,dw-apb-ssi.yaml @@ -22,6 +22,21 @@ allOf: properties: reg: minItems: 2 + - if: + properties: +compatible: + contains: +enum: + - baikal,bt1-sys-ssi +then: + properties: +mux-controls: + maxItems: 1 + required: +- mux-controls +else: + required: +- interrupts properties: compatible: @@ -44,12 +59,16 @@ properties: - const: snps,dw-apb-ssi - description: Intel Keem Bay SPI Controller const: intel,keembay-ssi + - description: Baikal-T1 SPI Controller +const: baikal,bt1-ssi + - description: Baikal-T1 System Boot SPI Controller +const: baikal,bt1-sys-ssi reg: minItems: 1 items: - description: DW APB SSI controller memory mapped registers - - description: SPI MST region map + - description: SPI MST region map or directly mapped SPI ROM interrupts: maxItems: 1 @@ -114,7 +133,6 @@ required: - reg - "#address-cells" - "#size-cells" - - interrupts - clocks examples: @@ -130,4 +148,15 @@ examples: cs-gpios = < 13 0>, < 14 0>; }; + - | +spi@1f040100 { + compatible = "baikal,bt1-sys-ssi"; + reg = <0x1f040100 0x900>, +<0x1c00 0x100>; + #address-cells = <1>; + #size-cells = <0>; + mux-controls = <_mux>; + clocks = <_sys>; + clock-names = "ssi_clk"; +}; ... -- 2.27.0
[PATCH 23/30] spi: dw: Explicitly de-assert CS on SPI transfer completion
By design of the currently available native set_cs callback, the CS de-assertion will be done only if it's required by the corresponding controller capability. But in order to pre-fill the Tx FIFO buffer with data during the SPI memory ops execution the SER register needs to be left cleared before that. We'll also need a way to explicitly set and clear the corresponding CS bit at a certain moment of the operation. Let's alter the set_cs function then to also de-activate the CS, when it's required. Signed-off-by: Serge Semin --- drivers/spi/spi-dw-core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/spi/spi-dw-core.c b/drivers/spi/spi-dw-core.c index 65db4dd3ea8a..7a25ea6f4af6 100644 --- a/drivers/spi/spi-dw-core.c +++ b/drivers/spi/spi-dw-core.c @@ -100,7 +100,7 @@ void dw_spi_set_cs(struct spi_device *spi, bool enable) */ if (cs_high == enable) dw_writel(dws, DW_SPI_SER, BIT(spi->chip_select)); - else if (dws->caps & DW_SPI_CAP_CS_OVERRIDE) + else dw_writel(dws, DW_SPI_SER, 0); } EXPORT_SYMBOL_GPL(dw_spi_set_cs); -- 2.27.0
[PATCH 10/30] spi: dw: Add KeemBay Master capability
In a further commit we'll have to get rid of the update_cr0() callback and define a DW SSI capability instead. Since Keem Bay master/slave functionality is controller by the CTRL0 register bitfield, we need to first move the master mode selection into the internal corresponding update_cr0 method, which would be activated by means of the dedicated DW_SPI_CAP_KEEMBAY_MST capability setup. Note this will be also useful if the driver will be ever altered to support the DW SPI slave interface. Signed-off-by: Serge Semin --- drivers/spi/spi-dw-core.c | 4 drivers/spi/spi-dw-mmio.c | 20 +++- drivers/spi/spi-dw.h | 8 3 files changed, 15 insertions(+), 17 deletions(-) diff --git a/drivers/spi/spi-dw-core.c b/drivers/spi/spi-dw-core.c index 78e5af8ed173..8f9737640ec1 100644 --- a/drivers/spi/spi-dw-core.c +++ b/drivers/spi/spi-dw-core.c @@ -252,6 +252,7 @@ u32 dw_spi_update_cr0_v1_01a(struct spi_controller *master, struct spi_device *spi, struct spi_transfer *transfer) { + struct dw_spi *dws = spi_controller_get_devdata(master); struct chip_data *chip = spi_get_ctldata(spi); u32 cr0; @@ -275,6 +276,9 @@ u32 dw_spi_update_cr0_v1_01a(struct spi_controller *master, /* CTRLR0[13] Shift Register Loop */ cr0 |= ((spi->mode & SPI_LOOP) ? 1 : 0) << DWC_SSI_CTRLR0_SRL_OFFSET; + if (dws->caps & DW_SPI_CAP_KEEMBAY_MST) + cr0 |= DWC_SSI_CTRLR0_KEEMBAY_MST; + return cr0; } EXPORT_SYMBOL_GPL(dw_spi_update_cr0_v1_01a); diff --git a/drivers/spi/spi-dw-mmio.c b/drivers/spi/spi-dw-mmio.c index 7111cb7ca23b..c0d351fde782 100644 --- a/drivers/spi/spi-dw-mmio.c +++ b/drivers/spi/spi-dw-mmio.c @@ -48,13 +48,6 @@ struct dw_spi_mmio { #define SPARX5_FORCE_ENA 0xa4 #define SPARX5_FORCE_VAL 0xa8 -/* - * For Keem Bay, CTRLR0[31] is used to select controller mode. - * 0: SSI is slave - * 1: SSI is master - */ -#define KEEMBAY_CTRLR0_SSIC_IS_MST BIT(31) - struct dw_spi_mscc { struct regmap *syscon; void __iomem*spi_mst; /* Not sparx5 */ @@ -234,20 +227,13 @@ static int dw_spi_dwc_ssi_init(struct platform_device *pdev, return 0; } -static u32 dw_spi_update_cr0_keembay(struct spi_controller *master, -struct spi_device *spi, -struct spi_transfer *transfer) -{ - u32 cr0 = dw_spi_update_cr0_v1_01a(master, spi, transfer); - - return cr0 | KEEMBAY_CTRLR0_SSIC_IS_MST; -} - static int dw_spi_keembay_init(struct platform_device *pdev, struct dw_spi_mmio *dwsmmio) { + dwsmmio->dws.caps = DW_SPI_CAP_KEEMBAY_MST; + /* Register hook to configure CTRLR0 */ - dwsmmio->dws.update_cr0 = dw_spi_update_cr0_keembay; + dwsmmio->dws.update_cr0 = dw_spi_update_cr0_v1_01a; return 0; } diff --git a/drivers/spi/spi-dw.h b/drivers/spi/spi-dw.h index 4c748b2853a8..da9b543322c9 100644 --- a/drivers/spi/spi-dw.h +++ b/drivers/spi/spi-dw.h @@ -71,6 +71,13 @@ #define DWC_SSI_CTRLR0_FRF_OFFSET 6 #define DWC_SSI_CTRLR0_DFS_OFFSET 0 +/* + * For Keem Bay, CTRLR0[31] is used to select controller mode. + * 0: SSI is slave + * 1: SSI is master + */ +#define DWC_SSI_CTRLR0_KEEMBAY_MST BIT(31) + /* Bit fields in SR, 7 bits */ #define SR_MASK0x7f/* cover 7 bits */ #define SR_BUSY(1 << 0) @@ -101,6 +108,7 @@ enum dw_ssi_type { /* DW SPI capabilities */ #define DW_SPI_CAP_CS_OVERRIDE BIT(0) +#define DW_SPI_CAP_KEEMBAY_MST BIT(1) struct dw_spi; struct dw_spi_dma_ops { -- 2.27.0
[PATCH 03/30] spi: dw: Initialize n_bytes before the memory barrier
Since n_bytes field of the DW SPI private data is also utilized by the IRQ handler, we need to make sure it' initialization is done before the memory barrier. Signed-off-by: Serge Semin --- drivers/spi/spi-dw-core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/spi/spi-dw-core.c b/drivers/spi/spi-dw-core.c index 8b3ce5a0378a..1af74362461d 100644 --- a/drivers/spi/spi-dw-core.c +++ b/drivers/spi/spi-dw-core.c @@ -299,6 +299,7 @@ static int dw_spi_transfer_one(struct spi_controller *master, dws->dma_mapped = 0; spin_lock_irqsave(>buf_lock, flags); + dws->n_bytes = DIV_ROUND_UP(transfer->bits_per_word, BITS_PER_BYTE); dws->tx = (void *)transfer->tx_buf; dws->tx_end = dws->tx + transfer->len; dws->rx = transfer->rx_buf; @@ -323,7 +324,6 @@ static int dw_spi_transfer_one(struct spi_controller *master, } transfer->effective_speed_hz = dws->max_freq / chip->clk_div; - dws->n_bytes = DIV_ROUND_UP(transfer->bits_per_word, BITS_PER_BYTE); cr0 = dws->update_cr0(master, spi, transfer); dw_writel(dws, DW_SPI_CTRLR0, cr0); -- 2.27.0
[PATCH 09/30] spi: dw: Convert CS-override to DW SPI capabilities
There are several vendor-specific versions of the DW SPI controllers, each of which may have some peculiarities with respect to the original IP-core. Seeing it has already caused adding flags and a callback into the DW SPI private data, let's introduce a generic capabilities interface to tune the generic DW SPI controller driver up in accordance with the particular controller specifics. It's done by converting a simple Alpine-specific CS-override capability into the DW SPI controller capability activated by setting the DW_SPI_CAP_CS_OVERRIDE flag. Signed-off-by: Serge Semin --- drivers/spi/spi-dw-core.c | 4 ++-- drivers/spi/spi-dw-mmio.c | 2 +- drivers/spi/spi-dw.h | 7 ++- 3 files changed, 9 insertions(+), 4 deletions(-) diff --git a/drivers/spi/spi-dw-core.c b/drivers/spi/spi-dw-core.c index dd9574f9fafc..78e5af8ed173 100644 --- a/drivers/spi/spi-dw-core.c +++ b/drivers/spi/spi-dw-core.c @@ -104,7 +104,7 @@ void dw_spi_set_cs(struct spi_device *spi, bool enable) */ if (cs_high == enable) dw_writel(dws, DW_SPI_SER, BIT(spi->chip_select)); - else if (dws->cs_override) + else if (dws->caps & DW_SPI_CAP_CS_OVERRIDE) dw_writel(dws, DW_SPI_SER, 0); } EXPORT_SYMBOL_GPL(dw_spi_set_cs); @@ -435,7 +435,7 @@ static void spi_hw_init(struct device *dev, struct dw_spi *dws) } /* enable HW fixup for explicit CS deselect for Amazon's alpine chip */ - if (dws->cs_override) + if (dws->caps & DW_SPI_CAP_CS_OVERRIDE) dw_writel(dws, DW_SPI_CS_OVERRIDE, 0xF); } diff --git a/drivers/spi/spi-dw-mmio.c b/drivers/spi/spi-dw-mmio.c index 18772c0c9220..7111cb7ca23b 100644 --- a/drivers/spi/spi-dw-mmio.c +++ b/drivers/spi/spi-dw-mmio.c @@ -204,7 +204,7 @@ static int dw_spi_mscc_sparx5_init(struct platform_device *pdev, static int dw_spi_alpine_init(struct platform_device *pdev, struct dw_spi_mmio *dwsmmio) { - dwsmmio->dws.cs_override = 1; + dwsmmio->dws.caps = DW_SPI_CAP_CS_OVERRIDE; /* Register hook to configure CTRLR0 */ dwsmmio->dws.update_cr0 = dw_spi_update_cr0; diff --git a/drivers/spi/spi-dw.h b/drivers/spi/spi-dw.h index 4420b4d29bac..4c748b2853a8 100644 --- a/drivers/spi/spi-dw.h +++ b/drivers/spi/spi-dw.h @@ -2,6 +2,7 @@ #ifndef DW_SPI_HEADER_H #define DW_SPI_HEADER_H +#include #include #include #include @@ -98,6 +99,9 @@ enum dw_ssi_type { SSI_NS_MICROWIRE, }; +/* DW SPI capabilities */ +#define DW_SPI_CAP_CS_OVERRIDE BIT(0) + struct dw_spi; struct dw_spi_dma_ops { int (*dma_init)(struct device *dev, struct dw_spi *dws); @@ -118,7 +122,8 @@ struct dw_spi { u32 fifo_len; /* depth of the FIFO buffer */ u32 max_freq; /* max bus freq supported */ - int cs_override; + u32 caps; /* DW SPI capabilities */ + u32 reg_io_width; /* DR I/O width in bytes */ u16 bus_num; u16 num_cs; /* supported slave numbers */ -- 2.27.0
[PATCH 00/30] spi: dw: Add full Baikal-T1 SPI Controllers support
it is the data IO procedure and IRQ-based SPI-transfer implementation refactoring. The former one will look much simpler if the buffers initial pointers and the buffers length data utilized instead of the Tx/Rx buffers start and end pointers. The later one currently lacks of valid execution at the final stage of the SPI-transfer. So if there is no data left to send, but there is still data which needs to be received, the Tx FIFO Empty IRQ will constantly happen until all of the requested inbound data is received. So we suggest to fix that by taking the Rx FIFO Empty IRQ into account. Ninthly it's potentially errors prone to enable the DW APB SSI interrupts before enabling the chip. It specifically concerns a case if for some reason the DW APB SSI IRQs handler is executed before the controller is enabled. That will cause a part of the outbound data loss. So we suggest to reverse the order. Tenthly in order to be able to pre-initialize the Tx FIFO with data and only the start the SPI memory operations we need to have any CS de-activated. We'll fulfil that requirement by explicitly clearing the CS on the SPI transfer completion and at the explicit controller reset. Then seeing all the currently available and potentially being created types of the SPI transfers need to perform the DW APB SSI controller status register check and the errors handler procedure, we've created a common method for all of them. Eleventhly if before we've mostly had a series of fixups, cleanups and refactorings, here we've finally come to the new functionality implementation. It concerns the poll-based transfer (as Baikal-T1 System Boot SPI controller lacks a dedicated IRQ lane connected) and the SPI memory operations implementation. If the former feature is pretty much straightforward (see the patch log for details), the later one is a bit tricky. It's based on the EEPROM-read (write-then-read) and the Tx-only modes of the DW APB SSI controller, which as performing the automatic data read and write let's us to implement the faster IO procedure than using the Tx-Rx-mode-based approach. Having the memory-operations implemented that way is the best thing we can currently do to provide the errors-less SPI transfers to SPI devices with native CS attached. Note the approach utilized here to develop the SPI memory operations can be also used to create the "automatic CS toggle problem"-free(ish) SPI transfers (combine SPI-message transfers into two buffers, disable interrupts, push-pull the combined data). But we don't provide a solution in the framework of this patchset. It is a matter of a dedicated one, which we currently don't intend to spend our time on. Finally at the closure of the this patchset you'll find patches, which provide the Baikal-T1-specific DW APB SSI controllers support. The SoC has got three SPI controllers. Two of them are pretty much normal DW APB SSI interfaces: with IRQ, DMA, FIFOs of 64 words depth, 4x CSs. But the third one as being a part of the Baikal-T1 System Boot Controller has got a very limited resources: no IRQ, no DMA, only a single native chip-select and Tx/Rx FIFOs with just 8 words depth available. In order to provide a transparent initial boot code execution the System Boot SPI Controller is also utilized by an vendor-specific IP-block, which exposes an SPI flash memory direct mapping interface. Please see the corresponding patch for details. Link: https://lore.kernel.org/linux-spi/20200508093621.31619-1-sergey.se...@baikalelectronics.ru/ [1] "LINUX KERNEL MEMORY BARRIERS", Documentation/memory-barriers.txt, Section "KERNEL I/O BARRIER EFFECTS" Signed-off-by: Serge Semin Cc: Alexey Malahov Cc: Ramil Zaripov Cc: Pavel Parkhomenko Cc: Andy Shevchenko Cc: Andy Shevchenko Cc: Lars Povlsen Cc: wuxu.wu Cc: Feng Tang Cc: Rob Herring Cc: linux-...@vger.kernel.org Cc: devicet...@vger.kernel.org Cc: linux-kernel@vger.kernel.org Serge Semin (30): spi: dw: Discard IRQ threshold macro spi: dw: Use ternary op to init set_cs callback spi: dw: Initialize n_bytes before the memory barrier Revert: spi: spi-dw: Add lock protect dw_spi rx/tx to prevent concurrent calls spi: dw: Clear IRQ status on DW SPI controller reset spi: dw: Disable all IRQs when controller is unused spi: dw: Use relaxed IO-methods to access FIFOs spi: dw: Discard DW SSI chip type storages spi: dw: Convert CS-override to DW SPI capabilities spi: dw: Add KeemBay Master capability spi: dw: Add DWC SSI capability spi: dw: Detach SPI device specific CR0 config method spi: dw: Update SPI bus speed in a config function spi: dw: Simplify the SPI bus speed config procedure spi: dw: Update Rx sample delay in the config function spi: dw: Add DW SPI controller config structure spi: dw: Refactor data IO procedure spi: dw: Refactor IRQ-based SPI transfer procedure spi: dw: Perform IRQ setup in a dedicated function spi: dw: Unmask IRQs after enabling the chip
[PATCH 02/30] spi: dw: Use ternary op to init set_cs callback
Simplify the dw_spi_add_host() method a bit by replacing the set_cs callback overwrite procedure with direct setting the callback if a custom version of one is specified. Signed-off-by: Serge Semin --- drivers/spi/spi-dw-core.c | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/drivers/spi/spi-dw-core.c b/drivers/spi/spi-dw-core.c index 55afdcee7d2b..8b3ce5a0378a 100644 --- a/drivers/spi/spi-dw-core.c +++ b/drivers/spi/spi-dw-core.c @@ -482,7 +482,7 @@ int dw_spi_add_host(struct device *dev, struct dw_spi *dws) master->num_chipselect = dws->num_cs; master->setup = dw_spi_setup; master->cleanup = dw_spi_cleanup; - master->set_cs = dw_spi_set_cs; + master->set_cs = dws->set_cs ?: dw_spi_set_cs; master->transfer_one = dw_spi_transfer_one; master->handle_err = dw_spi_handle_err; master->max_speed_hz = dws->max_freq; @@ -491,9 +491,6 @@ int dw_spi_add_host(struct device *dev, struct dw_spi *dws) master->flags = SPI_MASTER_GPIO_SS; master->auto_runtime_pm = true; - if (dws->set_cs) - master->set_cs = dws->set_cs; - /* Get default rx sample delay */ device_property_read_u32(dev, "rx-sample-delay-ns", >def_rx_sample_dly_ns); -- 2.27.0
[PATCH 01/30] spi: dw: Discard IRQ threshold macro
The macro has been unused since a half of FIFO length was defined to be a marker of the IRQ. Let's remove it definition. Signed-off-by: Serge Semin --- drivers/spi/spi-dw.h | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/spi/spi-dw.h b/drivers/spi/spi-dw.h index 90dfd21622d6..51bab30b9f85 100644 --- a/drivers/spi/spi-dw.h +++ b/drivers/spi/spi-dw.h @@ -92,9 +92,6 @@ #define SPI_DMA_RDMAE (1 << 0) #define SPI_DMA_TDMAE (1 << 1) -/* TX RX interrupt level threshold, max can be 256 */ -#define SPI_INT_THRESHOLD 32 - enum dw_ssi_type { SSI_MOTO_SPI = 0, SSI_TI_SSP, -- 2.27.0
[PATCH 12/30] spi: dw: Detach SPI device specific CR0 config method
Indeed there is no point in detecting the SPI peripheral device parameters and initializing the CR0 register fields each time an SPI transfer is executed. Instead let's define a dedicated CR0 chip-data member, which will be initialized in accordance with the SPI device settings at the moment of setting it up. By doing so we'll finally make the SPI device chip_data serving as it's supposed to - to preserve the SPI device specific DW SPI configuration. See spi-fsl-dspi.c, spi-pl022.c, spi-pxa2xx.c drivers for example of the way the chip data is utilized. Signed-off-by: Serge Semin --- drivers/spi/spi-dw-core.c | 35 +++ 1 file changed, 27 insertions(+), 8 deletions(-) diff --git a/drivers/spi/spi-dw-core.c b/drivers/spi/spi-dw-core.c index c21641a485ce..a9644351d75f 100644 --- a/drivers/spi/spi-dw-core.c +++ b/drivers/spi/spi-dw-core.c @@ -27,6 +27,7 @@ struct chip_data { u16 clk_div;/* baud rate divider */ u32 speed_hz; /* baud rate */ + u32 cr0; u32 rx_sample_dly; /* RX sample delay */ }; @@ -228,31 +229,41 @@ static irqreturn_t dw_spi_irq(int irq, void *dev_id) return dws->transfer_handler(dws); } -static void dw_spi_update_cr0(struct dw_spi *dws, struct spi_device *spi, - struct spi_transfer *transfer) +static u32 dw_spi_get_cr0(struct dw_spi *dws, struct spi_device *spi) { - struct chip_data *chip = spi_get_ctldata(spi); - u32 cr0; - - cr0 = (transfer->bits_per_word - 1); + u32 cr0 = 0; if (!(dws->caps & DW_SPI_CAP_DWC_SSI)) { cr0 |= SSI_MOTO_SPI << SPI_FRF_OFFSET; cr0 |= ((spi->mode & SPI_CPOL) ? 1 : 0) << SPI_SCOL_OFFSET; cr0 |= ((spi->mode & SPI_CPHA) ? 1 : 0) << SPI_SCPH_OFFSET; cr0 |= ((spi->mode & SPI_LOOP) ? 1 : 0) << SPI_SRL_OFFSET; - cr0 |= chip->tmode << SPI_TMOD_OFFSET; } else { cr0 |= SSI_MOTO_SPI << DWC_SSI_CTRLR0_FRF_OFFSET; cr0 |= ((spi->mode & SPI_CPOL) ? 1 : 0) << DWC_SSI_CTRLR0_SCPOL_OFFSET; cr0 |= ((spi->mode & SPI_CPHA) ? 1 : 0) << DWC_SSI_CTRLR0_SCPH_OFFSET; cr0 |= ((spi->mode & SPI_LOOP) ? 1 : 0) << DWC_SSI_CTRLR0_SRL_OFFSET; - cr0 |= chip->tmode << DWC_SSI_CTRLR0_TMOD_OFFSET; if (dws->caps & DW_SPI_CAP_KEEMBAY_MST) cr0 |= DWC_SSI_CTRLR0_KEEMBAY_MST; } + return cr0; +} + +static void dw_spi_update_cr0(struct dw_spi *dws, struct spi_device *spi, + struct spi_transfer *transfer) +{ + struct chip_data *chip = spi_get_ctldata(spi); + u32 cr0 = chip->cr0; + + cr0 |= (transfer->bits_per_word - 1); + + if (!(dws->caps & DW_SPI_CAP_DWC_SSI)) + cr0 |= chip->tmode << SPI_TMOD_OFFSET; + else + cr0 |= chip->tmode << DWC_SSI_CTRLR0_TMOD_OFFSET; + dw_writel(dws, DW_SPI_CTRLR0, cr0); } @@ -350,6 +361,7 @@ static void dw_spi_handle_err(struct spi_controller *master, /* This may be called twice for each spi dev */ static int dw_spi_setup(struct spi_device *spi) { + struct dw_spi *dws = spi_controller_get_devdata(spi->controller); struct chip_data *chip; /* Only alloc on first setup */ @@ -373,6 +385,13 @@ static int dw_spi_setup(struct spi_device *spi) dws->max_freq); } + /* +* Update CR0 data each time the setup callback is invoked since +* the device parameters could have been changed, for instance, by +* the MMC SPI driver or something else. +*/ + chip->cr0 = dw_spi_get_cr0(dws, spi); + chip->tmode = SPI_TMOD_TR; return 0; -- 2.27.0
[PATCH 07/30] spi: dw: Use relaxed IO-methods to access FIFOs
In accordance with [1] the relaxed methods are guaranteed to be ordered with respect to other accesses from the same CPU thread to the same peripheral. This is what we need during the data read/write from/to the controller FIFOs being executed within a single IRQ handler or a kernel task. Such optimization shall significantly speed the data reader and writer up. For instance, the relaxed IO-accessors utilization on Baikal-T1 lets the driver to support the SPI memory operations with bus frequency three-fold faster than if normal IO-accessors would be used. [1] "LINUX KERNEL MEMORY BARRIERS", Documentation/memory-barriers.txt, Section "KERNEL I/O BARRIER EFFECTS" Signed-off-by: Serge Semin --- drivers/spi/spi-dw.h | 18 -- 1 file changed, 4 insertions(+), 14 deletions(-) diff --git a/drivers/spi/spi-dw.h b/drivers/spi/spi-dw.h index ff77f39047ce..063fa1b1352d 100644 --- a/drivers/spi/spi-dw.h +++ b/drivers/spi/spi-dw.h @@ -161,29 +161,19 @@ static inline u32 dw_readl(struct dw_spi *dws, u32 offset) return __raw_readl(dws->regs + offset); } -static inline u16 dw_readw(struct dw_spi *dws, u32 offset) -{ - return __raw_readw(dws->regs + offset); -} - static inline void dw_writel(struct dw_spi *dws, u32 offset, u32 val) { __raw_writel(val, dws->regs + offset); } -static inline void dw_writew(struct dw_spi *dws, u32 offset, u16 val) -{ - __raw_writew(val, dws->regs + offset); -} - static inline u32 dw_read_io_reg(struct dw_spi *dws, u32 offset) { switch (dws->reg_io_width) { case 2: - return dw_readw(dws, offset); + return readw_relaxed(dws->regs + offset); case 4: default: - return dw_readl(dws, offset); + return readl_relaxed(dws->regs + offset); } } @@ -191,11 +181,11 @@ static inline void dw_write_io_reg(struct dw_spi *dws, u32 offset, u32 val) { switch (dws->reg_io_width) { case 2: - dw_writew(dws, offset, val); + writew_relaxed(val, dws->regs + offset); break; case 4: default: - dw_writel(dws, offset, val); + writel_relaxed(val, dws->regs + offset); break; } } -- 2.27.0
[PATCH v2 05/11] spi: dw-dma: Move DMA transfers submission to the channels prep methods
Indeed we can freely move the dmaengine_submit() method invocation and the Tx and Rx busy flag setting into the DMA Tx/Rx prepare methods. Since the Tx/Rx preparation method is now mainly used for the DMA transfers submission, here we suggest to rename it to have the _submit_{r,t}x suffix instead. By having this alteration applied first we implement another code preparation before adding the one-by-one DMA SG entries transmission, second we now have the dma_async_tx_descriptor descriptor used locally only in the new DMA transfers submission methods (this will be cleaned up a bit later), third we make the generic transfer method more readable, where now the functionality of submission, execution and wait procedures is transparently split up instead of having a preparation, intermixed submission/execution and wait procedures. Signed-off-by: Serge Semin --- drivers/spi/spi-dw-dma.c | 22 -- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/drivers/spi/spi-dw-dma.c b/drivers/spi/spi-dw-dma.c index d2a67dee1a66..769d10ca74b4 100644 --- a/drivers/spi/spi-dw-dma.c +++ b/drivers/spi/spi-dw-dma.c @@ -272,7 +272,7 @@ static int dw_spi_dma_config_tx(struct dw_spi *dws) } static struct dma_async_tx_descriptor * -dw_spi_dma_prepare_tx(struct dw_spi *dws, struct spi_transfer *xfer) +dw_spi_dma_submit_tx(struct dw_spi *dws, struct spi_transfer *xfer) { struct dma_async_tx_descriptor *txdesc; @@ -287,6 +287,9 @@ dw_spi_dma_prepare_tx(struct dw_spi *dws, struct spi_transfer *xfer) txdesc->callback = dw_spi_dma_tx_done; txdesc->callback_param = dws; + dmaengine_submit(txdesc); + set_bit(TX_BUSY, >dma_chan_busy); + return txdesc; } @@ -364,7 +367,7 @@ static int dw_spi_dma_config_rx(struct dw_spi *dws) return dmaengine_slave_config(dws->rxchan, ); } -static struct dma_async_tx_descriptor *dw_spi_dma_prepare_rx(struct dw_spi *dws, +static struct dma_async_tx_descriptor *dw_spi_dma_submit_rx(struct dw_spi *dws, struct spi_transfer *xfer) { struct dma_async_tx_descriptor *rxdesc; @@ -380,6 +383,9 @@ static struct dma_async_tx_descriptor *dw_spi_dma_prepare_rx(struct dw_spi *dws, rxdesc->callback = dw_spi_dma_rx_done; rxdesc->callback_param = dws; + dmaengine_submit(rxdesc); + set_bit(RX_BUSY, >dma_chan_busy); + return rxdesc; } @@ -426,25 +432,21 @@ static int dw_spi_dma_transfer(struct dw_spi *dws, struct spi_transfer *xfer) struct dma_async_tx_descriptor *txdesc, *rxdesc; int ret; - /* Prepare the TX dma transfer */ - txdesc = dw_spi_dma_prepare_tx(dws, xfer); + /* Submit the DMA Tx transfer */ + txdesc = dw_spi_dma_submit_tx(dws, xfer); if (!txdesc) return -EINVAL; - /* Prepare the RX dma transfer */ + /* Submit the DMA Rx transfer if required */ if (xfer->rx_buf) { - rxdesc = dw_spi_dma_prepare_rx(dws, xfer); + rxdesc = dw_spi_dma_submit_rx(dws, xfer); if (!rxdesc) return -EINVAL; /* rx must be started before tx due to spi instinct */ - set_bit(RX_BUSY, >dma_chan_busy); - dmaengine_submit(rxdesc); dma_async_issue_pending(dws->rxchan); } - set_bit(TX_BUSY, >dma_chan_busy); - dmaengine_submit(txdesc); dma_async_issue_pending(dws->txchan); ret = dw_spi_dma_wait(dws, xfer); -- 2.27.0
[PATCH v2 01/11] spi: dw-dma: Set DMA Level registers on init
Indeed the registers content doesn't get cleared when the SPI controller is disabled and enabled. Max burst lengths aren't changed since the Rx and Tx DMA channels are requested on init stage and are kept acquired until the device is removed. Obviously SPI controller FIFO depth can't be changed. Due to all of that we can safely move the DMA Transmit and Receive data level registers initialization to the SPI controller DMA init stage (when the SPI controller is being probed) instead of doing it for each SPI transfer when dma_setup is called. This shall speed the DMA-based SPI transfer initialization up a bit, particularly if the APB bus is relatively slow. Signed-off-by: Serge Semin --- drivers/spi/spi-dw-dma.c | 28 +--- 1 file changed, 13 insertions(+), 15 deletions(-) diff --git a/drivers/spi/spi-dw-dma.c b/drivers/spi/spi-dw-dma.c index bb390ff67d1d..a7ff1e357f8b 100644 --- a/drivers/spi/spi-dw-dma.c +++ b/drivers/spi/spi-dw-dma.c @@ -49,6 +49,7 @@ static void dw_spi_dma_maxburst_init(struct dw_spi *dws) max_burst = RX_BURST_LEVEL; dws->rxburst = min(max_burst, def_burst); + dw_writel(dws, DW_SPI_DMARDLR, dws->rxburst - 1); ret = dma_get_slave_caps(dws->txchan, ); if (!ret && caps.max_burst) @@ -56,7 +57,19 @@ static void dw_spi_dma_maxburst_init(struct dw_spi *dws) else max_burst = TX_BURST_LEVEL; + /* +* Having a Rx DMA channel serviced with higher priority than a Tx DMA +* channel might not be enough to provide a well balanced DMA-based +* SPI transfer interface. There might still be moments when the Tx DMA +* channel is occasionally handled faster than the Rx DMA channel. +* That in its turn will eventually cause the SPI Rx FIFO overflow if +* SPI bus speed is high enough to fill the SPI Rx FIFO in before it's +* cleared by the Rx DMA channel. In order to fix the problem the Tx +* DMA activity is intentionally slowed down by limiting the SPI Tx +* FIFO depth with a value twice bigger than the Tx burst length. +*/ dws->txburst = min(max_burst, def_burst); + dw_writel(dws, DW_SPI_DMATDLR, dws->txburst); } static int dw_spi_dma_init_mfld(struct device *dev, struct dw_spi *dws) @@ -372,21 +385,6 @@ static int dw_spi_dma_setup(struct dw_spi *dws, struct spi_transfer *xfer) { u16 imr = 0, dma_ctrl = 0; - /* -* Having a Rx DMA channel serviced with higher priority than a Tx DMA -* channel might not be enough to provide a well balanced DMA-based -* SPI transfer interface. There might still be moments when the Tx DMA -* channel is occasionally handled faster than the Rx DMA channel. -* That in its turn will eventually cause the SPI Rx FIFO overflow if -* SPI bus speed is high enough to fill the SPI Rx FIFO in before it's -* cleared by the Rx DMA channel. In order to fix the problem the Tx -* DMA activity is intentionally slowed down by limiting the SPI Tx -* FIFO depth with a value twice bigger than the Tx burst length -* calculated earlier by the dw_spi_dma_maxburst_init() method. -*/ - dw_writel(dws, DW_SPI_DMARDLR, dws->rxburst - 1); - dw_writel(dws, DW_SPI_DMATDLR, dws->txburst); - if (xfer->tx_buf) dma_ctrl |= SPI_DMA_TDMAE; if (xfer->rx_buf) -- 2.27.0
[PATCH v2 07/11] spi: dw-dma: Remove DMA Tx-desc passing around
It's pointless to pass the Rx and Tx transfers DMA Tx-descriptors, since they are used in the Tx/Rx submit method only. Instead just return the submission status from these methods. This alteration will make the code less complex. Signed-off-by: Serge Semin --- drivers/spi/spi-dw-dma.c | 31 ++- 1 file changed, 14 insertions(+), 17 deletions(-) diff --git a/drivers/spi/spi-dw-dma.c b/drivers/spi/spi-dw-dma.c index aa3900809126..9f70818acce6 100644 --- a/drivers/spi/spi-dw-dma.c +++ b/drivers/spi/spi-dw-dma.c @@ -271,8 +271,7 @@ static int dw_spi_dma_config_tx(struct dw_spi *dws) return dmaengine_slave_config(dws->txchan, ); } -static struct dma_async_tx_descriptor * -dw_spi_dma_submit_tx(struct dw_spi *dws, struct spi_transfer *xfer) +static int dw_spi_dma_submit_tx(struct dw_spi *dws, struct spi_transfer *xfer) { struct dma_async_tx_descriptor *txdesc; dma_cookie_t cookie; @@ -284,7 +283,7 @@ dw_spi_dma_submit_tx(struct dw_spi *dws, struct spi_transfer *xfer) DMA_MEM_TO_DEV, DMA_PREP_INTERRUPT | DMA_CTRL_ACK); if (!txdesc) - return NULL; + return -ENOMEM; txdesc->callback = dw_spi_dma_tx_done; txdesc->callback_param = dws; @@ -293,12 +292,12 @@ dw_spi_dma_submit_tx(struct dw_spi *dws, struct spi_transfer *xfer) ret = dma_submit_error(cookie); if (ret) { dmaengine_terminate_sync(dws->txchan); - return NULL; + return ret; } set_bit(TX_BUSY, >dma_chan_busy); - return txdesc; + return 0; } static inline bool dw_spi_dma_rx_busy(struct dw_spi *dws) @@ -375,8 +374,7 @@ static int dw_spi_dma_config_rx(struct dw_spi *dws) return dmaengine_slave_config(dws->rxchan, ); } -static struct dma_async_tx_descriptor *dw_spi_dma_submit_rx(struct dw_spi *dws, - struct spi_transfer *xfer) +static int dw_spi_dma_submit_rx(struct dw_spi *dws, struct spi_transfer *xfer) { struct dma_async_tx_descriptor *rxdesc; dma_cookie_t cookie; @@ -388,7 +386,7 @@ static struct dma_async_tx_descriptor *dw_spi_dma_submit_rx(struct dw_spi *dws, DMA_DEV_TO_MEM, DMA_PREP_INTERRUPT | DMA_CTRL_ACK); if (!rxdesc) - return NULL; + return -ENOMEM; rxdesc->callback = dw_spi_dma_rx_done; rxdesc->callback_param = dws; @@ -397,12 +395,12 @@ static struct dma_async_tx_descriptor *dw_spi_dma_submit_rx(struct dw_spi *dws, ret = dma_submit_error(cookie); if (ret) { dmaengine_terminate_sync(dws->rxchan); - return NULL; + return ret; } set_bit(RX_BUSY, >dma_chan_busy); - return rxdesc; + return 0; } static int dw_spi_dma_setup(struct dw_spi *dws, struct spi_transfer *xfer) @@ -445,19 +443,18 @@ static int dw_spi_dma_setup(struct dw_spi *dws, struct spi_transfer *xfer) static int dw_spi_dma_transfer(struct dw_spi *dws, struct spi_transfer *xfer) { - struct dma_async_tx_descriptor *txdesc, *rxdesc; int ret; /* Submit the DMA Tx transfer */ - txdesc = dw_spi_dma_submit_tx(dws, xfer); - if (!txdesc) - return -EINVAL; + ret = dw_spi_dma_submit_tx(dws, xfer); + if (ret) + return ret; /* Submit the DMA Rx transfer if required */ if (xfer->rx_buf) { - rxdesc = dw_spi_dma_submit_rx(dws, xfer); - if (!rxdesc) - return -EINVAL; + ret = dw_spi_dma_submit_rx(dws, xfer); + if (ret) + return ret; /* rx must be started before tx due to spi instinct */ dma_async_issue_pending(dws->rxchan); -- 2.27.0
[PATCH v2 06/11] spi: dw-dma: Check DMA Tx-desc submission status
We suggest to add the dmaengine_submit() return value test for errors. It has been unnecessary while the driver was expected to be utilized in pair with DW DMAC. But since now the driver can be used with any DMA engine, it might be useful to track the errors on DMA submissions so not miss them and get into an unpredictable driver behaviour. Signed-off-by: Serge Semin --- Changelog v2: - Replace negative conditional statements with the positive ones. - Terminate the prepared descriptors on submission errors. --- drivers/spi/spi-dw-dma.c | 20 ++-- 1 file changed, 18 insertions(+), 2 deletions(-) diff --git a/drivers/spi/spi-dw-dma.c b/drivers/spi/spi-dw-dma.c index 769d10ca74b4..aa3900809126 100644 --- a/drivers/spi/spi-dw-dma.c +++ b/drivers/spi/spi-dw-dma.c @@ -275,6 +275,8 @@ static struct dma_async_tx_descriptor * dw_spi_dma_submit_tx(struct dw_spi *dws, struct spi_transfer *xfer) { struct dma_async_tx_descriptor *txdesc; + dma_cookie_t cookie; + int ret; txdesc = dmaengine_prep_slave_sg(dws->txchan, xfer->tx_sg.sgl, @@ -287,7 +289,13 @@ dw_spi_dma_submit_tx(struct dw_spi *dws, struct spi_transfer *xfer) txdesc->callback = dw_spi_dma_tx_done; txdesc->callback_param = dws; - dmaengine_submit(txdesc); + cookie = dmaengine_submit(txdesc); + ret = dma_submit_error(cookie); + if (ret) { + dmaengine_terminate_sync(dws->txchan); + return NULL; + } + set_bit(TX_BUSY, >dma_chan_busy); return txdesc; @@ -371,6 +379,8 @@ static struct dma_async_tx_descriptor *dw_spi_dma_submit_rx(struct dw_spi *dws, struct spi_transfer *xfer) { struct dma_async_tx_descriptor *rxdesc; + dma_cookie_t cookie; + int ret; rxdesc = dmaengine_prep_slave_sg(dws->rxchan, xfer->rx_sg.sgl, @@ -383,7 +393,13 @@ static struct dma_async_tx_descriptor *dw_spi_dma_submit_rx(struct dw_spi *dws, rxdesc->callback = dw_spi_dma_rx_done; rxdesc->callback_param = dws; - dmaengine_submit(rxdesc); + cookie = dmaengine_submit(rxdesc); + ret = dma_submit_error(cookie); + if (ret) { + dmaengine_terminate_sync(dws->rxchan); + return NULL; + } + set_bit(RX_BUSY, >dma_chan_busy); return rxdesc; -- 2.27.0
[PATCH v2 08/11] spi: dw-dma: Detach DMA transfer into a dedicated method
In order to add an alternative method of DMA-based SPI transfer first we need to detach the currently available one from the common code. Here we move the normal DMA-based SPI transfer execution functionality into a dedicated method. It will be utilized if either the DMA engine supports an unlimited number SG entries or Tx-only SPI transfer is requested. But currently just use it for any SPI transfer. Signed-off-by: Serge Semin --- drivers/spi/spi-dw-dma.c | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/drivers/spi/spi-dw-dma.c b/drivers/spi/spi-dw-dma.c index 9f70818acce6..f2baefcae9ae 100644 --- a/drivers/spi/spi-dw-dma.c +++ b/drivers/spi/spi-dw-dma.c @@ -441,7 +441,8 @@ static int dw_spi_dma_setup(struct dw_spi *dws, struct spi_transfer *xfer) return 0; } -static int dw_spi_dma_transfer(struct dw_spi *dws, struct spi_transfer *xfer) +static int dw_spi_dma_transfer_all(struct dw_spi *dws, + struct spi_transfer *xfer) { int ret; @@ -462,7 +463,14 @@ static int dw_spi_dma_transfer(struct dw_spi *dws, struct spi_transfer *xfer) dma_async_issue_pending(dws->txchan); - ret = dw_spi_dma_wait(dws, xfer); + return dw_spi_dma_wait(dws, xfer); +} + +static int dw_spi_dma_transfer(struct dw_spi *dws, struct spi_transfer *xfer) +{ + int ret; + + ret = dw_spi_dma_transfer_all(dws, xfer); if (ret) return ret; -- 2.27.0
[PATCH v2 09/11] spi: dw-dma: Move DMAC register cleanup to DMA transfer method
DW APB SSI DMA driver doesn't use the native SPI core wait API since commit bdbdf0f06337 ("spi: dw: Locally wait for the DMA transfers completion"). Due to that the driver can now clear the DMAC register in a single place synchronously with the DMA transactions completion or failure. After that all the possible code paths are still covered: 1) DMA completion callbacks are executed in case if the corresponding DMA transactions are finished. When they are, one of them will eventually wake the SPI messages pump kernel thread and dw_spi_dma_transfer_all() method will clean the DMAC register as implied by this patch. 2) dma_stop is called when the SPI core detects an error either returned from the transfer_one() callback or set in the SPI message status field. Both types of errors will be noticed by the dw_spi_dma_transfer_all() method. 3) dma_exit is called when either SPI controller driver or the corresponding device is removed. In any case the SPI core will first flush the SPI messages pump kernel thread, so any pending or in-fly SPI transfers will be finished before that. Due to all of that let's simplify the DW APB SSI DMA driver a bit and move the DMAC register cleanup to a single place in the dw_spi_dma_transfer_all() method. Signed-off-by: Serge Semin --- drivers/spi/spi-dw-dma.c | 17 - 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/drivers/spi/spi-dw-dma.c b/drivers/spi/spi-dw-dma.c index f2baefcae9ae..935f073a3523 100644 --- a/drivers/spi/spi-dw-dma.c +++ b/drivers/spi/spi-dw-dma.c @@ -152,8 +152,6 @@ static void dw_spi_dma_exit(struct dw_spi *dws) dmaengine_terminate_sync(dws->rxchan); dma_release_channel(dws->rxchan); } - - dw_writel(dws, DW_SPI_DMACR, 0); } static irqreturn_t dw_spi_dma_transfer_handler(struct dw_spi *dws) @@ -252,7 +250,6 @@ static void dw_spi_dma_tx_done(void *arg) if (test_bit(RX_BUSY, >dma_chan_busy)) return; - dw_writel(dws, DW_SPI_DMACR, 0); complete(>dma_completion); } @@ -355,7 +352,6 @@ static void dw_spi_dma_rx_done(void *arg) if (test_bit(TX_BUSY, >dma_chan_busy)) return; - dw_writel(dws, DW_SPI_DMACR, 0); complete(>dma_completion); } @@ -449,13 +445,13 @@ static int dw_spi_dma_transfer_all(struct dw_spi *dws, /* Submit the DMA Tx transfer */ ret = dw_spi_dma_submit_tx(dws, xfer); if (ret) - return ret; + goto err_clear_dmac; /* Submit the DMA Rx transfer if required */ if (xfer->rx_buf) { ret = dw_spi_dma_submit_rx(dws, xfer); if (ret) - return ret; + goto err_clear_dmac; /* rx must be started before tx due to spi instinct */ dma_async_issue_pending(dws->rxchan); @@ -463,7 +459,12 @@ static int dw_spi_dma_transfer_all(struct dw_spi *dws, dma_async_issue_pending(dws->txchan); - return dw_spi_dma_wait(dws, xfer); + ret = dw_spi_dma_wait(dws, xfer); + +err_clear_dmac: + dw_writel(dws, DW_SPI_DMACR, 0); + + return ret; } static int dw_spi_dma_transfer(struct dw_spi *dws, struct spi_transfer *xfer) @@ -496,8 +497,6 @@ static void dw_spi_dma_stop(struct dw_spi *dws) dmaengine_terminate_sync(dws->rxchan); clear_bit(RX_BUSY, >dma_chan_busy); } - - dw_writel(dws, DW_SPI_DMACR, 0); } static const struct dw_spi_dma_ops dw_spi_dma_mfld_ops = { -- 2.27.0
[PATCH v2 00/11] spi: dw-dma: Add max SG entries burst capability support
Tx and Rx SPI transfer SG-lists in a single DMA transaction without software intervention, and for the full-duplex SPI-transfers. Link: https://lore.kernel.org/linux-spi/20200731075953.14416-1-sergey.se...@baikalelectronics.ru Changelog v2: - Replace negative conditional statements with the positive ones in the dw_spi_dma_submit_{tx,rx}() methods. - Terminate the prepared DMA Tx-descriptors on submission errors. - Split the patch "spi: dw-dma: Move DMA transfers submission to the channels prep methods" up into a series of more simple commits. Signed-off-by: Serge Semin Cc: Alexey Malahov Cc: Georgy Vlasov Cc: Ramil Zaripov Cc: Pavel Parkhomenko Cc: Peter Ujfalusi Cc: Andy Shevchenko Cc: Andy Shevchenko Cc: Feng Tang Cc: Vinod Koul Cc: linux-...@vger.kernel.org Cc: linux-kernel@vger.kernel.org Serge Semin (11): spi: dw-dma: Set DMA Level registers on init spi: dw-dma: Fail DMA-based transfer if no Tx-buffer specified spi: dw-dma: Configure the DMA channels in dma_setup spi: dw-dma: Check rx_buf availability in the xfer method spi: dw-dma: Move DMA transfers submission to the channels prep methods spi: dw-dma: Check DMA Tx-desc submission status spi: dw-dma: Remove DMA Tx-desc passing around spi: dw-dma: Detach DMA transfer into a dedicated method spi: dw-dma: Move DMAC register cleanup to DMA transfer method spi: dw-dma: Pass exact data to the DMA submit and wait methods spi: dw-dma: Add one-by-one SG list entries transfer drivers/spi/spi-dw-dma.c | 316 ++- drivers/spi/spi-dw.h | 1 + 2 files changed, 245 insertions(+), 72 deletions(-) -- 2.27.0
[PATCH v2 02/11] spi: dw-dma: Fail DMA-based transfer if no Tx-buffer specified
Since commit 46164fde6b78 ("spi: dw: Fix Rx-only DMA transfers") if DMA interface is enabled, then Tx-buffer must be available in each SPI transfer. It's required since in order to activate the incoming data reception either DMA or CPU must be pushing data out to the SPI bus. But the DW APB SSI DMA driver code is still left in state as if Tx-buffer might be optional, which is no longer true. Let's fix it so an error would be returned if no Tx-buffer detected and DMA Tx would be always enabled. Signed-off-by: Serge Semin --- drivers/spi/spi-dw-dma.c | 27 +-- 1 file changed, 13 insertions(+), 14 deletions(-) diff --git a/drivers/spi/spi-dw-dma.c b/drivers/spi/spi-dw-dma.c index a7ff1e357f8b..1b013ac94a3f 100644 --- a/drivers/spi/spi-dw-dma.c +++ b/drivers/spi/spi-dw-dma.c @@ -262,9 +262,6 @@ dw_spi_dma_prepare_tx(struct dw_spi *dws, struct spi_transfer *xfer) struct dma_slave_config txconf; struct dma_async_tx_descriptor *txdesc; - if (!xfer->tx_buf) - return NULL; - memset(, 0, sizeof(txconf)); txconf.direction = DMA_MEM_TO_DEV; txconf.dst_addr = dws->dma_addr; @@ -383,17 +380,19 @@ static struct dma_async_tx_descriptor *dw_spi_dma_prepare_rx(struct dw_spi *dws, static int dw_spi_dma_setup(struct dw_spi *dws, struct spi_transfer *xfer) { - u16 imr = 0, dma_ctrl = 0; + u16 imr, dma_ctrl; - if (xfer->tx_buf) - dma_ctrl |= SPI_DMA_TDMAE; + if (!xfer->tx_buf) + return -EINVAL; + + /* Set the DMA handshaking interface */ + dma_ctrl = SPI_DMA_TDMAE; if (xfer->rx_buf) dma_ctrl |= SPI_DMA_RDMAE; dw_writel(dws, DW_SPI_DMACR, dma_ctrl); /* Set the interrupt mask */ - if (xfer->tx_buf) - imr |= SPI_INT_TXOI; + imr = SPI_INT_TXOI; if (xfer->rx_buf) imr |= SPI_INT_RXUI | SPI_INT_RXOI; spi_umask_intr(dws, imr); @@ -412,6 +411,8 @@ static int dw_spi_dma_transfer(struct dw_spi *dws, struct spi_transfer *xfer) /* Prepare the TX dma transfer */ txdesc = dw_spi_dma_prepare_tx(dws, xfer); + if (!txdesc) + return -EINVAL; /* Prepare the RX dma transfer */ rxdesc = dw_spi_dma_prepare_rx(dws, xfer); @@ -423,17 +424,15 @@ static int dw_spi_dma_transfer(struct dw_spi *dws, struct spi_transfer *xfer) dma_async_issue_pending(dws->rxchan); } - if (txdesc) { - set_bit(TX_BUSY, >dma_chan_busy); - dmaengine_submit(txdesc); - dma_async_issue_pending(dws->txchan); - } + set_bit(TX_BUSY, >dma_chan_busy); + dmaengine_submit(txdesc); + dma_async_issue_pending(dws->txchan); ret = dw_spi_dma_wait(dws, xfer); if (ret) return ret; - if (txdesc && dws->master->cur_msg->status == -EINPROGRESS) { + if (dws->master->cur_msg->status == -EINPROGRESS) { ret = dw_spi_dma_wait_tx_done(dws, xfer); if (ret) return ret; -- 2.27.0
[PATCH v2 03/11] spi: dw-dma: Configure the DMA channels in dma_setup
Mainly this is a preparation patch before adding one-by-one DMA SG entries transmission. But logically the Tx and Rx DMA channels setup should be performed in the dma_setup() callback anyway. So we'll move the DMA slave channels src/dst burst lengths, address and address width configuration from the Tx/Rx channels preparation methods to the dedicated functions and then make sure it's called at the DMA setup stage. Note we now make sure the return value of the dmaengine_slave_config() method doesn't indicate an error. It has been unnecessary in case if Dw DMAC is utilized as a DMA engine, since its device_config() callback always returns zero (though it might change in future). But since DW APB SSI driver now supports any DMA back-end we must make sure the DMA device configuration has been successful before proceeding with further setups. Signed-off-by: Serge Semin --- drivers/spi/spi-dw-dma.c | 42 +--- 1 file changed, 31 insertions(+), 11 deletions(-) diff --git a/drivers/spi/spi-dw-dma.c b/drivers/spi/spi-dw-dma.c index 1b013ac94a3f..da17897b8acb 100644 --- a/drivers/spi/spi-dw-dma.c +++ b/drivers/spi/spi-dw-dma.c @@ -256,11 +256,9 @@ static void dw_spi_dma_tx_done(void *arg) complete(>dma_completion); } -static struct dma_async_tx_descriptor * -dw_spi_dma_prepare_tx(struct dw_spi *dws, struct spi_transfer *xfer) +static int dw_spi_dma_config_tx(struct dw_spi *dws) { struct dma_slave_config txconf; - struct dma_async_tx_descriptor *txdesc; memset(, 0, sizeof(txconf)); txconf.direction = DMA_MEM_TO_DEV; @@ -270,7 +268,13 @@ dw_spi_dma_prepare_tx(struct dw_spi *dws, struct spi_transfer *xfer) txconf.dst_addr_width = dw_spi_dma_convert_width(dws->n_bytes); txconf.device_fc = false; - dmaengine_slave_config(dws->txchan, ); + return dmaengine_slave_config(dws->txchan, ); +} + +static struct dma_async_tx_descriptor * +dw_spi_dma_prepare_tx(struct dw_spi *dws, struct spi_transfer *xfer) +{ + struct dma_async_tx_descriptor *txdesc; txdesc = dmaengine_prep_slave_sg(dws->txchan, xfer->tx_sg.sgl, @@ -345,14 +349,9 @@ static void dw_spi_dma_rx_done(void *arg) complete(>dma_completion); } -static struct dma_async_tx_descriptor *dw_spi_dma_prepare_rx(struct dw_spi *dws, - struct spi_transfer *xfer) +static int dw_spi_dma_config_rx(struct dw_spi *dws) { struct dma_slave_config rxconf; - struct dma_async_tx_descriptor *rxdesc; - - if (!xfer->rx_buf) - return NULL; memset(, 0, sizeof(rxconf)); rxconf.direction = DMA_DEV_TO_MEM; @@ -362,7 +361,16 @@ static struct dma_async_tx_descriptor *dw_spi_dma_prepare_rx(struct dw_spi *dws, rxconf.src_addr_width = dw_spi_dma_convert_width(dws->n_bytes); rxconf.device_fc = false; - dmaengine_slave_config(dws->rxchan, ); + return dmaengine_slave_config(dws->rxchan, ); +} + +static struct dma_async_tx_descriptor *dw_spi_dma_prepare_rx(struct dw_spi *dws, + struct spi_transfer *xfer) +{ + struct dma_async_tx_descriptor *rxdesc; + + if (!xfer->rx_buf) + return NULL; rxdesc = dmaengine_prep_slave_sg(dws->rxchan, xfer->rx_sg.sgl, @@ -381,10 +389,22 @@ static struct dma_async_tx_descriptor *dw_spi_dma_prepare_rx(struct dw_spi *dws, static int dw_spi_dma_setup(struct dw_spi *dws, struct spi_transfer *xfer) { u16 imr, dma_ctrl; + int ret; if (!xfer->tx_buf) return -EINVAL; + /* Setup DMA channels */ + ret = dw_spi_dma_config_tx(dws); + if (ret) + return ret; + + if (xfer->rx_buf) { + ret = dw_spi_dma_config_rx(dws); + if (ret) + return ret; + } + /* Set the DMA handshaking interface */ dma_ctrl = SPI_DMA_TDMAE; if (xfer->rx_buf) -- 2.27.0
[PATCH v2 10/11] spi: dw-dma: Pass exact data to the DMA submit and wait methods
In order to use the DMA submission and waiting methods in both generic DMA-based SPI transfer and one-by-one DMA SG entries transmission functions, we need to modify the dw_spi_dma_wait() and dw_spi_dma_submit_tx()/dw_spi_dma_submit_rx() prototypes. So instead of getting the SPI transfer object as the second argument they must accept the exact data structure instances they imply to use. Those are the current transfer length and the SPI bus frequency in case of dw_spi_dma_wait(), and SG list together with number of list entries in case of the DMA Tx/Rx submission methods. Signed-off-by: Serge Semin --- drivers/spi/spi-dw-dma.c | 35 +-- 1 file changed, 17 insertions(+), 18 deletions(-) diff --git a/drivers/spi/spi-dw-dma.c b/drivers/spi/spi-dw-dma.c index 935f073a3523..f333c2e23bf6 100644 --- a/drivers/spi/spi-dw-dma.c +++ b/drivers/spi/spi-dw-dma.c @@ -188,12 +188,12 @@ static enum dma_slave_buswidth dw_spi_dma_convert_width(u8 n_bytes) return DMA_SLAVE_BUSWIDTH_UNDEFINED; } -static int dw_spi_dma_wait(struct dw_spi *dws, struct spi_transfer *xfer) +static int dw_spi_dma_wait(struct dw_spi *dws, unsigned int len, u32 speed) { unsigned long long ms; - ms = xfer->len * MSEC_PER_SEC * BITS_PER_BYTE; - do_div(ms, xfer->effective_speed_hz); + ms = len * MSEC_PER_SEC * BITS_PER_BYTE; + do_div(ms, speed); ms += ms + 200; if (ms > UINT_MAX) @@ -268,17 +268,16 @@ static int dw_spi_dma_config_tx(struct dw_spi *dws) return dmaengine_slave_config(dws->txchan, ); } -static int dw_spi_dma_submit_tx(struct dw_spi *dws, struct spi_transfer *xfer) +static int dw_spi_dma_submit_tx(struct dw_spi *dws, struct scatterlist *sgl, + unsigned int nents) { struct dma_async_tx_descriptor *txdesc; dma_cookie_t cookie; int ret; - txdesc = dmaengine_prep_slave_sg(dws->txchan, - xfer->tx_sg.sgl, - xfer->tx_sg.nents, - DMA_MEM_TO_DEV, - DMA_PREP_INTERRUPT | DMA_CTRL_ACK); + txdesc = dmaengine_prep_slave_sg(dws->txchan, sgl, nents, +DMA_MEM_TO_DEV, +DMA_PREP_INTERRUPT | DMA_CTRL_ACK); if (!txdesc) return -ENOMEM; @@ -370,17 +369,16 @@ static int dw_spi_dma_config_rx(struct dw_spi *dws) return dmaengine_slave_config(dws->rxchan, ); } -static int dw_spi_dma_submit_rx(struct dw_spi *dws, struct spi_transfer *xfer) +static int dw_spi_dma_submit_rx(struct dw_spi *dws, struct scatterlist *sgl, + unsigned int nents) { struct dma_async_tx_descriptor *rxdesc; dma_cookie_t cookie; int ret; - rxdesc = dmaengine_prep_slave_sg(dws->rxchan, - xfer->rx_sg.sgl, - xfer->rx_sg.nents, - DMA_DEV_TO_MEM, - DMA_PREP_INTERRUPT | DMA_CTRL_ACK); + rxdesc = dmaengine_prep_slave_sg(dws->rxchan, sgl, nents, +DMA_DEV_TO_MEM, +DMA_PREP_INTERRUPT | DMA_CTRL_ACK); if (!rxdesc) return -ENOMEM; @@ -443,13 +441,14 @@ static int dw_spi_dma_transfer_all(struct dw_spi *dws, int ret; /* Submit the DMA Tx transfer */ - ret = dw_spi_dma_submit_tx(dws, xfer); + ret = dw_spi_dma_submit_tx(dws, xfer->tx_sg.sgl, xfer->tx_sg.nents); if (ret) goto err_clear_dmac; /* Submit the DMA Rx transfer if required */ if (xfer->rx_buf) { - ret = dw_spi_dma_submit_rx(dws, xfer); + ret = dw_spi_dma_submit_rx(dws, xfer->rx_sg.sgl, + xfer->rx_sg.nents); if (ret) goto err_clear_dmac; @@ -459,7 +458,7 @@ static int dw_spi_dma_transfer_all(struct dw_spi *dws, dma_async_issue_pending(dws->txchan); - ret = dw_spi_dma_wait(dws, xfer); + ret = dw_spi_dma_wait(dws, xfer->len, xfer->effective_speed_hz); err_clear_dmac: dw_writel(dws, DW_SPI_DMACR, 0); -- 2.27.0
[PATCH v2 11/11] spi: dw-dma: Add one-by-one SG list entries transfer
In case if at least one of the requested DMA engine channels doesn't support the hardware accelerated SG list entries traverse, the DMA driver will most likely work that around by performing the IRQ-based SG list entries resubmission. That might and will cause a problem if the DMA Tx channel is recharged and re-executed before the Rx DMA channel. Due to non-deterministic IRQ-handler execution latency the DMA Tx channel will start pushing data to the SPI bus before the Rx DMA channel is even reinitialized with the next inbound SG list entry. By doing so the DMA Tx channel will implicitly start filling the DW APB SSI Rx FIFO up, which while the DMA Rx channel being recharged and re-executed will eventually be overflown. In order to solve the problem we have to feed the DMA engine with SG list entries one-by-one. It shall keep the DW APB SSI Tx and Rx FIFOs synchronized and prevent the Rx FIFO overflow. Since in general the SPI tx_sg and rx_sg lists may have different number of entries of different lengths (though total length should match) we virtually split the SG-lists to the set of DMA transfers, which length is a minimum of the ordered SG-entries lengths. The solution described above is only executed if a full-duplex SPI transfer is requested and the DMA engine hasn't provided channels with hardware accelerated SG list traverse capability to handle both SG lists at once. Signed-off-by: Serge Semin Suggested-by: Andy Shevchenko --- drivers/spi/spi-dw-dma.c | 137 ++- drivers/spi/spi-dw.h | 1 + 2 files changed, 137 insertions(+), 1 deletion(-) diff --git a/drivers/spi/spi-dw-dma.c b/drivers/spi/spi-dw-dma.c index f333c2e23bf6..1cbb5a9efbba 100644 --- a/drivers/spi/spi-dw-dma.c +++ b/drivers/spi/spi-dw-dma.c @@ -72,6 +72,23 @@ static void dw_spi_dma_maxburst_init(struct dw_spi *dws) dw_writel(dws, DW_SPI_DMATDLR, dws->txburst); } +static void dw_spi_dma_sg_burst_init(struct dw_spi *dws) +{ + struct dma_slave_caps tx = {0}, rx = {0}; + + dma_get_slave_caps(dws->txchan, ); + dma_get_slave_caps(dws->rxchan, ); + + if (tx.max_sg_burst > 0 && rx.max_sg_burst > 0) + dws->dma_sg_burst = min(tx.max_sg_burst, rx.max_sg_burst); + else if (tx.max_sg_burst > 0) + dws->dma_sg_burst = tx.max_sg_burst; + else if (rx.max_sg_burst > 0) + dws->dma_sg_burst = rx.max_sg_burst; + else + dws->dma_sg_burst = 0; +} + static int dw_spi_dma_init_mfld(struct device *dev, struct dw_spi *dws) { struct dw_dma_slave dma_tx = { .dst_id = 1 }, *tx = _tx; @@ -109,6 +126,8 @@ static int dw_spi_dma_init_mfld(struct device *dev, struct dw_spi *dws) dw_spi_dma_maxburst_init(dws); + dw_spi_dma_sg_burst_init(dws); + return 0; free_rxchan: @@ -138,6 +157,8 @@ static int dw_spi_dma_init_generic(struct device *dev, struct dw_spi *dws) dw_spi_dma_maxburst_init(dws); + dw_spi_dma_sg_burst_init(dws); + return 0; } @@ -466,11 +487,125 @@ static int dw_spi_dma_transfer_all(struct dw_spi *dws, return ret; } +/* + * In case if at least one of the requested DMA channels doesn't support the + * hardware accelerated SG list entries traverse, the DMA driver will most + * likely work that around by performing the IRQ-based SG list entries + * resubmission. That might and will cause a problem if the DMA Tx channel is + * recharged and re-executed before the Rx DMA channel. Due to + * non-deterministic IRQ-handler execution latency the DMA Tx channel will + * start pushing data to the SPI bus before the Rx DMA channel is even + * reinitialized with the next inbound SG list entry. By doing so the DMA Tx + * channel will implicitly start filling the DW APB SSI Rx FIFO up, which while + * the DMA Rx channel being recharged and re-executed will eventually be + * overflown. + * + * In order to solve the problem we have to feed the DMA engine with SG list + * entries one-by-one. It shall keep the DW APB SSI Tx and Rx FIFOs + * synchronized and prevent the Rx FIFO overflow. Since in general the tx_sg + * and rx_sg lists may have different number of entries of different lengths + * (though total length should match) let's virtually split the SG-lists to the + * set of DMA transfers, which length is a minimum of the ordered SG-entries + * lengths. An ASCII-sketch of the implemented algo is following: + * xfer->len + *|___| + * tx_sg list:|___||__| + * rx_sg list:|_||| + * DMA transfers: |_|_|__|_|__| + * + * Note in order to have this workaround solving the denoted problem the DMA + * engine driver should properly initialize the max_sg_burst capability and set + * the DMA device max segment size parameter with maximum data block size the + * DMA engine supports. + */ + +static int dw_spi_dma_transfer_one(struct dw_spi *dw
[PATCH v2 04/11] spi: dw-dma: Check rx_buf availability in the xfer method
Checking rx_buf for being NULL and returning NULL from the Rx-channel preparation method doesn't let us to distinguish that situation from errors happening during the Rx SG-list preparation. So it's better to make sure that the rx_buf not-NULL and full-duplex communication is requested prior calling the Rx preparation method. Signed-off-by: Serge Semin --- drivers/spi/spi-dw-dma.c | 13 ++--- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/drivers/spi/spi-dw-dma.c b/drivers/spi/spi-dw-dma.c index da17897b8acb..d2a67dee1a66 100644 --- a/drivers/spi/spi-dw-dma.c +++ b/drivers/spi/spi-dw-dma.c @@ -369,9 +369,6 @@ static struct dma_async_tx_descriptor *dw_spi_dma_prepare_rx(struct dw_spi *dws, { struct dma_async_tx_descriptor *rxdesc; - if (!xfer->rx_buf) - return NULL; - rxdesc = dmaengine_prep_slave_sg(dws->rxchan, xfer->rx_sg.sgl, xfer->rx_sg.nents, @@ -435,10 +432,12 @@ static int dw_spi_dma_transfer(struct dw_spi *dws, struct spi_transfer *xfer) return -EINVAL; /* Prepare the RX dma transfer */ - rxdesc = dw_spi_dma_prepare_rx(dws, xfer); + if (xfer->rx_buf) { + rxdesc = dw_spi_dma_prepare_rx(dws, xfer); + if (!rxdesc) + return -EINVAL; - /* rx must be started before tx due to spi instinct */ - if (rxdesc) { + /* rx must be started before tx due to spi instinct */ set_bit(RX_BUSY, >dma_chan_busy); dmaengine_submit(rxdesc); dma_async_issue_pending(dws->rxchan); @@ -458,7 +457,7 @@ static int dw_spi_dma_transfer(struct dw_spi *dws, struct spi_transfer *xfer) return ret; } - if (rxdesc && dws->master->cur_msg->status == -EINPROGRESS) + if (xfer->rx_buf && dws->master->cur_msg->status == -EINPROGRESS) ret = dw_spi_dma_wait_rx_done(dws); return ret; -- 2.27.0
[PATCH v3] mtd: physmap: Add Baikal-T1 physically mapped ROM support
Baikal-T1 Boot Controller provides an access to a RO storages, which are physically mapped into the SoC MMIO space. In particularly there are Internal ROM embedded into the SoC with a pre-installed firmware, externally attached SPI flash (also accessed in the read-only mode) and a memory region, which mirrors one of them in accordance with the currently enabled system boot mode (also called Boot ROM). This commit adds the Internal ROM support to the physmap driver of the MTD kernel subsystem. The driver will create the Internal ROM MTD as long as it is defined in the system dts file. The physically mapped SPI flash region will be used to implement the SPI-mem interface. The mirroring memory region won't be accessible directly since it's redundant due to both bootable regions being exposed anyway. Note we had to create a dedicated code for the ROMs since read from the corresponding memory regions must be done via the dword-aligned addresses. Signed-off-by: Serge Semin Cc: Alexey Malahov Cc: Pavel Parkhomenko Cc: Lee Jones Cc: linux-m...@vger.kernel.org --- Link: https://lore.kernel.org/linux-mtd/20200508100905.5854-1-sergey.se...@baikalelectronics.ru/ Changelog v2: - Resend. Link: https://lore.kernel.org/linux-mtd/20200526225849.20985-1-sergey.se...@baikalelectronics.ru Changelog v3: - Alphabetically order the include statements. - Discard dummy IOs. - Discard Baikal-T1 Boot ROM support. The directly mapped SPI flash memory will be used in the DW APB SSI driver instead. --- drivers/mtd/maps/Kconfig | 11 +++ drivers/mtd/maps/Makefile | 1 + drivers/mtd/maps/physmap-bt1-rom.c | 126 + drivers/mtd/maps/physmap-bt1-rom.h | 17 drivers/mtd/maps/physmap-core.c| 5 ++ 5 files changed, 160 insertions(+) create mode 100644 drivers/mtd/maps/physmap-bt1-rom.c create mode 100644 drivers/mtd/maps/physmap-bt1-rom.h diff --git a/drivers/mtd/maps/Kconfig b/drivers/mtd/maps/Kconfig index fd37553f1b07..6650acbc961e 100644 --- a/drivers/mtd/maps/Kconfig +++ b/drivers/mtd/maps/Kconfig @@ -75,6 +75,17 @@ config MTD_PHYSMAP_OF physically into the CPU's memory. The mapping description here is taken from OF device tree. +config MTD_PHYSMAP_BT1_ROM + bool "Baikal-T1 Boot ROMs OF-based physical memory map handling" + depends on MTD_PHYSMAP_OF + depends on MIPS_BAIKAL_T1 || COMPILE_TEST + select MTD_COMPLEX_MAPPINGS + select MULTIPLEXER + select MUX_MMIO + help + This provides some extra DT physmap parsing for the Baikal-T1 + platforms, some detection and setting up ROMs-specific accessors. + config MTD_PHYSMAP_VERSATILE bool "ARM Versatile OF-based physical memory map handling" depends on MTD_PHYSMAP_OF diff --git a/drivers/mtd/maps/Makefile b/drivers/mtd/maps/Makefile index c0da86a5d26f..79f018cf412f 100644 --- a/drivers/mtd/maps/Makefile +++ b/drivers/mtd/maps/Makefile @@ -18,6 +18,7 @@ obj-$(CONFIG_MTD_CK804XROM) += ck804xrom.o obj-$(CONFIG_MTD_TSUNAMI) += tsunami_flash.o obj-$(CONFIG_MTD_PXA2XX) += pxa2xx-flash.o physmap-objs-y += physmap-core.o +physmap-objs-$(CONFIG_MTD_PHYSMAP_BT1_ROM) += physmap-bt1-rom.o physmap-objs-$(CONFIG_MTD_PHYSMAP_VERSATILE) += physmap-versatile.o physmap-objs-$(CONFIG_MTD_PHYSMAP_GEMINI) += physmap-gemini.o physmap-objs-$(CONFIG_MTD_PHYSMAP_IXP4XX) += physmap-ixp4xx.o diff --git a/drivers/mtd/maps/physmap-bt1-rom.c b/drivers/mtd/maps/physmap-bt1-rom.c new file mode 100644 index ..27cfe1c63a2e --- /dev/null +++ b/drivers/mtd/maps/physmap-bt1-rom.c @@ -0,0 +1,126 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (C) 2020 BAIKAL ELECTRONICS, JSC + * + * Authors: + * Serge Semin + * + * Baikal-T1 Physically Mapped Internal ROM driver + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "physmap-bt1-rom.h" + +/* + * Baikal-T1 SoC ROMs are only accessible by the dword-aligned instructions. + * We have to take this into account when implementing the data read-methods. + * Note there is no need in bothering with endianness, since both Baikal-T1 + * CPU and MMIO are LE. + */ +static map_word __xipram bt1_rom_map_read(struct map_info *map, + unsigned long ofs) +{ + void __iomem *src = map->virt + ofs; + unsigned long shift; + map_word ret; + u32 data; + + /* Read data within offset dword. */ + shift = (unsigned long)src & 0x3; + data = readl_relaxed(src - shift); + if (!shift) { + ret.x[0] = data; + return ret; + } + ret.x[0] = data >> (shift * BITS_PER_BYTE); + + /* Read data from the next dword. */ + shift = 4 - shift; + if (ofs + shift >= map->size) + return ret; + + d
[PATCH] clk: baikal-t1: Mark Ethernet PLL as critical
We've discovered that disabling the so called Ethernet PLL causes reset of the devices consuming its outgoing clock. The resets happen automatically even if each underlying clock gate is turned off. Due to that we can't disable the Ethernet PLL until the kernel is prepared for the corresponding resets. So for now just mark the PLL clock provider as critical. Signed-off-by: Serge Semin Cc: Alexey Malahov Cc: linux-m...@vger.kernel.org --- drivers/clk/baikal-t1/clk-ccu-pll.c | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/drivers/clk/baikal-t1/clk-ccu-pll.c b/drivers/clk/baikal-t1/clk-ccu-pll.c index 1eec8c0b8f50..2445d4b12baf 100644 --- a/drivers/clk/baikal-t1/clk-ccu-pll.c +++ b/drivers/clk/baikal-t1/clk-ccu-pll.c @@ -51,11 +51,13 @@ struct ccu_pll_info { }; /* - * Mark as critical all PLLs except Ethernet one. CPU and DDR PLLs are sources - * of CPU cores and DDR controller reference clocks, due to which they - * obviously shouldn't be ever gated. SATA and PCIe PLLs are the parents of - * APB-bus and DDR controller AXI-bus clocks. If they are gated the system will - * be unusable. + * Alas we have to mark all PLLs as critical. CPU and DDR PLLs are sources of + * CPU cores and DDR controller reference clocks, due to which they obviously + * shouldn't be ever gated. SATA and PCIe PLLs are the parents of APB-bus and + * DDR controller AXI-bus clocks. If they are gated the system will be + * unusable. Moreover disabling SATA and Ethernet PLLs causes automatic reset + * of the corresponding subsystems. So until we aren't ready to re-initialize + * all the devices consuming those PLLs, they will be marked as critical too. */ static const struct ccu_pll_info pll_info[] = { CCU_PLL_INFO(CCU_CPU_PLL, "cpu_pll", "ref_clk", CCU_CPU_PLL_BASE, @@ -67,7 +69,7 @@ static const struct ccu_pll_info pll_info[] = { CCU_PLL_INFO(CCU_PCIE_PLL, "pcie_pll", "ref_clk", CCU_PCIE_PLL_BASE, CLK_IS_CRITICAL), CCU_PLL_INFO(CCU_ETH_PLL, "eth_pll", "ref_clk", CCU_ETH_PLL_BASE, -CLK_SET_RATE_GATE) +CLK_IS_CRITICAL | CLK_SET_RATE_GATE) }; struct ccu_pll_data { -- 2.27.0
[PATCH 2/3] hwmon: bt1-pvt: Cache current update timeout
Instead of converting the update timeout data to the milliseconds each time on the read procedure let's preserve the currently set timeout in the dedicated driver private data cache. The cached value will be then used in the timeout read method and in the alarm-less data conversion to prevent the caller task hanging up in case if the PVT sensor is suddenly powered down. Fixes: 87976ce2825d ("hwmon: Add Baikal-T1 PVT sensor driver") Signed-off-by: Serge Semin --- drivers/hwmon/bt1-pvt.c | 85 ++--- drivers/hwmon/bt1-pvt.h | 3 ++ 2 files changed, 49 insertions(+), 39 deletions(-) diff --git a/drivers/hwmon/bt1-pvt.c b/drivers/hwmon/bt1-pvt.c index f4b7353c078a..2600426a3b21 100644 --- a/drivers/hwmon/bt1-pvt.c +++ b/drivers/hwmon/bt1-pvt.c @@ -655,44 +655,16 @@ static int pvt_write_trim(struct pvt_hwmon *pvt, long val) static int pvt_read_timeout(struct pvt_hwmon *pvt, long *val) { - unsigned long rate; - ktime_t kt; - u32 data; - - rate = clk_get_rate(pvt->clks[PVT_CLOCK_REF].clk); - if (!rate) - return -ENODEV; - - /* -* Don't bother with mutex here, since we just read data from MMIO. -* We also have to scale the ticks timeout up to compensate the -* ms-ns-data translations. -*/ - data = readl(pvt->regs + PVT_TTIMEOUT) + 1; + int ret; - /* -* Calculate ref-clock based delay (Ttotal) between two consecutive -* data samples of the same sensor. So we first must calculate the -* delay introduced by the internal ref-clock timer (Tref * Fclk). -* Then add the constant timeout cuased by each conversion latency -* (Tmin). The basic formulae for each conversion is following: -* Ttotal = Tref * Fclk + Tmin -* Note if alarms are enabled the sensors are polled one after -* another, so in order to have the delay being applicable for each -* sensor the requested value must be equally redistirbuted. -*/ -#if defined(CONFIG_SENSORS_BT1_PVT_ALARMS) - kt = ktime_set(PVT_SENSORS_NUM * (u64)data, 0); - kt = ktime_divns(kt, rate); - kt = ktime_add_ns(kt, PVT_SENSORS_NUM * PVT_TOUT_MIN); -#else - kt = ktime_set(data, 0); - kt = ktime_divns(kt, rate); - kt = ktime_add_ns(kt, PVT_TOUT_MIN); -#endif + ret = mutex_lock_interruptible(>iface_mtx); + if (ret) + return ret; /* Return the result in msec as hwmon sysfs interface requires. */ - *val = ktime_to_ms(kt); + *val = ktime_to_ms(pvt->timeout); + + mutex_unlock(>iface_mtx); return 0; } @@ -700,7 +672,7 @@ static int pvt_read_timeout(struct pvt_hwmon *pvt, long *val) static int pvt_write_timeout(struct pvt_hwmon *pvt, long val) { unsigned long rate; - ktime_t kt; + ktime_t kt, cache; u32 data; int ret; @@ -713,7 +685,7 @@ static int pvt_write_timeout(struct pvt_hwmon *pvt, long val) * between all available sensors to have the requested delay * applicable to each individual sensor. */ - kt = ms_to_ktime(val); + cache = kt = ms_to_ktime(val); #if defined(CONFIG_SENSORS_BT1_PVT_ALARMS) kt = ktime_divns(kt, PVT_SENSORS_NUM); #endif @@ -742,6 +714,7 @@ static int pvt_write_timeout(struct pvt_hwmon *pvt, long val) return ret; pvt_set_tout(pvt, data); + pvt->timeout = cache; mutex_unlock(>iface_mtx); @@ -1018,10 +991,17 @@ static int pvt_check_pwr(struct pvt_hwmon *pvt) return ret; } -static void pvt_init_iface(struct pvt_hwmon *pvt) +static int pvt_init_iface(struct pvt_hwmon *pvt) { + unsigned long rate; u32 trim, temp; + rate = clk_get_rate(pvt->clks[PVT_CLOCK_REF].clk); + if (!rate) { + dev_err(pvt->dev, "Invalid reference clock rate\n"); + return -ENODEV; + } + /* * Make sure all interrupts and controller are disabled so not to * accidentally have ISR executed before the driver data is fully @@ -1036,12 +1016,37 @@ static void pvt_init_iface(struct pvt_hwmon *pvt) pvt_set_mode(pvt, pvt_info[pvt->sensor].mode); pvt_set_tout(pvt, PVT_TOUT_DEF); + /* +* Preserve the current ref-clock based delay (Ttotal) between the +* sensors data samples in the driver data so not to recalculate it +* each time on the data requests and timeout reads. It consists of the +* delay introduced by the internal ref-clock timer (N / Fclk) and the +* constant timeout caused by each conversion latency (Tmin): +* Ttotal = N / Fclk + Tmin +* If alarms are enabled the sensors are polled one after another and +* in order to get the next measurement of a particular sensor the +* caller will have to wait for
[PATCH 1/3] hwmon: bt1-pvt: Test sensor power supply on probe
Baikal-T1 PVT sensor has got a dedicated power supply domain (feed up by the external GPVT/VPVT_18 pins). In case if it isn't powered up, the registers will be accessible, but the sensor conversion just won't happen. Due to that an attempt to read data from any PVT sensor will cause the task hanging up. For instance that will happen if XP11 jumper isn't installed on the Baikal-T1-based BFK3.1 board. Let's at least test whether the conversion work on the device probe procedure. By doing so will make sure that the PVT sensor is powered up at least at boot time. Fixes: 87976ce2825d ("hwmon: Add Baikal-T1 PVT sensor driver") Signed-off-by: Serge Semin --- drivers/hwmon/bt1-pvt.c | 40 1 file changed, 40 insertions(+) diff --git a/drivers/hwmon/bt1-pvt.c b/drivers/hwmon/bt1-pvt.c index 94698cae0497..f4b7353c078a 100644 --- a/drivers/hwmon/bt1-pvt.c +++ b/drivers/hwmon/bt1-pvt.c @@ -13,6 +13,7 @@ #include #include #include +#include #include #include #include @@ -982,6 +983,41 @@ static int pvt_request_clks(struct pvt_hwmon *pvt) return 0; } +static int pvt_check_pwr(struct pvt_hwmon *pvt) +{ + unsigned long tout; + int ret = 0; + u32 data; + + /* +* Test out the sensor conversion functionality. If it is not done on +* time then the domain must have been unpowered and we won't be able +* to use the device later in this driver. +* Note If the power source is lost during the normal driver work the +* data read procedure will either return -ETIMEDOUT (for the +* alarm-less driver configuration) or just stop the repeated +* conversion. In the later case alas we won't be able to detect the +* problem. +*/ + pvt_update(pvt->regs + PVT_INTR_MASK, PVT_INTR_ALL, PVT_INTR_ALL); + pvt_update(pvt->regs + PVT_CTRL, PVT_CTRL_EN, PVT_CTRL_EN); + pvt_set_tout(pvt, 0); + readl(pvt->regs + PVT_DATA); + + tout = PVT_TOUT_MIN / NSEC_PER_USEC; + usleep_range(tout, 2 * tout); + + data = readl(pvt->regs + PVT_DATA); + if (!(data & PVT_DATA_VALID)) { + ret = -ENODEV; + dev_err(pvt->dev, "Sensor is powered down\n"); + } + + pvt_update(pvt->regs + PVT_CTRL, PVT_CTRL_EN, 0); + + return ret; +} + static void pvt_init_iface(struct pvt_hwmon *pvt) { u32 trim, temp; @@ -1109,6 +1145,10 @@ static int pvt_probe(struct platform_device *pdev) if (ret) return ret; + ret = pvt_check_pwr(pvt); + if (ret) + return ret; + pvt_init_iface(pvt); ret = pvt_request_irq(pvt); -- 2.27.0
[PATCH 0/3] hwmon: bt1-pvt: Fix PVT sensor being unpowered
Baikal-T1 PVT sensor has got a dedicated power domain with 1.8V intended to be supplied via the external GPVT and VPVT_18 pins. In case if the power isn't supplied, the sensor IO registers will be accessible, but the data conversion just won't happen. The situation of the power loss currently will cause the temperature and voltage read procedures to hang up. The problem is fixed in the framework of this patchset firstly by checking whether the conversion works at the sensor probe procedure, and secondly by keeping the current conversion update timeout in the private driver data cache and using it to set the wait-completion timeout. Fixes: 87976ce2825d ("hwmon: Add Baikal-T1 PVT sensor driver") Signed-off-by: Serge Semin Cc: Maxim Kaurkin Cc: Alexey Malahov Cc: Pavel Parkhomenko Cc: linux-m...@vger.kernel.org Cc: linux-hw...@vger.kernel.org Cc: linux-kernel@vger.kernel.org Serge Semin (3): hwmon: bt1-pvt: Test sensor power supply on probe hwmon: bt1-pvt: Cache current update timeout hwmon: bt1-pvt: Wait for the completion with timeout drivers/hwmon/bt1-pvt.c | 138 drivers/hwmon/bt1-pvt.h | 3 + 2 files changed, 101 insertions(+), 40 deletions(-) -- 2.27.0
[PATCH 3/3] hwmon: bt1-pvt: Wait for the completion with timeout
If the PVT sensor is suddenly powered down while a caller is waiting for the conversion completion, the request won't be finished and the task will hang up on this procedure until the power is back up again. Let's call the wait_for_completion_timeout() method instead to prevent that. The cached timeout is exactly what we need to predict for how long conversion could normally last. Fixes: 87976ce2825d ("hwmon: Add Baikal-T1 PVT sensor driver") Signed-off-by: Serge Semin --- drivers/hwmon/bt1-pvt.c | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/drivers/hwmon/bt1-pvt.c b/drivers/hwmon/bt1-pvt.c index 2600426a3b21..3e1d56585b91 100644 --- a/drivers/hwmon/bt1-pvt.c +++ b/drivers/hwmon/bt1-pvt.c @@ -477,6 +477,7 @@ static int pvt_read_data(struct pvt_hwmon *pvt, enum pvt_sensor_type type, long *val) { struct pvt_cache *cache = >cache[type]; + unsigned long timeout; u32 data; int ret; @@ -500,7 +501,14 @@ static int pvt_read_data(struct pvt_hwmon *pvt, enum pvt_sensor_type type, pvt_update(pvt->regs + PVT_INTR_MASK, PVT_INTR_DVALID, 0); pvt_update(pvt->regs + PVT_CTRL, PVT_CTRL_EN, PVT_CTRL_EN); - wait_for_completion(>conversion); + /* +* Wait with timeout since in case if the sensor is suddenly powered +* down the request won't be completed and the caller will hang up on +* this procedure until the power is back up again. Multiply the +* timeout by the factor of two to prevent a false timeout. +*/ + timeout = 2 * usecs_to_jiffies(ktime_to_us(pvt->timeout)); + ret = wait_for_completion_timeout(>conversion, timeout); pvt_update(pvt->regs + PVT_CTRL, PVT_CTRL_EN, 0); pvt_update(pvt->regs + PVT_INTR_MASK, PVT_INTR_DVALID, @@ -510,6 +518,9 @@ static int pvt_read_data(struct pvt_hwmon *pvt, enum pvt_sensor_type type, mutex_unlock(>iface_mtx); + if (!ret) + return -ETIMEDOUT; + if (type == PVT_TEMP) *val = pvt_calc_poly(_N_to_temp, data); else -- 2.27.0
[PATCH 1/2] mips: Add strong UC ordering config
In accordance with [1, 2] memory transactions using CCA=2 (Uncached Cacheability and Coherency Attribute) are always strongly ordered. This means the younger memory accesses using CCA=2 are never allowed to be executed before older memory accesses using CCA=2 (no bypassing is allowed), and Loads and Stores using CCA=2 are never speculative. It is expected by the specification that the rest of the system maintains these properties for processor initiated uncached accesses. So the system IO interconnect doesn't reorder uncached transactions once they have left the processor subsystem. Taking into account these properties and what [3] says about the relaxed IO-accessors we can infer that normal Loads and Stores from/to CCA=2 memory and without any additional execution barriers will fully comply with the {read,write}X_relaxed() methods requirements. Let's convert then currently generated relaxed IO-accessors to being pure Loads and Stores. Seeing the commit 3d474dacae72 ("MIPS: Enforce strong ordering for MMIO accessors") and commit 8b656253a7a4 ("MIPS: Provide actually relaxed MMIO accessors") have already made a preparation in the corresponding macro, we can do that just by replacing the "barrier" parameter utilization with the "relax" one. Note the "barrier" macro argument can be removed, since it isn't fully used anyway other than being always assigned to 1. Of course it would be fullish to believe that all the available MIPS-based CPUs completely follow the denoted specification, especially considering how old the architecture is. Instead we introduced a dedicated kernel config, which when enabled will convert the relaxed IO-accessors to being pure Loads and Stores without any additional barriers around. So if some CPU supports the strongly ordered UC memory access, it can enable that config and use a fully optimized relaxed IO-methods. For instance, Baikal-T1 architecture support code will do that. [1] MIPS Coherence Protocol Specification, Document Number: MD00605, Revision 01.01. September 14, 2015, 4.2 Execution Order Behavior, p. 33 [2] MIPS Coherence Protocol Specification, Document Number: MD00605, Revision 01.01. September 14, 2015, 4.8.1 IO Device Access, p. 58 [3] "LINUX KERNEL MEMORY BARRIERS", Documentation/memory-barriers.txt, Section "KERNEL I/O BARRIER EFFECTS" Signed-off-by: Serge Semin Cc: Maciej W. Rozycki --- arch/mips/Kconfig | 8 arch/mips/include/asm/io.h | 20 ++-- 2 files changed, 18 insertions(+), 10 deletions(-) diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig index c95fa3a2484c..2c82d927347d 100644 --- a/arch/mips/Kconfig +++ b/arch/mips/Kconfig @@ -2066,6 +2066,14 @@ config WEAK_ORDERING # config WEAK_REORDERING_BEYOND_LLSC bool + +# +# CPU may not reorder reads and writes R->R, R->W, W->R, W->W within Uncached +# Cacheability and Coherency Attribute (CCA=2) +# +config STRONG_UC_ORDERING + bool + endmenu # diff --git a/arch/mips/include/asm/io.h b/arch/mips/include/asm/io.h index 78537aa23500..130c4b6458fc 100644 --- a/arch/mips/include/asm/io.h +++ b/arch/mips/include/asm/io.h @@ -213,7 +213,7 @@ void iounmap(const volatile void __iomem *addr); #define war_io_reorder_wmb() barrier() #endif -#define __BUILD_MEMORY_SINGLE(pfx, bwlq, type, barrier, relax, irq)\ +#define __BUILD_MEMORY_SINGLE(pfx, bwlq, type, relax, irq) \ \ static inline void pfx##write##bwlq(type val, \ volatile void __iomem *mem) \ @@ -221,7 +221,7 @@ static inline void pfx##write##bwlq(type val, \ volatile type *__mem; \ type __val; \ \ - if (barrier)\ + if (!(relax && IS_ENABLED(CONFIG_STRONG_UC_ORDERING))) \ iobarrier_rw(); \ else\ war_io_reorder_wmb(); \ @@ -262,7 +262,7 @@ static inline type pfx##read##bwlq(const volatile void __iomem *mem)\ \ __mem = (void *)__swizzle_addr_##bwlq((unsigned long)(mem));\ \ - if (barrier)\ + if (!(relax && IS_ENABLED(CONFIG_STRONG_UC_ORDERING))) \ iobarrier_rw(); \
[PATCH 2/2] mips: Introduce MIPS CM2 GCR Control register accessors
For some reason these accessors have been absent from the MIPS kernel, while some of them can be used to tune the MIPS code execution up (the default value are fully acceptable though). For instance, in the framework of MIPS P5600/P6600 (see [1] for details) if we are sure the IO interconnect doesn't reorder the requests we can freely set GCR_CONTROL.SYNCDIS, which will make CM2 to respond on SYNCs just after a request is accepted on the L2/Memory interface instead of executing the legacy SYNC and waiting for the full response from L2/Memory. Needless to say that this will significantly speed the {read,write}X() IO-accessors due to having more lightweight barriers around the IO Loads and Stores. There are others MIPS Coherency Manager optimizations available in framework of that register like cache ops serialization limits, speculative read enable, etc, which can be useful for the various MIPS platforms. [1] MIPS32 P5600 Multiprocessing System Software User's Manual, Document Number: MD01025, Revision 01.60, April 19, 2016, p. 400 Signed-off-by: Serge Semin --- Folks, do you think it would be better to implement a dedicated config for arch/mips/kernel/mips-cm.c code, which would disable the SI_SyncTxEn acceptance by setting the GCR_CONTROL.SYNCDIS bit? Currently I intend to set it in the out platform-specific prom_init() method. --- arch/mips/include/asm/mips-cm.h | 15 +++ 1 file changed, 15 insertions(+) diff --git a/arch/mips/include/asm/mips-cm.h b/arch/mips/include/asm/mips-cm.h index aeae2effa123..17b2adf57e0c 100644 --- a/arch/mips/include/asm/mips-cm.h +++ b/arch/mips/include/asm/mips-cm.h @@ -143,6 +143,21 @@ GCR_ACCESSOR_RW(64, 0x008, base) #define CM_GCR_BASE_CMDEFTGT_IOCU02 #define CM_GCR_BASE_CMDEFTGT_IOCU13 +/* GCR_CONTROL - Global CM2 Settings */ +GCR_ACCESSOR_RW(64, 0x010, control) +#define CM_GCR_CONTROL_SYNCCTL BIT(16) +#define CM_GCR_CONTROL_SYNCDIS BIT(5) +#define CM_GCR_CONTROL_IVU_EN BIT(4) +#define CM_GCR_CONTROL_SHST_EN BIT(3) +#define CM_GCR_CONTROL_PARK_EN BIT(2) +#define CM_GCR_CONTROL_MMIO_LIMIT_DIS BIT(1) +#define CM_GCR_CONTROL_SPEC_READ_ENBIT(0) + +/* GCR_CONTROL2 - Global CM2 Settings (continue) */ +GCR_ACCESSOR_RW(64, 0x018, control2) +#define CM_GCR_CONTROL2_L2_CACHEOP_LIMIT GENMASK(19, 16) +#define CM_GCR_CONTROL2_L1_CACHEOP_LIMIT GENMASK(3, 0) + /* GCR_ACCESS - Controls core/IOCU access to GCRs */ GCR_ACCESSOR_RW(32, 0x020, access) #define CM_GCR_ACCESS_ACCESSEN GENMASK(7, 0) -- 2.27.0
[PATCH 0/2] mips: Introduce some IO-accessors optimizations
It has been discovered that on our MIPS P5600-based CPU the IO accessors aren't that rapid as they could be even taking into account a relatively slow AXI2APB bridge embedded into the system interconnect. Turned out we can introduce two types of optimizations. First we can remove the execution barriers from the relaxed IO-accessors as our CPU conforms to the MIPS Coherency Protocol Specification [1, 2]. Of course it also concerns the IO interconnect implementation. So in accordance with [3] we suggest to remove the barriers at least for the platforms which conform the specification the same way as ours. Second there is a dedicated Coherency Manager control register, which can be also used to tune the IO methods up. For some reason it hasn't been added to the MIPS arch code so far, while it provides flags for instance to speed the SYNC barrier for the platforms with non-re-ordering IO interconnect, to set the cache ops serialization limits, enable the speculative reads, etc. For now we suggest to add just the macro with the CM2 GCR_CONTROL register accessors and fields description. So any platform could use it to activate the corresponding optimization. Our platform-wise we'll do this in the framework of our Baikal-T1 platform code in the prom_init() method. [1] MIPS Coherence Protocol Specification, Document Number: MD00605, Revision 01.01. September 14, 2015, 4.2 Execution Order Behavior, p. 33 [2] MIPS Coherence Protocol Specification, Document Number: MD00605, Revision 01.01. September 14, 2015, 4.8.1 IO Device Access, p. 58 [3] "LINUX KERNEL MEMORY BARRIERS", Documentation/memory-barriers.txt, Section "KERNEL I/O BARRIER EFFECTS" Signed-off-by: Serge Semin Cc: Alexey Malahov Cc: Pavel Parkhomenko Cc: Vadim Vlasov Cc: Maciej W. Rozycki Cc: linux-m...@vger.kernel.org Cc: linux-kernel@vger.kernel.org Serge Semin (2): mips: Add strong UC ordering config mips: Introduce MIPS CM2 GCR Control register accessors arch/mips/Kconfig | 8 arch/mips/include/asm/io.h | 20 ++-- arch/mips/include/asm/mips-cm.h | 15 +++ 3 files changed, 33 insertions(+), 10 deletions(-) -- 2.27.0
Re: [PATCH] net: phy: realtek: fix rtl8211e rx/tx delay config
Hello Andrew. On Fri, Sep 18, 2020 at 03:54:03PM +0200, Andrew Lunn wrote: > On Fri, Sep 18, 2020 at 06:55:16AM +, 劉偉權 wrote: > > Hi Serge, > > > Thanks for your reply. There is a confidential issue that realtek > > doesn't offer the detail of a full register layout for configuration > > register. > > ... > > > >* 0xa4 extension page (0x7) layout. It can be used to disable/enable > > >* the RX/TX delays otherwise controlled by RXDLY/TXDLY pins. It can > > >* also be used to customize the whole configuration register: > > > > > - * 8:6 = PHY Address, 5:4 = Auto-Negotiation, 3 = Interface Mode Select, > > > - * 2 = RX Delay, 1 = TX Delay, 0 = SELRGV (see original PHY datasheet > > > - * for details). > > > + * 13 = Force Tx RX Delay controlled by bit12 bit11, > > > + * 12 = RX Delay, 11 = TX Delay > > > > > Here you've removed the register layout description and replaced itq > > with just three bits info. So from now the text above doesn't really > > corresponds to what follows. > > > I might have forgotten something, but AFAIR that register bits > > stateq mapped well to what was available on the corresponding > > external pins. > > Hi Willy > > So it appears bits 3 to 8 have been reverse engineered. Unless you > know from your confidential datasheet that these are wrong, please > leave the comment alone. > > If you confidential datasheet says that the usage of bits 0-2 is > wrong, then please do correct that part. I've got that info from Kyle post here: https://reviews.freebsd.org/D13591 My work with that problem has been done more than a year ago, so I don't remember all the details. But IIRC the very first nine bits [8:0] must be a copy of what is set on the external configuration pins as I described in the comment. AFAIR I tried to manually change these pins [1] and detected that change right there in the register. That fully fitted to what Kyle wrote in his post. Alas I can't repeat it to be completely sure at the moment due to the lack of the hardware. So I might have missed something, and Willy' confirmation about that would have been more than welcome. What we can say now for sure I didn't use the magic number in my patch. That could have been a mistake from what Willy says in the commit-log... Anyway aside with the Magic bits settings (which by Willy' patch appears to be the Tx/Rx delays + Force Tx/Rx delay setting) Kyle also clears the bits 1-2 with a comment "Ensure both internal delays are turned off". Willy, what can you say about that? Can we really leave them out from the change? Kyle, could you give us a comment about your experience with that? [1] RTL8211E-VB-CG, RTL8211E-VL-CG, RTL8211E-VL-CG, "INTEGRATED 10/100/1000M ETHERNET TRANSCEIVER" Datasheet, Table 13. Configuration Register Definition, Rev. 1.6, 03 April 2012, Track ID: JATR-3375-16, p.16 -Sergey > >Andrew
Re: [PATCH] net: phy: realtek: fix rtl8211e rx/tx delay config
Hello Willy, Thanks for the patch. My comments are below. I've Cc'ed the U-boot/FreeBSD, who might be also interested in the solution you've provided. On Thu, Sep 17, 2020 at 09:47:33AM +0800, Willy Liu wrote: > RGMII RX Delay and TX Delay settings will not applied if Force TX RX Delay > Control bit is not set. > Register bit for configuration pins: > 13 = force Tx RX Delay controlled by bit12 bit11 > 12 = Tx Delay > 11 = Rx Delay This is a very useful information, but it contradicts a bit to what knowledge we've currently got about that magical register. Current code in U-boot does the delays configuration by means of another bits: https://elixir.bootlin.com/u-boot/v2020.10-rc4/source/drivers/net/phy/realtek.c Could you provide a full register layout, so we'd know for sure what that register really does and finally close the question for good? > > Fixes: f81dadbcf7fd ("net: phy: realtek: Add rtl8211e rx/tx delays config") > Signed-off-by: Willy Liu > --- > drivers/net/phy/realtek.c | 20 ++-- > 1 file changed, 10 insertions(+), 10 deletions(-) > mode change 100644 => 100755 drivers/net/phy/realtek.c > > diff --git a/drivers/net/phy/realtek.c b/drivers/net/phy/realtek.c > old mode 100644 > new mode 100755 > index 95dbe5e..3fddd57 > --- a/drivers/net/phy/realtek.c > +++ b/drivers/net/phy/realtek.c > @@ -32,9 +32,9 @@ > #define RTL8211F_TX_DELAYBIT(8) > #define RTL8211F_RX_DELAYBIT(3) > > -#define RTL8211E_TX_DELAYBIT(1) > -#define RTL8211E_RX_DELAYBIT(2) > -#define RTL8211E_MODE_MII_GMII BIT(3) > +#define RTL8211E_CTRL_DELAY BIT(13) > +#define RTL8211E_TX_DELAYBIT(12) > +#define RTL8211E_RX_DELAYBIT(11) So, what do BIT(1) and BIT(2) control then? Could you explain? > > #define RTL8201F_ISR 0x1e > #define RTL8201F_IER 0x13 > @@ -249,13 +249,13 @@ static int rtl8211e_config_init(struct phy_device > *phydev) > val = 0; > break; > case PHY_INTERFACE_MODE_RGMII_ID: > - val = RTL8211E_TX_DELAY | RTL8211E_RX_DELAY; > + val = RTL8211E_CTRL_DELAY | RTL8211E_TX_DELAY | > RTL8211E_RX_DELAY; > break; > case PHY_INTERFACE_MODE_RGMII_RXID: > - val = RTL8211E_RX_DELAY; > + val = RTL8211E_CTRL_DELAY | RTL8211E_RX_DELAY; > break; > case PHY_INTERFACE_MODE_RGMII_TXID: > - val = RTL8211E_TX_DELAY; > + val = RTL8211E_CTRL_DELAY | RTL8211E_TX_DELAY; > break; > default: /* the rest of the modes imply leaving delays as is. */ > return 0; > @@ -265,9 +265,8 @@ static int rtl8211e_config_init(struct phy_device *phydev) >* 0xa4 extension page (0x7) layout. It can be used to disable/enable >* the RX/TX delays otherwise controlled by RXDLY/TXDLY pins. It can >* also be used to customize the whole configuration register: > - * 8:6 = PHY Address, 5:4 = Auto-Negotiation, 3 = Interface Mode Select, > - * 2 = RX Delay, 1 = TX Delay, 0 = SELRGV (see original PHY datasheet > - * for details). > + * 13 = Force Tx RX Delay controlled by bit12 bit11, > + * 12 = RX Delay, 11 = TX Delay Here you've removed the register layout description and replaced it with just three bits info. So from now the text above doesn't really corresponds to what follows. I might have forgotten something, but AFAIR that register bits state mapped well to what was available on the corresponding external pins. So if you've got a sacred knowledge what configs are really hidden behind that register, please open it up. This in-code comment would be a good place to provide the full register description. -Sergey >*/ > oldpage = phy_select_page(phydev, 0x7); > if (oldpage < 0) > @@ -277,7 +276,8 @@ static int rtl8211e_config_init(struct phy_device *phydev) > if (ret) > goto err_restore_page; > > - ret = __phy_modify(phydev, 0x1c, RTL8211E_TX_DELAY | RTL8211E_RX_DELAY, > + ret = __phy_modify(phydev, 0x1c, RTL8211E_CTRL_DELAY > +| RTL8211E_TX_DELAY | RTL8211E_RX_DELAY, > val); > > err_restore_page: > -- > 1.9.1 >
Re: [PATCH] gpio: dwapb: add support for new hisilicon ascend soc
On Sun, Sep 06, 2020 at 06:18:07AM +, Dingtianhong wrote: [...] > > On Sat, Aug 22, 2020 at 12:27:53PM +0800, Ding Tianhong wrote: > >> The hisilicon ascend soc's gpio is based on the synopsys DW gpio, > >> and expand the register to support for INTCOMB_MASK, the new > >> register is used to enable/disable the interrupt combine features. > > > > I am wondering what does the "Interrupt Combine" feature do? Is it the same > > as > > the GPIO_INTR_IO pre-synthesize parameter of the DW_apb_gpio IP-core? Is it > > possible to tune the DW APB GPIO controller at runtime sending up to the IRQ > > controller either combined or individual interrupts? > > > > looks like the same. > > > If the later is true, then probably having the "Interrupt Combine" feature > > enabled must depend on whether an individual or combined interrupts are > > supplied > > in dts, etc... > > > > needed. > > > Could you explain the way the feature works and the corresponding layout > > register in more details? > > > > Ok > The hisilicon chip use the register called GPIO_INTCOMB_MASK (offset is > 0xffc) to enable the combien interrupt. > it is very simple, if GPIO_INTCOMB_MASK.bit0 is 0, then all combine > interrupte could not be used (default > setting), otherwise if 1, then the 32 ports could use the same irq line, that > is all. The main question is whether your hardware is capable of working with both combined and individual interrupts. Is your version of the DW APB GPIO controller able to generate both types of them? How is it connected to the parental interrupt controller? So If the GPIO and IRQ controllers are attached to each other with a single lane gpio_intr_flag, then indeed it's pure combined IRQ design and we'll have to make sure that GPIO_INTCOMB_MASK.bit0 is set to one before using the DW GPIO block. If they are also connected with each other by 32 individual GPIO-IRQ gpio_intr{_n}N lanes, then setting or clearing of the GPIO_INTCOMB_MASK.bit0 flag will for example depend on the number IRQ lanes specified in a dts file. In the later case the patch needs to be altered, but it would provide a better support of the hisilicon ascend soc's GPIO capabilities. -Sergey
Re: [PATCH] gpio: dwapb: add support for new hisilicon ascend soc
Hello Ding, Thanks for the patch. My comments are below. On Sat, Aug 22, 2020 at 12:27:53PM +0800, Ding Tianhong wrote: > The hisilicon ascend soc's gpio is based on the synopsys DW gpio, > and expand the register to support for INTCOMB_MASK, the new > register is used to enable/disable the interrupt combine features. I am wondering what does the "Interrupt Combine" feature do? Is it the same as the GPIO_INTR_IO pre-synthesize parameter of the DW_apb_gpio IP-core? Is it possible to tune the DW APB GPIO controller at runtime sending up to the IRQ controller either combined or individual interrupts? If the later is true, then probably having the "Interrupt Combine" feature enabled must depend on whether an individual or combined interrupts are supplied in dts, etc... Could you explain the way the feature works and the corresponding layout register in more details? > > Both support for ACPI and Device Tree. > > Signed-off-by: Ding Tianhong > --- > drivers/gpio/gpio-dwapb.c | 28 > 1 file changed, 28 insertions(+) > > diff --git a/drivers/gpio/gpio-dwapb.c b/drivers/gpio/gpio-dwapb.c > index 1d8d55b..923b381 100644 > --- a/drivers/gpio/gpio-dwapb.c > +++ b/drivers/gpio/gpio-dwapb.c > @@ -49,6 +49,8 @@ > #define GPIO_EXT_PORTC 0x58 > #define GPIO_EXT_PORTD 0x5c > > +#define GPIO_INTCOMB_MASK0xffc If the register is the HiSilicon Ascend SoC specific, then I'd suggest to add the vendor-specific prefix like: GPIO_HS_INTCOMB_MASK. Also pay attention to the register naming. Here you define it as "INTCOMB", later as "INT_COMB". It's better to have the same references everywhere: either with underscore or without it. Also please move the new register definition to be after the corresponding feature macro definition (see the next comment for detailts). > + > #define DWAPB_DRIVER_NAME"gpio-dwapb" > #define DWAPB_MAX_PORTS 4 > > @@ -58,6 +60,10 @@ > > #define GPIO_REG_OFFSET_V2 1 > > +#define GPIO_REG_INT_COMB2 Please move this macro to be define after the "GPIO_PORTA_EOI_V2" one, so to make the blocked-like macro definitions order. Like this: #define GPIO_REG_OFFSET_V2 BIT(0) // Reg V2 Feature macro #define GPIO_INTMASK_V2 0x44// Reg V2-specific macro ... #define GPIO_PORTA_EOI_V2 0x40// Reg V2-specific macro + #define GPIO_REG_HS_INTCOMB BIT(1) // HiSilicon Feature macro + + #define GPIO_HS_INTCOMB_MASK 0xffc // HiSilicon-specific macro + I missed that in my series, but having the flags defined as decimals isn't good. Could you convert GPIO_REG_HS_INTCOMB and GPIO_REG_OFFSET_V2 to be defined as BIT(x)? The same comment about HS_-prefixing. Perhaps GPIO_REG_HS_INTCOMB? > +#define ENABLE_INT_COMB 1 > +#define DISABLE_INT_COMB 0 I don't really see a point in having these two macros defined. They are basically just the boolean flags. Please see my next comment. > + > #define GPIO_INTMASK_V2 0x44 > #define GPIO_INTTYPE_LEVEL_V20x34 > #define GPIO_INT_POLARITY_V2 0x38 > @@ -354,6 +360,20 @@ static irqreturn_t dwapb_irq_handler_mfd(int irq, void > *dev_id) > return IRQ_RETVAL(dwapb_do_irq(dev_id)); > } > > +static void dwapb_enable_inq_combine(struct dwapb_gpio *gpio, unsigned int > enable) inq_combine or int_combine? "enable" is used here as boolean, then it should be declared as one. > +{ > + u32 val; > + > + if (gpio->flags & GPIO_REG_INT_COMB) { > + val = dwapb_read(gpio, GPIO_INTCOMB_MASK); > + if (enable) > + val |= BIT(0); > + else > + val &= BIT(0); > + dwapb_write(gpio, GPIO_INTCOMB_MASK, val); > + } > +} > + Do you need to preserve the register content by using the read-modify-write procedure here? If not, then inlining something like this should be fine: if (gpio->flags & GPIO_REG_HS_INTCOMB) dwapb_write(gpio, GPIO_HS_INTCOMB_MASK, 1); > static void dwapb_configure_irqs(struct dwapb_gpio *gpio, >struct dwapb_gpio_port *port, >struct dwapb_port_property *pp) > @@ -446,6 +466,8 @@ static void dwapb_configure_irqs(struct dwapb_gpio *gpio, > irq_create_mapping(gpio->domain, hwirq); > > port->gc.to_irq = dwapb_gpio_to_irq; > + > + dwapb_enable_inq_combine(gpio, ENABLE_INT_COMB); > } > > static void dwapb_irq_teardown(struct dwapb_gpio *gpio) > @@ -618,6 +640,7 @@ static struct dwapb_platform_data > *dwapb_gpio_get_pdata(struct device *dev) > static const struct of_device_id dwapb_of_match[] = { > { .compatible = "snps,dw-apb-gpio", .data = (void *)0}, > { .compatible = "apm,xgene-gpio-v2", .data = (void > *)GPIO_REG_OFFSET_V2}, > + { .compatible = "hisi,ascend-gpio", .data = (void *)GPIO_REG_INT_COMB}, >
Re: [PATCH -next] bus: bt1-apb: remove duplicate include
On Wed, Aug 19, 2020 at 10:43:51AM +0800, Wang Hai wrote: > Remove linux/clk.h which is included more than once > > Reported-by: Hulk Robot > Signed-off-by: Wang Hai Thanks! Acked-by: Serge Semin > --- > drivers/bus/bt1-apb.c | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/drivers/bus/bt1-apb.c b/drivers/bus/bt1-apb.c > index b25ff941e7c7..74b1b712ef3a 100644 > --- a/drivers/bus/bt1-apb.c > +++ b/drivers/bus/bt1-apb.c > @@ -22,7 +22,6 @@ > #include > #include > #include > -#include > #include > > #define APB_EHB_ISR 0x00 > -- > 2.17.1 >
[PATCH v2 2/5] dmaengine: dw: Activate FIFO-mode for memory peripherals only
CFGx.FIFO_MODE field controls a DMA-controller "FIFO readiness" criterion. In other words it determines when to start pushing data out of a DW DMAC channel FIFO to a destination peripheral or from a source peripheral to the DW DMAC channel FIFO. Currently FIFO-mode is set to one for all DW DMAC channels. It means they are tuned to flush data out of FIFO (to a memory peripheral or by accepting the burst transaction requests) when FIFO is at least half-full (except at the end of the block transfer, when FIFO-flush mode is activated) and are configured to get data to the FIFO when it's at least half-empty. Such configuration is a good choice when there is no slave device involved in the DMA transfers. In that case the number of bursts per block is less than when CFGx.FIFO_MODE = 0 and, hence, the bus utilization will improve. But the latency of DMA transfers may increase when CFGx.FIFO_MODE = 1, since DW DMAC will wait for the channel FIFO contents to be either half-full or half-empty depending on having the destination or the source transfers. Such latencies might be dangerous in case if the DMA transfers are expected to be performed from/to a slave device. Since normally peripheral devices keep data in internal FIFOs, any latency at some critical moment may cause one being overflown and consequently losing data. This especially concerns a case when either a peripheral device is relatively fast or the DW DMAC engine is relatively slow with respect to the incoming data pace. In order to solve problems, which might be caused by the latencies described above, let's enable the FIFO half-full/half-empty "FIFO readiness" criterion only for DMA transfers with no slave device involved. Thanks to the commit 99ba8b9b0d97 ("dmaengine: dw: Initialize channel before each transfer") we can freely do that in the generic dw_dma_initialize_chan() method. Signed-off-by: Serge Semin Reviewed-by: Andy Shevchenko --- drivers/dma/dw/dw.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/dma/dw/dw.c b/drivers/dma/dw/dw.c index 7a085b3c1854..d9810980920a 100644 --- a/drivers/dma/dw/dw.c +++ b/drivers/dma/dw/dw.c @@ -14,7 +14,7 @@ static void dw_dma_initialize_chan(struct dw_dma_chan *dwc) { struct dw_dma *dw = to_dw_dma(dwc->chan.device); - u32 cfghi = DWC_CFGH_FIFO_MODE; + u32 cfghi = is_slave_direction(dwc->direction) ? 0 : DWC_CFGH_FIFO_MODE; u32 cfglo = DWC_CFGL_CH_PRIOR(dwc->priority); bool hs_polarity = dwc->dws.hs_polarity; -- 2.27.0
[PATCH v2 0/5] dmaengine: dw: Introduce non-mem peripherals optimizations
After a lot of tests and thorough DW DMAC databook studying we've discovered that the driver can be optimized especially when it comes to working with non-memory peripherals. First of all we've found out that since each DW DMAC channel can be synthesized with different parameters, then even when two of them are configured to perform the same DMA transactions they may execute them with different performance. Since some DMA client devices might be sensitive to such important parameter as performance, then it is a good idea to let them request only suitable DMA channels. In this patchset we introduce a functionality, which makes it possible by passing the DMA channels mask either over the "dmas" DT property or in the dw_dma_slave platform data descriptor. Secondly FIFO-mode of the "FIFO readiness" criterion is more suitable for the pure memory DMA transfers, since it minimizes the system bus utilization, but causes some performance drop. When it comes to working with non-memory peripherals the DMA engine performance comes to the first place. Since normally DMA client devices keep data in internal FIFOs, any latency at some critical moment may cause a FIFO being overflown and consequently losing data. So in order to minimize a chance of the DW DMAC internal FIFO being a bottle neck during the DMA transfers to and from non-memory peripherals we propose not to use FIFO-mode for them. Thirdly it has been discovered that using a DMA transaction length is redundant when calculating the destination transfer width for the dev-to-mem DMA communications. That shall increase performance of the DMA transfers with unaligned data length. Finally there is a small optimization in the burst length setting. In particular we've found out, that according to the DW DMAC databoot it's pointless to set one for the memory peripherals since they don't have handshaking interface connected to the DMA controller. So we suggest to just ignore the burst length config when it comes to setting the memory peripherals up. Link: https://lore.kernel.org/dmaengine/20200730154545.3965-1-sergey.se...@baikalelectronics.ru Changelog v2: - Add Databook version to the commits log. - Use the statement "slave.channels >= BIT(dw->pdata->nr_channels)" to make sure the permitted DMA-channels pool is valid. - Describe new DW DMAC "channels" mask in a single line even though it gets out of 80 columns limit. Signed-off-by: Serge Semin Cc: Alexey Malahov Cc: Pavel Parkhomenko Cc: Peter Ujfalusi Cc: Andy Shevchenko Cc: Rob Herring Cc: dmaeng...@vger.kernel.org Cc: devicet...@vger.kernel.org Cc: linux-kernel@vger.kernel.org Serge Semin (5): dt-bindings: dma: dw: Add optional DMA-channels mask cell support dmaengine: dw: Activate FIFO-mode for memory peripherals only dmaengine: dw: Discard dlen from the dev-to-mem xfer width calculation dmaengine: dw: Ignore burst setting for memory peripherals dmaengine: dw: Add DMA-channels mask cell support .../devicetree/bindings/dma/snps,dma-spear1340.yaml| 7 +-- drivers/dma/dw/core.c | 6 +- drivers/dma/dw/dw.c| 7 +++ drivers/dma/dw/idma32.c| 5 ++--- drivers/dma/dw/of.c| 7 +-- include/linux/platform_data/dma-dw.h | 2 ++ 6 files changed, 22 insertions(+), 12 deletions(-) -- 2.27.0
[PATCH v2 3/5] dmaengine: dw: Discard dlen from the dev-to-mem xfer width calculation
Indeed in case of the DMA_DEV_TO_MEM DMA transfers it's enough to take the destination memory address and the destination master data width into account to calculate the CTLx.DST_TR_WIDTH setting of the memory peripheral. According to the DW DMAC IP-core Databook 2.18b (page 66, Example 5) at the and of a DMA transfer when the DMA-channel internal FIFO is left with data less than for a single destination burst transaction, the destination peripheral will enter the Single Transaction Region where the DW DMA controller can complete a block transfer to the destination using single transactions (non-burst transaction of CTLx.DST_TR_WIDTH bytes). If there is no enough data in the DMA-channel internal FIFO for even a single non-burst transaction of CTLx.DST_TR_WIDTH bytes, then the channel enters "FIFO flush mode". That mode is activated to empty the FIFO and flush the leftovers out to the memory peripheral. The flushing procedure is simple. The data is sent to the memory by means of a set of single transaction of CTLx.SRC_TR_WIDTH bytes. To sum up it's redundant to use the LLPs length to find out the CTLx.DST_TR_WIDTH parameter value, since each DMA transfer will be completed with the CTLx.SRC_TR_WIDTH bytes transaction if it is required. We suggest to remove the LLP entry length from the statement which calculates the memory peripheral DMA transaction width since it's redundant due to the feature described above. By doing so we'll improve the memory bus utilization and speed up the DMA-channel performance for DMA_DEV_TO_MEM DMA-transfers. Signed-off-by: Serge Semin Acked-by: Andy Shevchenko --- Changelog v2: - Add Databook version to the commit log. --- drivers/dma/dw/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/dma/dw/core.c b/drivers/dma/dw/core.c index 4700f2e87a62..3da0aea9fe25 100644 --- a/drivers/dma/dw/core.c +++ b/drivers/dma/dw/core.c @@ -723,7 +723,7 @@ dwc_prep_slave_sg(struct dma_chan *chan, struct scatterlist *sgl, lli_write(desc, sar, reg); lli_write(desc, dar, mem); lli_write(desc, ctlhi, ctlhi); - mem_width = __ffs(data_width | mem | dlen); + mem_width = __ffs(data_width | mem); lli_write(desc, ctllo, ctllo | DWC_CTLL_DST_WIDTH(mem_width)); desc->len = dlen; -- 2.27.0
[PATCH v2 4/5] dmaengine: dw: Ignore burst setting for memory peripherals
According to the DW DMA controller Databook 2.18b (page 40 "3.5 Memory Peripherals") memory peripherals don't have handshaking interface connected to the controller, therefore they can never be a flow controller. Since the CTLx.SRC_MSIZE and CTLx.DEST_MSIZE are properties valid only for peripherals with a handshaking interface, we can freely zero these fields out if the memory peripheral is selected to be the source or the destination of the DMA transfers. Note according to the databook, length of burst transfers to memory is always equal to the number of data items available in a channel FIFO or data items required to complete the block transfer, whichever is smaller; length of burst transfers from memory is always equal to the space available in a channel FIFO or number of data items required to complete the block transfer, whichever is smaller. Signed-off-by: Serge Semin --- Changelog v2: - Add Databook version to the commit log. --- drivers/dma/dw/dw.c | 5 ++--- drivers/dma/dw/idma32.c | 5 ++--- 2 files changed, 4 insertions(+), 6 deletions(-) diff --git a/drivers/dma/dw/dw.c b/drivers/dma/dw/dw.c index d9810980920a..a4862263ff14 100644 --- a/drivers/dma/dw/dw.c +++ b/drivers/dma/dw/dw.c @@ -67,9 +67,8 @@ static size_t dw_dma_block2bytes(struct dw_dma_chan *dwc, u32 block, u32 width) static u32 dw_dma_prepare_ctllo(struct dw_dma_chan *dwc) { struct dma_slave_config *sconfig = >dma_sconfig; - bool is_slave = is_slave_direction(dwc->direction); - u8 smsize = is_slave ? sconfig->src_maxburst : DW_DMA_MSIZE_16; - u8 dmsize = is_slave ? sconfig->dst_maxburst : DW_DMA_MSIZE_16; + u8 smsize = (dwc->direction == DMA_DEV_TO_MEM) ? sconfig->src_maxburst : 0; + u8 dmsize = (dwc->direction == DMA_MEM_TO_DEV) ? sconfig->dst_maxburst : 0; u8 p_master = dwc->dws.p_master; u8 m_master = dwc->dws.m_master; u8 dms = (dwc->direction == DMA_MEM_TO_DEV) ? p_master : m_master; diff --git a/drivers/dma/dw/idma32.c b/drivers/dma/dw/idma32.c index f00657308811..3ce44de25d33 100644 --- a/drivers/dma/dw/idma32.c +++ b/drivers/dma/dw/idma32.c @@ -73,9 +73,8 @@ static size_t idma32_block2bytes(struct dw_dma_chan *dwc, u32 block, u32 width) static u32 idma32_prepare_ctllo(struct dw_dma_chan *dwc) { struct dma_slave_config *sconfig = >dma_sconfig; - bool is_slave = is_slave_direction(dwc->direction); - u8 smsize = is_slave ? sconfig->src_maxburst : IDMA32_MSIZE_8; - u8 dmsize = is_slave ? sconfig->dst_maxburst : IDMA32_MSIZE_8; + u8 smsize = (dwc->direction == DMA_DEV_TO_MEM) ? sconfig->src_maxburst : 0; + u8 dmsize = (dwc->direction == DMA_MEM_TO_DEV) ? sconfig->dst_maxburst : 0; return DWC_CTLL_LLP_D_EN | DWC_CTLL_LLP_S_EN | DWC_CTLL_DST_MSIZE(dmsize) | DWC_CTLL_SRC_MSIZE(smsize); -- 2.27.0
[PATCH v2 5/5] dmaengine: dw: Add DMA-channels mask cell support
DW DMA IP-core provides a way to synthesize the DMA controller with channels having different parameters like maximum burst-length, multi-block support, maximum data width, etc. Those parameters both explicitly and implicitly affect the channels performance. Since DMA slave devices might be very demanding to the DMA performance, let's provide a functionality for the slaves to be assigned with DW DMA channels, which performance according to the platform engineer fulfill their requirements. After this patch is applied it can be done by passing the mask of suitable DMA-channels either directly in the dw_dma_slave structure instance or as a fifth cell of the DMA DT-property. If mask is zero or not provided, then there is no limitation on the channels allocation. For instance Baikal-T1 SoC is equipped with a DW DMAC engine, which first two channels are synthesized with max burst length of 16, while the rest of the channels have been created with max-burst-len=4. It would seem that the first two channels must be faster than the others and should be more preferable for the time-critical DMA slave devices. In practice it turned out that the situation is quite the opposite. The channels with max-burst-len=4 demonstrated a better performance than the channels with max-burst-len=16 even when they both had been initialized with the same settings. The performance drop of the first two DMA-channels made them unsuitable for the DW APB SSI slave device. No matter what settings they are configured with, full-duplex SPI transfers occasionally experience the Rx FIFO overflow. It means that the DMA-engine doesn't keep up with incoming data pace even though the SPI-bus is enabled with speed of 25MHz while the DW DMA controller is clocked with 50MHz signal. There is no such problem has been noticed for the channels synthesized with max-burst-len=4. Signed-off-by: Serge Semin --- Changelog v2: - Use the statement "slave.channels >= BIT(dw->pdata->nr_channels)" to make sure the permitted DMA-channels pool is valid. - Describe new DW DMAC "channels" mask in a single line even though it gets out of 80 columns limit. --- drivers/dma/dw/core.c| 4 drivers/dma/dw/of.c | 7 +-- include/linux/platform_data/dma-dw.h | 2 ++ 3 files changed, 11 insertions(+), 2 deletions(-) diff --git a/drivers/dma/dw/core.c b/drivers/dma/dw/core.c index 3da0aea9fe25..5f7b9badb965 100644 --- a/drivers/dma/dw/core.c +++ b/drivers/dma/dw/core.c @@ -772,6 +772,10 @@ bool dw_dma_filter(struct dma_chan *chan, void *param) if (dws->dma_dev != chan->device->dev) return false; + /* permit channels in accordance with the channels mask */ + if (dws->channels && !(dws->channels & dwc->mask)) + return false; + /* We have to copy data since dws can be temporary storage */ memcpy(>dws, dws, sizeof(struct dw_dma_slave)); diff --git a/drivers/dma/dw/of.c b/drivers/dma/dw/of.c index 1474b3817ef4..c1cf7675b9d1 100644 --- a/drivers/dma/dw/of.c +++ b/drivers/dma/dw/of.c @@ -22,18 +22,21 @@ static struct dma_chan *dw_dma_of_xlate(struct of_phandle_args *dma_spec, }; dma_cap_mask_t cap; - if (dma_spec->args_count != 3) + if (dma_spec->args_count < 3 || dma_spec->args_count > 4) return NULL; slave.src_id = dma_spec->args[0]; slave.dst_id = dma_spec->args[0]; slave.m_master = dma_spec->args[1]; slave.p_master = dma_spec->args[2]; + if (dma_spec->args_count >= 4) + slave.channels = dma_spec->args[3]; if (WARN_ON(slave.src_id >= DW_DMA_MAX_NR_REQUESTS || slave.dst_id >= DW_DMA_MAX_NR_REQUESTS || slave.m_master >= dw->pdata->nr_masters || - slave.p_master >= dw->pdata->nr_masters)) + slave.p_master >= dw->pdata->nr_masters || + slave.channels >= BIT(dw->pdata->nr_channels))) return NULL; dma_cap_zero(cap); diff --git a/include/linux/platform_data/dma-dw.h b/include/linux/platform_data/dma-dw.h index 4f681df85c27..4ca18e29c392 100644 --- a/include/linux/platform_data/dma-dw.h +++ b/include/linux/platform_data/dma-dw.h @@ -23,6 +23,7 @@ * @dst_id:dst request line * @m_master: memory master for transfers on allocated channel * @p_master: peripheral master for transfers on allocated channel + * @channels: mask of the channels permitted for allocation (zero value means any) * @hs_polarity:set active low polarity of handshake interface */ struct dw_dma_slave { @@ -31,6 +32,7 @@ struct dw_dma_slave { u8 dst_id; u8 m_master; u8 p_master; + u8 channels; boolhs_polarity; }; -- 2.27.0
[PATCH v2 1/5] dt-bindings: dma: dw: Add optional DMA-channels mask cell support
Each DW DMA controller channel can be synthesized with different parameters like maximum burst-length, multi-block support, maximum data width, etc. Most of these parameters determine the DW DMAC channels performance in its own aspect. On the other hand these parameters can be implicitly responsible for the channels performance degradation (for instance multi-block support is a very useful feature, but having it disabled during the DW DMAC synthesize will provide a more optimized core). Since DMA slave devices may have critical dependency on the DMA engine performance, let's provide a way for the slave devices to have the DMA-channels allocated from a pool of the channels, which according to the system engineer fulfill their performance requirements. The pool is determined by a mask optionally specified in the fifth DMA-cell of the DMA DT-property. If the fifth cell is omitted from the phandle arguments or the mask is zero, then the allocation will be performed from a set of all channels provided by the DMA controller. Signed-off-by: Serge Semin --- .../devicetree/bindings/dma/snps,dma-spear1340.yaml| 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/Documentation/devicetree/bindings/dma/snps,dma-spear1340.yaml b/Documentation/devicetree/bindings/dma/snps,dma-spear1340.yaml index 20870f5c14dd..ef1d6879c158 100644 --- a/Documentation/devicetree/bindings/dma/snps,dma-spear1340.yaml +++ b/Documentation/devicetree/bindings/dma/snps,dma-spear1340.yaml @@ -18,12 +18,15 @@ properties: const: snps,dma-spear1340 "#dma-cells": -const: 3 +minimum: 3 +maximum: 4 description: | First cell is a phandle pointing to the DMA controller. Second one is the DMA request line number. Third cell is the memory master identifier for transfers on dynamically allocated channel. Fourth cell is the - peripheral master identifier for transfers on an allocated channel. + peripheral master identifier for transfers on an allocated channel. Fifth + cell is an optional mask of the DMA channels permitted to be allocated + for the corresponding client device. reg: maxItems: 1 -- 2.27.0
Re: [PATCH 2/5] dmaengine: dw: Activate FIFO-mode for memory peripherals only
On Fri, Jul 31, 2020 at 10:22:54PM +0530, Vinod Koul wrote: > On 30-07-20, 20:13, Serge Semin wrote: > > On Thu, Jul 30, 2020 at 07:47:03PM +0300, Andy Shevchenko wrote: > > > On Thu, Jul 30, 2020 at 07:31:54PM +0300, Serge Semin wrote: > > > > On Thu, Jul 30, 2020 at 07:24:28PM +0300, Andy Shevchenko wrote: > > > > > On Thu, Jul 30, 2020 at 06:45:42PM +0300, Serge Semin wrote: > > > > > > ... > > > > > > > > > Thanks to the commit ("dmaengine: dw: Initialize > > > > > > channel > > > > > > ... > > > > > > > > > Note the DMA-engine repository > > > > > > git.infradead.org/users/vkoul/slave-dma.git > > > > > > isn't accessible. So I couldn't find out the Andy' commit hash to > > > > > > use it in > > > > > > the log. > > Yeah I moved tree to k.org after disk issue with infradead, change patch > was on dmaengine ML > > > > > > It's dmaengine.git on git.kernel.org. > > > > > > > > Ah, thanks! I've just found out that the repo address has been changed. > > > > But I've > > > > also scanned the "next" branch of the repo: > > > > https://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine.git > > > > > > > > Your commit isn't there. Am I missing something? > > > > > > > > It's a fix. It went to upstream branch (don't remember its name by heart > > > in > > > Vinod's repo). > > Yes it is Linus's tree now and in dmaengine you can find in fixes branch > > https://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine.git/commit/?h=fixes=99ba8b9b0d9780e9937eb1d488d120e9e5c2533d Thanks for pointing out to the commit. I'll add the hash to the patch log and resend the series shortly. -Sergey > > > > > Right. Found it. Thanks. > > > > -Sergey > > > > > > > > -- > > > With Best Regards, > > > Andy Shevchenko > > > > > > > > -- > ~Vinod
Re: clocksource: dw_apb_timer: commit 6d2e16a3181b broke Arria10 platform
On Fri, Jul 31, 2020 at 10:56:37AM -0500, Dinh Nguyen wrote: > Hi Serge, > > On 7/31/20 1:48 AM, Serge Semin wrote: > > Hello Dinh, > > It must be something wrong with your timer2 and timer3 declared in the > > Arria10 > > dts because the patch didn't change anything for the first two timers > > (timer0 and > > timer1). It just permits to register all DW APB Timers found in dts. > > > > If those timers are broken, then you should have disabled them in the dts > > in the > > first place. If they are normal, then you need to investigate further why do > > they cause the kernel panic. > > > > Indeed, the dts reg entry for timer3 was broken. Thanks alot for > exposing the issue. Apologies for the noise. No worries. Glad I could help. -Sergey > > Dinh
Re: [PATCH 0/8] spi: dw-dma: Add max SG entries burst capability support
On Fri, Jul 31, 2020 at 12:26:12PM +0300, Andy Shevchenko wrote: > On Fri, Jul 31, 2020 at 10:59:45AM +0300, Serge Semin wrote: > > ... > > > Note since the DMA-engine subsystem in kernel 5.8-rcX doesn't have the > > max_sg_burst capability supported, this series is intended to be applied > > only after the "next"-branch of the DMA-engine repository is merged into > > the mainline repo. Alternatively the series could be merged in through the > > DMA-engine repo. > > This needs to be thought through... There is nothing to think about: either Mark completes review/accepts the series and Vinod agrees to merge it in through the DMA-engine repo, or we'll have to wait until the next merge window is closed and then merge the series in traditionally through the SPI repo. -Sergey > > I gave some simple comments (and on top just try not to modify the same lines > inside one series two or more times, e.g. ret = func() -> return func() -> ret > = func() in one case). > > -- > With Best Regards, > Andy Shevchenko > >
Re: [PATCH 4/8] spi: dw-dma: Move DMA transfers submission to the channels prep methods
On Fri, Jul 31, 2020 at 12:15:28PM +0300, Andy Shevchenko wrote: > On Fri, Jul 31, 2020 at 10:59:49AM +0300, Serge Semin wrote: > > Indeed we can freely move the dmaengine_submit() method invocation and the > > Tx and Rx busy flag setting into the DMA Tx/Rx prepare methods. By doing > > so first we implement another preparation before adding the one-by-one DMA > > SG entries transmission, second we now have the dma_async_tx_descriptor > > descriptor used locally only in the new DMA transfers submitition methods, > > which makes the code less complex with no passing around the DMA Tx > > descriptors, third we make the generic transfer method more readable, where > > now the functionality of submission, execution and wait procedures is > > transparently split up instead of having a preparation, intermixed > > submission/execution and wait procedures. While at it we also add the > > dmaengine_submit() return value test. It has been unnecessary for > > DW DMAC, but should be done to support the generic DMA interface. > > > > Note since the DMA channels preparation methods are now responsible for > > the DMA transactions submission, we also rename them to > > dw_spi_dma_submit_{tx,rx}(). > > ... > > > + cookie = dmaengine_submit(txdesc); > > + ret = dma_submit_error(cookie); > > + if (!ret) > > Use traditional pattern > if (ret) > return ret; > > Same for below. Ok. > > > + set_bit(TX_BUSY, >dma_chan_busy); > > + > > + return ret; > > ... > > > - if (!xfer->rx_buf) > > - return NULL; > > This seems not related. I moved it to the upper level for the methods better maintainability. -Sergey > > -- > With Best Regards, > Andy Shevchenko > >
Re: [PATCH 3/8] spi: dw-dma: Configure the DMA channels in dma_setup
On Fri, Jul 31, 2020 at 12:16:38PM +0300, Andy Shevchenko wrote: > On Fri, Jul 31, 2020 at 10:59:48AM +0300, Serge Semin wrote: > > Mainly this is a preparation patch before adding one-by-one DMA SG entries > > transmission. But logically the Tx and Rx DMA channels setup should be > > performed in the dma_setup() callback anyway. So let's move the DMA slave > > channels src/dst burst lengths, address and address width configuration to > > the DMA setup stage. While at it make sure the return value of the > > dmaengine_slave_config() method is checked. It has been unnecessary in > > case if Dw DMAC is utilized as a DMA engine, since its device_config() > > callback always returns zero (though it might change in future). But since > > DW APB SSI driver now supports any DMA back-end we must make sure the > > DMA device configuration has been successful before proceeding with > > further setups. > > ... > Part 1: > > + if (!xfer->rx_buf) > > + return NULL; > > ... > Part 2: > > + if (xfer->rx_buf) { > > > + } > > This looks like a separate change to drop one of them and not hide in the > next patch. Both of these changes are a part of the single alteration introduced to detach two methods from each other: dw_spi_dma_{config,prepare}_{rx,tx}(). Part 1 is a statement, which belongs to the method dw_spi_dma_prepare_rx() and is left there after dw_spi_dma_config_rx() has been detached from it. Part 2 is a logical part, which must be presented in dw_spi_dma_setup() since we don't need to configure the Rx DMA channel if rx_buf isn't specified. Please, read more carefully the commit log. I didn't introduce anything other than the changes described there. -Sergey > > -- > With Best Regards, > Andy Shevchenko > >
[PATCH 2/8] spi: dw-dma: Fail DMA-based transfer if no Tx-buffer specified
Since commit 46164fde6b78 ("spi: dw: Fix Rx-only DMA transfers") if DMA interface is enabled, then Tx-buffer must be available in each SPI transfer. It's required since in order to activate the incoming data reception either DMA or CPU must be pushing data out to the SPI bus. But the DW APB SSI DMA driver code is still left in state as if Tx-buffer might be optional, which is no longer true. Let's fix it so an error would be returned if no Tx-buffer detected and DMA Tx would be always enabled. Signed-off-by: Serge Semin --- drivers/spi/spi-dw-dma.c | 27 +-- 1 file changed, 13 insertions(+), 14 deletions(-) diff --git a/drivers/spi/spi-dw-dma.c b/drivers/spi/spi-dw-dma.c index 440679fa0764..ec721af61663 100644 --- a/drivers/spi/spi-dw-dma.c +++ b/drivers/spi/spi-dw-dma.c @@ -263,9 +263,6 @@ dw_spi_dma_prepare_tx(struct dw_spi *dws, struct spi_transfer *xfer) struct dma_slave_config txconf; struct dma_async_tx_descriptor *txdesc; - if (!xfer->tx_buf) - return NULL; - memset(, 0, sizeof(txconf)); txconf.direction = DMA_MEM_TO_DEV; txconf.dst_addr = dws->dma_addr; @@ -384,17 +381,19 @@ static struct dma_async_tx_descriptor *dw_spi_dma_prepare_rx(struct dw_spi *dws, static int dw_spi_dma_setup(struct dw_spi *dws, struct spi_transfer *xfer) { - u16 imr = 0, dma_ctrl = 0; + u16 imr, dma_ctrl; - if (xfer->tx_buf) - dma_ctrl |= SPI_DMA_TDMAE; + if (!xfer->tx_buf) + return -EINVAL; + + /* Set the DMA handshaking interface */ + dma_ctrl = SPI_DMA_TDMAE; if (xfer->rx_buf) dma_ctrl |= SPI_DMA_RDMAE; dw_writel(dws, DW_SPI_DMACR, dma_ctrl); /* Set the interrupt mask */ - if (xfer->tx_buf) - imr |= SPI_INT_TXOI; + imr = SPI_INT_TXOI; if (xfer->rx_buf) imr |= SPI_INT_RXUI | SPI_INT_RXOI; spi_umask_intr(dws, imr); @@ -413,6 +412,8 @@ static int dw_spi_dma_transfer(struct dw_spi *dws, struct spi_transfer *xfer) /* Prepare the TX dma transfer */ txdesc = dw_spi_dma_prepare_tx(dws, xfer); + if (!txdesc) + return -EINVAL; /* Prepare the RX dma transfer */ rxdesc = dw_spi_dma_prepare_rx(dws, xfer); @@ -424,17 +425,15 @@ static int dw_spi_dma_transfer(struct dw_spi *dws, struct spi_transfer *xfer) dma_async_issue_pending(dws->rxchan); } - if (txdesc) { - set_bit(TX_BUSY, >dma_chan_busy); - dmaengine_submit(txdesc); - dma_async_issue_pending(dws->txchan); - } + set_bit(TX_BUSY, >dma_chan_busy); + dmaengine_submit(txdesc); + dma_async_issue_pending(dws->txchan); ret = dw_spi_dma_wait(dws, xfer); if (ret) return ret; - if (txdesc && dws->master->cur_msg->status == -EINPROGRESS) { + if (dws->master->cur_msg->status == -EINPROGRESS) { ret = dw_spi_dma_wait_tx_done(dws, xfer); if (ret) return ret; -- 2.27.0
[PATCH 4/8] spi: dw-dma: Move DMA transfers submission to the channels prep methods
Indeed we can freely move the dmaengine_submit() method invocation and the Tx and Rx busy flag setting into the DMA Tx/Rx prepare methods. By doing so first we implement another preparation before adding the one-by-one DMA SG entries transmission, second we now have the dma_async_tx_descriptor descriptor used locally only in the new DMA transfers submitition methods, which makes the code less complex with no passing around the DMA Tx descriptors, third we make the generic transfer method more readable, where now the functionality of submission, execution and wait procedures is transparently split up instead of having a preparation, intermixed submission/execution and wait procedures. While at it we also add the dmaengine_submit() return value test. It has been unnecessary for DW DMAC, but should be done to support the generic DMA interface. Note since the DMA channels preparation methods are now responsible for the DMA transactions submission, we also rename them to dw_spi_dma_submit_{tx,rx}(). Signed-off-by: Serge Semin --- drivers/spi/spi-dw-dma.c | 56 ++-- 1 file changed, 31 insertions(+), 25 deletions(-) diff --git a/drivers/spi/spi-dw-dma.c b/drivers/spi/spi-dw-dma.c index 56496b659d62..366978086781 100644 --- a/drivers/spi/spi-dw-dma.c +++ b/drivers/spi/spi-dw-dma.c @@ -272,10 +272,11 @@ static int dw_spi_dma_config_tx(struct dw_spi *dws) return dmaengine_slave_config(dws->txchan, ); } -static struct dma_async_tx_descriptor * -dw_spi_dma_prepare_tx(struct dw_spi *dws, struct spi_transfer *xfer) +static int dw_spi_dma_submit_tx(struct dw_spi *dws, struct spi_transfer *xfer) { struct dma_async_tx_descriptor *txdesc; + dma_cookie_t cookie; + int ret; txdesc = dmaengine_prep_slave_sg(dws->txchan, xfer->tx_sg.sgl, @@ -283,12 +284,17 @@ dw_spi_dma_prepare_tx(struct dw_spi *dws, struct spi_transfer *xfer) DMA_MEM_TO_DEV, DMA_PREP_INTERRUPT | DMA_CTRL_ACK); if (!txdesc) - return NULL; + return -ENOMEM; txdesc->callback = dw_spi_dma_tx_done; txdesc->callback_param = dws; - return txdesc; + cookie = dmaengine_submit(txdesc); + ret = dma_submit_error(cookie); + if (!ret) + set_bit(TX_BUSY, >dma_chan_busy); + + return ret; } static inline bool dw_spi_dma_rx_busy(struct dw_spi *dws) @@ -365,13 +371,11 @@ static int dw_spi_dma_config_rx(struct dw_spi *dws) return dmaengine_slave_config(dws->rxchan, ); } -static struct dma_async_tx_descriptor *dw_spi_dma_prepare_rx(struct dw_spi *dws, - struct spi_transfer *xfer) +static int dw_spi_dma_submit_rx(struct dw_spi *dws, struct spi_transfer *xfer) { struct dma_async_tx_descriptor *rxdesc; - - if (!xfer->rx_buf) - return NULL; + dma_cookie_t cookie; + int ret; rxdesc = dmaengine_prep_slave_sg(dws->rxchan, xfer->rx_sg.sgl, @@ -379,12 +383,17 @@ static struct dma_async_tx_descriptor *dw_spi_dma_prepare_rx(struct dw_spi *dws, DMA_DEV_TO_MEM, DMA_PREP_INTERRUPT | DMA_CTRL_ACK); if (!rxdesc) - return NULL; + return -ENOMEM; rxdesc->callback = dw_spi_dma_rx_done; rxdesc->callback_param = dws; - return rxdesc; + cookie = dmaengine_submit(rxdesc); + ret = dma_submit_error(cookie); + if (!ret) + set_bit(RX_BUSY, >dma_chan_busy); + + return ret; } static int dw_spi_dma_setup(struct dw_spi *dws, struct spi_transfer *xfer) @@ -427,26 +436,23 @@ static int dw_spi_dma_setup(struct dw_spi *dws, struct spi_transfer *xfer) static int dw_spi_dma_transfer(struct dw_spi *dws, struct spi_transfer *xfer) { - struct dma_async_tx_descriptor *txdesc, *rxdesc; int ret; - /* Prepare the TX dma transfer */ - txdesc = dw_spi_dma_prepare_tx(dws, xfer); - if (!txdesc) - return -EINVAL; + /* Submit DMA Tx transfer */ + ret = dw_spi_dma_submit_tx(dws, xfer); + if (ret) + return ret; - /* Prepare the RX dma transfer */ - rxdesc = dw_spi_dma_prepare_rx(dws, xfer); + /* Submit DMA Rx transfer if required */ + if (xfer->rx_buf) { + ret = dw_spi_dma_submit_rx(dws, xfer); + if (ret) + return ret; - /* rx must be started before tx due to spi instinct */ - if (rxdesc) { - set_bit(RX_BUSY, >dma_chan_busy); - dmaengine_submit(rxdesc); + /* rx must be started before tx due to spi instinct */ dma_async_issue_pending(dws->rxchan); } - set_bit(TX_BUSY,
[PATCH 7/8] spi: dw-dma: Pass exact data to the DMA submit and wait methods
In order to use the DMA submission and waiting methods in both generic DMA-based SPI transfer and one-by-one DMA SG entries transmission functions, we need to modify the dw_spi_dma_wait() and dw_spi_dma_submit_tx()/dw_spi_dma_submit_rx() prototypes. So instead of getting the SPI transfer object as the second argument they must accept the exact data structure instances they imply to use. Those are the current transfer length and the SPI bus frequency in case of dw_spi_dma_wait(), and SG list together with number of list entries in case of the DMA Tx/Rx submission methods. Signed-off-by: Serge Semin --- drivers/spi/spi-dw-dma.c | 35 +-- 1 file changed, 17 insertions(+), 18 deletions(-) diff --git a/drivers/spi/spi-dw-dma.c b/drivers/spi/spi-dw-dma.c index f8508c0c7978..2b42b42b6cf2 100644 --- a/drivers/spi/spi-dw-dma.c +++ b/drivers/spi/spi-dw-dma.c @@ -189,12 +189,12 @@ static enum dma_slave_buswidth dw_spi_dma_convert_width(u8 n_bytes) return DMA_SLAVE_BUSWIDTH_UNDEFINED; } -static int dw_spi_dma_wait(struct dw_spi *dws, struct spi_transfer *xfer) +static int dw_spi_dma_wait(struct dw_spi *dws, unsigned int len, u32 speed) { unsigned long long ms; - ms = xfer->len * MSEC_PER_SEC * BITS_PER_BYTE; - do_div(ms, xfer->effective_speed_hz); + ms = len * MSEC_PER_SEC * BITS_PER_BYTE; + do_div(ms, speed); ms += ms + 200; if (ms > UINT_MAX) @@ -269,17 +269,16 @@ static int dw_spi_dma_config_tx(struct dw_spi *dws) return dmaengine_slave_config(dws->txchan, ); } -static int dw_spi_dma_submit_tx(struct dw_spi *dws, struct spi_transfer *xfer) +static int dw_spi_dma_submit_tx(struct dw_spi *dws, struct scatterlist *sgl, + unsigned int nents) { struct dma_async_tx_descriptor *txdesc; dma_cookie_t cookie; int ret; - txdesc = dmaengine_prep_slave_sg(dws->txchan, - xfer->tx_sg.sgl, - xfer->tx_sg.nents, - DMA_MEM_TO_DEV, - DMA_PREP_INTERRUPT | DMA_CTRL_ACK); + txdesc = dmaengine_prep_slave_sg(dws->txchan, sgl, nents, +DMA_MEM_TO_DEV, +DMA_PREP_INTERRUPT | DMA_CTRL_ACK); if (!txdesc) return -ENOMEM; @@ -367,17 +366,16 @@ static int dw_spi_dma_config_rx(struct dw_spi *dws) return dmaengine_slave_config(dws->rxchan, ); } -static int dw_spi_dma_submit_rx(struct dw_spi *dws, struct spi_transfer *xfer) +static int dw_spi_dma_submit_rx(struct dw_spi *dws, struct scatterlist *sgl, + unsigned int nents) { struct dma_async_tx_descriptor *rxdesc; dma_cookie_t cookie; int ret; - rxdesc = dmaengine_prep_slave_sg(dws->rxchan, - xfer->rx_sg.sgl, - xfer->rx_sg.nents, - DMA_DEV_TO_MEM, - DMA_PREP_INTERRUPT | DMA_CTRL_ACK); + rxdesc = dmaengine_prep_slave_sg(dws->rxchan, sgl, nents, +DMA_DEV_TO_MEM, +DMA_PREP_INTERRUPT | DMA_CTRL_ACK); if (!rxdesc) return -ENOMEM; @@ -436,13 +434,14 @@ static int dw_spi_dma_transfer_all(struct dw_spi *dws, int ret; /* Submit DMA Tx transfer */ - ret = dw_spi_dma_submit_tx(dws, xfer); + ret = dw_spi_dma_submit_tx(dws, xfer->tx_sg.sgl, xfer->tx_sg.nents); if (ret) goto err_clear_dmac; /* Submit DMA Rx transfer if required */ if (xfer->rx_buf) { - ret = dw_spi_dma_submit_rx(dws, xfer); + ret = dw_spi_dma_submit_rx(dws, xfer->rx_sg.sgl, + xfer->rx_sg.nents); if (ret) goto err_clear_dmac; @@ -452,7 +451,7 @@ static int dw_spi_dma_transfer_all(struct dw_spi *dws, dma_async_issue_pending(dws->txchan); - ret = dw_spi_dma_wait(dws, xfer); + ret = dw_spi_dma_wait(dws, xfer->len, xfer->effective_speed_hz); err_clear_dmac: dw_writel(dws, DW_SPI_DMACR, 0); -- 2.27.0
[PATCH 6/8] spi: dw-dma: Move DMAC register cleanup to DMA transfer method
DW APB SSI DMA driver doesn't use the native SPI core wait API since commit bdbdf0f06337 ("spi: dw: Locally wait for the DMA transfers completion"). Due to that the driver can now clear the DMAC register in a single place synchronously with the DMA transactions completion or failure. After that all the possible code paths are still covered: 1) DMA completion callbacks are executed in case if the corresponding DMA transactions are finished. When they are, one of them will eventually wake the SPI messages pump kernel thread and dw_spi_dma_transfer_all() method will clean the DMAC register as implied by this patch. 2) dma_stop is called when the SPI core detects an error either returned from the transfer_one() callback or set in the SPI message status field. Both types of errors will be noticed by the dw_spi_dma_transfer_all() method. 3) dma_exit is called when either SPI controller driver or the corresponding device is removed. In any case the SPI core will first flush the SPI messages pump kernel thread, so any pending or in-fly SPI transfers will be finished before that. Due to all of that let's simplify the DW APB SSI DMA driver a bit and move the DMAC register cleanup to a single place in the dw_spi_dma_transfer_all() method. Signed-off-by: Serge Semin --- drivers/spi/spi-dw-dma.c | 17 - 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/drivers/spi/spi-dw-dma.c b/drivers/spi/spi-dw-dma.c index 32ef7913a73d..f8508c0c7978 100644 --- a/drivers/spi/spi-dw-dma.c +++ b/drivers/spi/spi-dw-dma.c @@ -153,8 +153,6 @@ static void dw_spi_dma_exit(struct dw_spi *dws) dmaengine_terminate_sync(dws->rxchan); dma_release_channel(dws->rxchan); } - - dw_writel(dws, DW_SPI_DMACR, 0); } static irqreturn_t dw_spi_dma_transfer_handler(struct dw_spi *dws) @@ -253,7 +251,6 @@ static void dw_spi_dma_tx_done(void *arg) if (test_bit(RX_BUSY, >dma_chan_busy)) return; - dw_writel(dws, DW_SPI_DMACR, 0); complete(>dma_completion); } @@ -352,7 +349,6 @@ static void dw_spi_dma_rx_done(void *arg) if (test_bit(TX_BUSY, >dma_chan_busy)) return; - dw_writel(dws, DW_SPI_DMACR, 0); complete(>dma_completion); } @@ -442,13 +438,13 @@ static int dw_spi_dma_transfer_all(struct dw_spi *dws, /* Submit DMA Tx transfer */ ret = dw_spi_dma_submit_tx(dws, xfer); if (ret) - return ret; + goto err_clear_dmac; /* Submit DMA Rx transfer if required */ if (xfer->rx_buf) { ret = dw_spi_dma_submit_rx(dws, xfer); if (ret) - return ret; + goto err_clear_dmac; /* rx must be started before tx due to spi instinct */ dma_async_issue_pending(dws->rxchan); @@ -456,7 +452,12 @@ static int dw_spi_dma_transfer_all(struct dw_spi *dws, dma_async_issue_pending(dws->txchan); - return dw_spi_dma_wait(dws, xfer); + ret = dw_spi_dma_wait(dws, xfer); + +err_clear_dmac: + dw_writel(dws, DW_SPI_DMACR, 0); + + return ret; } static int dw_spi_dma_transfer(struct dw_spi *dws, struct spi_transfer *xfer) @@ -489,8 +490,6 @@ static void dw_spi_dma_stop(struct dw_spi *dws) dmaengine_terminate_sync(dws->rxchan); clear_bit(RX_BUSY, >dma_chan_busy); } - - dw_writel(dws, DW_SPI_DMACR, 0); } static const struct dw_spi_dma_ops dw_spi_dma_mfld_ops = { -- 2.27.0
[PATCH 8/8] spi: dw-dma: Add one-by-one SG list entries transfer
In case if at least one of the requested DMA engine channels doesn't support the hardware accelerated SG list entries traverse, the DMA driver will most likely work that around by performing the IRQ-based SG list entries resubmission. That might and will cause a problem if the DMA Tx channel is recharged and re-executed before the Rx DMA channel. Due to non-deterministic IRQ-handler execution latency the DMA Tx channel will start pushing data to the SPI bus before the Rx DMA channel is even reinitialized with the next inbound SG list entry. By doing so the DMA Tx channel will implicitly start filling the DW APB SSI Rx FIFO up, which while the DMA Rx channel being recharged and re-executed will eventually be overflown. In order to solve the problem we have to feed the DMA engine with SG list entries one-by-one. It shall keep the DW APB SSI Tx and Rx FIFOs synchronized and prevent the Rx FIFO overflow. Since in general the SPI tx_sg and rx_sg lists may have different number of entries of different lengths (though total length should match) we virtually split the SG-lists to the set of DMA transfers, which length is a minimum of the ordered SG-entries lengths. The solution described above is only executed if a full-duplex SPI transfer is requested and the DMA engine hasn't provided channels with hardware accelerated SG list traverse capability to handle both SG lists at once. Signed-off-by: Serge Semin Suggested-by: Andy Shevchenko --- drivers/spi/spi-dw-dma.c | 137 ++- drivers/spi/spi-dw.h | 1 + 2 files changed, 137 insertions(+), 1 deletion(-) diff --git a/drivers/spi/spi-dw-dma.c b/drivers/spi/spi-dw-dma.c index 2b42b42b6cf2..b3eeef3d5cf7 100644 --- a/drivers/spi/spi-dw-dma.c +++ b/drivers/spi/spi-dw-dma.c @@ -73,6 +73,23 @@ static void dw_spi_dma_maxburst_init(struct dw_spi *dws) dw_writel(dws, DW_SPI_DMATDLR, dws->txburst); } +static void dw_spi_dma_sg_burst_init(struct dw_spi *dws) +{ + struct dma_slave_caps tx = {0}, rx = {0}; + + dma_get_slave_caps(dws->txchan, ); + dma_get_slave_caps(dws->rxchan, ); + + if (tx.max_sg_burst > 0 && rx.max_sg_burst > 0) + dws->dma_sg_burst = min(tx.max_sg_burst, rx.max_sg_burst); + else if (tx.max_sg_burst > 0) + dws->dma_sg_burst = tx.max_sg_burst; + else if (rx.max_sg_burst > 0) + dws->dma_sg_burst = rx.max_sg_burst; + else + dws->dma_sg_burst = 0; +} + static int dw_spi_dma_init_mfld(struct device *dev, struct dw_spi *dws) { struct dw_dma_slave dma_tx = { .dst_id = 1 }, *tx = _tx; @@ -110,6 +127,8 @@ static int dw_spi_dma_init_mfld(struct device *dev, struct dw_spi *dws) dw_spi_dma_maxburst_init(dws); + dw_spi_dma_sg_burst_init(dws); + return 0; free_rxchan: @@ -139,6 +158,8 @@ static int dw_spi_dma_init_generic(struct device *dev, struct dw_spi *dws) dw_spi_dma_maxburst_init(dws); + dw_spi_dma_sg_burst_init(dws); + return 0; } @@ -459,11 +480,125 @@ static int dw_spi_dma_transfer_all(struct dw_spi *dws, return ret; } +static int dw_spi_dma_transfer_one(struct dw_spi *dws, + struct spi_transfer *xfer) +{ + struct scatterlist *tx_sg = NULL, *rx_sg = NULL, tx_tmp, rx_tmp; + unsigned int tx_len = 0, rx_len = 0; + unsigned int base, len; + int ret; + + sg_init_table(_tmp, 1); + sg_init_table(_tmp, 1); + + /* +* In case if at least one of the requested DMA channels doesn't +* support the hardware accelerated SG list entries traverse, the DMA +* driver will most likely work that around by performing the IRQ-based +* SG list entries resubmission. That might and will cause a problem +* if the DMA Tx channel is recharged and re-executed before the Rx DMA +* channel. Due to non-deterministic IRQ-handler execution latency the +* DMA Tx channel will start pushing data to the SPI bus before the +* Rx DMA channel is even reinitialized with the next inbound SG list +* entry. By doing so the DMA Tx channel will implicitly start filling +* the DW APB SSI Rx FIFO up, which while the DMA Rx channel being +* recharged and re-executed will eventually be overflown. +* +* In order to solve the problem we have to feed the DMA engine with SG +* list entries one-by-one. It shall keep the DW APB SSI Tx and Rx +* FIFOs synchronized and prevent the Rx FIFO overflow. Since in +* general the tx_sg and rx_sg lists may have different number of +* entries of different lengths (though total length should match) +* let's virtually split the SG-lists to the set of DMA transfers, +* which length is a minimum of the ordered SG-entries lengths. +* An ASCII-sketch of the implemented algo is followi
[PATCH 5/8] spi: dw-dma: Detach DMA transfer into a dedicated method
In order to add an alternative method of DMA-based SPI transfer first we need to detach the currently available one from the common code. Here we move the normal DMA-based SPI transfer execution functionality into a dedicated method. It will be utilized if either the DMA engine supports an unlimited number SG entries or Tx-only SPI transfer is requested. But currently just use it for any SPI transfer. Signed-off-by: Serge Semin --- drivers/spi/spi-dw-dma.c | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/drivers/spi/spi-dw-dma.c b/drivers/spi/spi-dw-dma.c index 366978086781..32ef7913a73d 100644 --- a/drivers/spi/spi-dw-dma.c +++ b/drivers/spi/spi-dw-dma.c @@ -434,7 +434,8 @@ static int dw_spi_dma_setup(struct dw_spi *dws, struct spi_transfer *xfer) return 0; } -static int dw_spi_dma_transfer(struct dw_spi *dws, struct spi_transfer *xfer) +static int dw_spi_dma_transfer_all(struct dw_spi *dws, + struct spi_transfer *xfer) { int ret; @@ -455,7 +456,14 @@ static int dw_spi_dma_transfer(struct dw_spi *dws, struct spi_transfer *xfer) dma_async_issue_pending(dws->txchan); - ret = dw_spi_dma_wait(dws, xfer); + return dw_spi_dma_wait(dws, xfer); +} + +static int dw_spi_dma_transfer(struct dw_spi *dws, struct spi_transfer *xfer) +{ + int ret; + + ret = dw_spi_dma_transfer_all(dws, xfer); if (ret) return ret; -- 2.27.0
[PATCH 1/8] spi: dw-dma: Set DMA Level registers on init
Indeed the registers content doesn't get cleared when the SPI controller is disabled and enabled. Max burst lengths aren't changed since the Rx and Tx DMA channels are requested on init stage and are kept acquired until the device is removed. Obviously SPI controller FIFO depth can't be changed. Due to all of that we can safely move the DMA Transmit and Receive data level registers initialization to the SPI controller DMA init stage (when the SPI controller is being probed) instead of doing it for each SPI transfer when dma_setup is called. This shall speed the DMA-based SPI transfer initialization up a bit, particularly if the APB bus is relatively slow. Signed-off-by: Serge Semin --- drivers/spi/spi-dw-dma.c | 29 ++--- 1 file changed, 14 insertions(+), 15 deletions(-) diff --git a/drivers/spi/spi-dw-dma.c b/drivers/spi/spi-dw-dma.c index bb390ff67d1d..440679fa0764 100644 --- a/drivers/spi/spi-dw-dma.c +++ b/drivers/spi/spi-dw-dma.c @@ -49,6 +49,7 @@ static void dw_spi_dma_maxburst_init(struct dw_spi *dws) max_burst = RX_BURST_LEVEL; dws->rxburst = min(max_burst, def_burst); + dw_writel(dws, DW_SPI_DMARDLR, dws->rxburst - 1); ret = dma_get_slave_caps(dws->txchan, ); if (!ret && caps.max_burst) @@ -56,7 +57,20 @@ static void dw_spi_dma_maxburst_init(struct dw_spi *dws) else max_burst = TX_BURST_LEVEL; + /* +* Having a Rx DMA channel serviced with higher priority than a Tx DMA +* channel might not be enough to provide a well balanced DMA-based +* SPI transfer interface. There might still be moments when the Tx DMA +* channel is occasionally handled faster than the Rx DMA channel. +* That in its turn will eventually cause the SPI Rx FIFO overflow if +* SPI bus speed is high enough to fill the SPI Rx FIFO in before it's +* cleared by the Rx DMA channel. In order to fix the problem the Tx +* DMA activity is intentionally slowed down by limiting the SPI Tx +* FIFO depth with a value twice bigger than the Tx burst length +* calculated earlier by the dw_spi_dma_maxburst_init() method. +*/ dws->txburst = min(max_burst, def_burst); + dw_writel(dws, DW_SPI_DMATDLR, dws->txburst); } static int dw_spi_dma_init_mfld(struct device *dev, struct dw_spi *dws) @@ -372,21 +386,6 @@ static int dw_spi_dma_setup(struct dw_spi *dws, struct spi_transfer *xfer) { u16 imr = 0, dma_ctrl = 0; - /* -* Having a Rx DMA channel serviced with higher priority than a Tx DMA -* channel might not be enough to provide a well balanced DMA-based -* SPI transfer interface. There might still be moments when the Tx DMA -* channel is occasionally handled faster than the Rx DMA channel. -* That in its turn will eventually cause the SPI Rx FIFO overflow if -* SPI bus speed is high enough to fill the SPI Rx FIFO in before it's -* cleared by the Rx DMA channel. In order to fix the problem the Tx -* DMA activity is intentionally slowed down by limiting the SPI Tx -* FIFO depth with a value twice bigger than the Tx burst length -* calculated earlier by the dw_spi_dma_maxburst_init() method. -*/ - dw_writel(dws, DW_SPI_DMARDLR, dws->rxburst - 1); - dw_writel(dws, DW_SPI_DMATDLR, dws->txburst); - if (xfer->tx_buf) dma_ctrl |= SPI_DMA_TDMAE; if (xfer->rx_buf) -- 2.27.0
[PATCH 3/8] spi: dw-dma: Configure the DMA channels in dma_setup
Mainly this is a preparation patch before adding one-by-one DMA SG entries transmission. But logically the Tx and Rx DMA channels setup should be performed in the dma_setup() callback anyway. So let's move the DMA slave channels src/dst burst lengths, address and address width configuration to the DMA setup stage. While at it make sure the return value of the dmaengine_slave_config() method is checked. It has been unnecessary in case if Dw DMAC is utilized as a DMA engine, since its device_config() callback always returns zero (though it might change in future). But since DW APB SSI driver now supports any DMA back-end we must make sure the DMA device configuration has been successful before proceeding with further setups. Signed-off-by: Serge Semin --- drivers/spi/spi-dw-dma.c | 42 +--- 1 file changed, 31 insertions(+), 11 deletions(-) diff --git a/drivers/spi/spi-dw-dma.c b/drivers/spi/spi-dw-dma.c index ec721af61663..56496b659d62 100644 --- a/drivers/spi/spi-dw-dma.c +++ b/drivers/spi/spi-dw-dma.c @@ -257,11 +257,9 @@ static void dw_spi_dma_tx_done(void *arg) complete(>dma_completion); } -static struct dma_async_tx_descriptor * -dw_spi_dma_prepare_tx(struct dw_spi *dws, struct spi_transfer *xfer) +static int dw_spi_dma_config_tx(struct dw_spi *dws) { struct dma_slave_config txconf; - struct dma_async_tx_descriptor *txdesc; memset(, 0, sizeof(txconf)); txconf.direction = DMA_MEM_TO_DEV; @@ -271,7 +269,13 @@ dw_spi_dma_prepare_tx(struct dw_spi *dws, struct spi_transfer *xfer) txconf.dst_addr_width = dw_spi_dma_convert_width(dws->n_bytes); txconf.device_fc = false; - dmaengine_slave_config(dws->txchan, ); + return dmaengine_slave_config(dws->txchan, ); +} + +static struct dma_async_tx_descriptor * +dw_spi_dma_prepare_tx(struct dw_spi *dws, struct spi_transfer *xfer) +{ + struct dma_async_tx_descriptor *txdesc; txdesc = dmaengine_prep_slave_sg(dws->txchan, xfer->tx_sg.sgl, @@ -346,14 +350,9 @@ static void dw_spi_dma_rx_done(void *arg) complete(>dma_completion); } -static struct dma_async_tx_descriptor *dw_spi_dma_prepare_rx(struct dw_spi *dws, - struct spi_transfer *xfer) +static int dw_spi_dma_config_rx(struct dw_spi *dws) { struct dma_slave_config rxconf; - struct dma_async_tx_descriptor *rxdesc; - - if (!xfer->rx_buf) - return NULL; memset(, 0, sizeof(rxconf)); rxconf.direction = DMA_DEV_TO_MEM; @@ -363,7 +362,16 @@ static struct dma_async_tx_descriptor *dw_spi_dma_prepare_rx(struct dw_spi *dws, rxconf.src_addr_width = dw_spi_dma_convert_width(dws->n_bytes); rxconf.device_fc = false; - dmaengine_slave_config(dws->rxchan, ); + return dmaengine_slave_config(dws->rxchan, ); +} + +static struct dma_async_tx_descriptor *dw_spi_dma_prepare_rx(struct dw_spi *dws, + struct spi_transfer *xfer) +{ + struct dma_async_tx_descriptor *rxdesc; + + if (!xfer->rx_buf) + return NULL; rxdesc = dmaengine_prep_slave_sg(dws->rxchan, xfer->rx_sg.sgl, @@ -382,10 +390,22 @@ static struct dma_async_tx_descriptor *dw_spi_dma_prepare_rx(struct dw_spi *dws, static int dw_spi_dma_setup(struct dw_spi *dws, struct spi_transfer *xfer) { u16 imr, dma_ctrl; + int ret; if (!xfer->tx_buf) return -EINVAL; + /* Setup DMA channels */ + ret = dw_spi_dma_config_tx(dws); + if (ret) + return ret; + + if (xfer->rx_buf) { + ret = dw_spi_dma_config_rx(dws); + if (ret) + return ret; + } + /* Set the DMA handshaking interface */ dma_ctrl = SPI_DMA_TDMAE; if (xfer->rx_buf) -- 2.27.0
[PATCH 0/8] spi: dw-dma: Add max SG entries burst capability support
SPI transfer SG-lists in a single DMA transaction without software intervention, and for the full-duplex SPI-transfers. Note since the DMA-engine subsystem in kernel 5.8-rcX doesn't have the max_sg_burst capability supported, this series is intended to be applied only after the "next"-branch of the DMA-engine repository is merged into the mainline repo. Alternatively the series could be merged in through the DMA-engine repo. Signed-off-by: Serge Semin Cc: Alexey Malahov Cc: Georgy Vlasov Cc: Ramil Zaripov Cc: Pavel Parkhomenko Cc: Peter Ujfalusi Cc: Andy Shevchenko Cc: Andy Shevchenko Cc: Feng Tang Cc: Vinod Koul Cc: linux-...@vger.kernel.org Cc: linux-kernel@vger.kernel.org Serge Semin (8): spi: dw-dma: Set DMA Level registers on init spi: dw-dma: Fail DMA-based transfer if no Tx-buffer specified spi: dw-dma: Configure the DMA channels in dma_setup spi: dw-dma: Move DMA transfers submission to the channels prep methods spi: dw-dma: Detach DMA transfer into a dedicated method spi: dw-dma: Move DMAC register cleanup to DMA transfer method spi: dw-dma: Pass exact data to the DMA submit and wait methods spi: dw-dma: Add one-by-one SG list entries transfer drivers/spi/spi-dw-dma.c | 309 ++- drivers/spi/spi-dw.h | 1 + 2 files changed, 238 insertions(+), 72 deletions(-) -- 2.27.0
Re: clocksource: dw_apb_timer: commit 6d2e16a3181b broke Arria10 platform
Hello Dinh, It must be something wrong with your timer2 and timer3 declared in the Arria10 dts because the patch didn't change anything for the first two timers (timer0 and timer1). It just permits to register all DW APB Timers found in dts. If those timers are broken, then you should have disabled them in the dts in the first place. If they are normal, then you need to investigate further why do they cause the kernel panic. -Sergey On Thu, Jul 30, 2020 at 09:30:55PM +, Nguyen, Dinh wrote: > Hi Sergey, > > Commit "6d2e16a3181b clocksource: dw_apb_timer_of: Fix missing clockevent > timers" broke the Arria10 platform. See the bootlog here: > > [0.00] Booting Linux on physical CPU 0x0 > [0.00] Linux version 5.8.0-rc7-next-20200730 (dinguyen@linux-builds1) > (a > rm-linux-gnueabihf-gcc (Linaro GCC 7.2-2017.11) 7.2.1 20171011, GNU ld > (Linaro_B > inutils-2017.11) 2.28.2.20170706) #18 SMP Thu Jul 30 16:25:46 CDT 2020 > [0.00] CPU: ARMv7 Processor [414fc091] revision 1 (ARMv7), cr=10c5387d > [0.00] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing > instructio > n cache > [0.00] OF: fdt: Machine model: Altera SOCFPGA Arria 10 > [0.00] earlycon: uart0 at MMIO32 0xffc02100 (options '115200n8') > [0.00] printk: bootconsole [uart0] enabled > [0.00] Memory policy: Data cache writealloc > [0.00] Zone ranges: > [0.00] Normal [mem 0x-0x2fff] > [0.00] HighMem [mem 0x3000-0x3fff] > [0.00] Movable zone start for each node > [0.00] Early memory node ranges > [0.00] node 0: [mem 0x-0x3fff] > [0.00] Initmem setup node 0 [mem > 0x-0x3fff] > [0.00] percpu: Embedded 19 pages/cpu s45132 r8192 d24500 u77824 > [0.00] Built 1 zonelists, mobility grouping on. Total pages: 260608 > [0.00] Kernel command line: earlycon console=ttyS0,115200 > root=/dev/nfs > rw nfsroot=10.122.105.146:/home/dnguyen/rootfs_yocto,tcp ip=dhcp > [0.00] Dentry cache hash table entries: 131072 (order: 7, 524288 > bytes, > linear) > [0.00] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes, > li > near) > [0.00] mem auto-init: stack:off, heap alloc:off, heap free:off > [0.00] Memory: 1027232K/1048576K available (8192K kernel code, 690K > rwda > ta, 1792K rodata, 1024K init, 159K bss, 21344K reserved, 0K cma-reserved, > 262144 > K highmem) > [0.00] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1 > [0.00] ftrace: allocating 28185 entries in 56 pages > [0.00] ftrace: allocated 56 pages with 3 groups > [0.00] rcu: Hierarchical RCU implementation. > [0.00] rcu: RCU event tracing is enabled. > [0.00] Rude variant of Tasks RCU enabled. > [0.00] rcu: RCU calculated value of scheduler-enlistment delay is 10 > jif > fies. > [0.00] NR_IRQS: 16, nr_irqs: 16, preallocated irqs: 16 > [0.00] L2C-310 erratum 769419 enabled > [0.00] L2C-310 enabling early BRESP for Cortex-A9 > [0.00] L2C-310: enabling full line of zeros but not enabled in > Cortex-A9 > [0.00] L2C-310 ID prefetch enabled, offset 1 lines > [0.00] L2C-310 dynamic clock gating enabled, standby mode enabled > [0.00] L2C-310 cache controller enabled, 8 ways, 512 kB > [0.00] L2C-310: CACHE_ID 0x410030c9, AUX_CTRL 0x76560001 > [0.00] random: get_random_bytes called from start_kernel+0x388/0x528 > wit > h crng_init=0 > [0.00] clocksource: timer1: mask: 0x max_cycles: 0x, > max > _idle_ns: 38225208935 ns > [0.04] sched_clock: 32 bits at 50MHz, resolution 20ns, wraps every > 42949 > 672950ns > [0.007796] Switching to timer-based delay loop, resolution 20ns > [0.013861] 8<--- cut here --- > [0.016901] Unhandled fault: imprecise external abort (0x406) at 0x4b96623e > [0.023828] pgd = (ptrval) > [0.026520] [4b96623e] *pgd= > [0.030084] Internal error: : 406 [#1] SMP ARM > [0.034504] Modules linked in: > [0.037548] CPU: 0 PID: 0 Comm: swapper/0 Not tainted > 5.8.0-rc7-next-20200730 > #18 > [0.044992] Hardware name: Altera SOCFPGA Arria10 > [0.049677] PC is at apbt_enable_int+0x24/0x48 > [0.054098] LR is at dw_apb_clockevent_register+0x3c/0x40 > [0.059468] pc : []lr : []psr: 60d3 > [0.065702] sp : c0c01ef8 ip : c0c01f10 fp : c0c01f0c > [0.070908] r10: c0b44a3c r9 : r8 : c0a5ace0 > [0.076106] r7 : 0013 r6 : ef7f7640 r5 : ef06c0c0 r4 : a5f2106f > [0.082608] r3 : f081c000 r2 : 012c r1 : 20d3 r0 : ef06c0c0 > [0.089112] Flags: nZCv IRQs off FIQs off Mode SVC_32 ISA ARM Segment > no > ne > [0.096384] Control: 10c5387d Table: 404a DAC: 0051 > [0.102110] Process swapper/0 (pid: 0, stack
Re: [PATCH] MIPS: Provide Kconfig option for default IEEE754 conformance mode
On Fri, Jul 31, 2020 at 12:10:16PM +0800, Jiaxun Yang wrote: > Requested by downstream distros, a Kconfig option for default > IEEE754 conformance mode allows them to set their mode to > relaxed by default. That's what should have been here in the first place. Thanks! Reviewed-by: Serge Semin > > Signed-off-by: Jiaxun Yang > --- > arch/mips/Kconfig| 21 + > arch/mips/kernel/cpu-probe.c | 12 +++- > 2 files changed, 32 insertions(+), 1 deletion(-) > > diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig > index f6bb446d30f0..ef5b2a177b1b 100644 > --- a/arch/mips/Kconfig > +++ b/arch/mips/Kconfig > @@ -2877,6 +2877,27 @@ config MIPS_NR_CPU_NR_MAP > default 1024 if MIPS_NR_CPU_NR_MAP_1024 > default NR_CPUS if !MIPS_NR_CPU_NR_MAP_1024 > > +choice > + prompt "Default IEEE Std 754 conformance mode" > + default IEEE754_DEFAULT_STRICT > + help > + Default IEEE Std 754 conformance mode, see ieee754= kernel parameter > + for detail. > + > + config IEEE754_DEFAULT_STRICT > + bool "Strict" > + > + config IEEE754_DEFAULT_LEGACY > + bool "Legacy" > + > + config IEEE754_DEFAULT_STD2008 > + bool "2008" > + > + config IEEE754_DEFAULT_RELAXED > + bool "Relaxed" > + > +endchoice > + > # > # Timer Interrupt Frequency Configuration > # > diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c > index d9e8d39a7289..03adeed58efb 100644 > --- a/arch/mips/kernel/cpu-probe.c > +++ b/arch/mips/kernel/cpu-probe.c > @@ -157,7 +157,17 @@ static void cpu_set_fpu_2008(struct cpuinfo_mips *c) > * IEEE 754 conformance mode to use. Affects the NaN encoding and the > * ABS.fmt/NEG.fmt execution mode. > */ > -static enum { STRICT, LEGACY, STD2008, RELAXED } ieee754 = STRICT; > +enum ieee754_mode { STRICT, LEGACY, STD2008, RELAXED }; > + > +#if defined(CONFIG_IEEE754_DEFAULT_STRICT) > +static enum ieee754_mode ieee754 = STRICT; > +#elif defined(CONFIG_IEEE754_DEFAULT_LEGACY) > +static enum ieee754_mode ieee754 = LEGACY; > +#elif defined(CONFIG_IEEE754_DEFAULT_STD2008) > +static enum ieee754_mode ieee754 = STD2008; > +#elif defined(CONFIG_IEEE754_DEFAULT_RELAXED) > +static enum ieee754_mode ieee754 = RELAXED; > +#endif > > /* > * Set the IEEE 754 NaN encodings and the ABS.fmt/NEG.fmt execution modes > -- > 2.28.0 >
Re: [PATCH 2/5] dmaengine: dw: Activate FIFO-mode for memory peripherals only
On Thu, Jul 30, 2020 at 07:47:03PM +0300, Andy Shevchenko wrote: > On Thu, Jul 30, 2020 at 07:31:54PM +0300, Serge Semin wrote: > > On Thu, Jul 30, 2020 at 07:24:28PM +0300, Andy Shevchenko wrote: > > > On Thu, Jul 30, 2020 at 06:45:42PM +0300, Serge Semin wrote: > > ... > > > > > Thanks to the commit ("dmaengine: dw: Initialize channel > > ... > > > > > Note the DMA-engine repository > > > > git.infradead.org/users/vkoul/slave-dma.git > > > > isn't accessible. So I couldn't find out the Andy' commit hash to use > > > > it in > > > > the log. > > > > > It's dmaengine.git on git.kernel.org. > > > > Ah, thanks! I've just found out that the repo address has been changed. But > > I've > > also scanned the "next" branch of the repo: > > https://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine.git > > > > Your commit isn't there. Am I missing something? > > It's a fix. It went to upstream branch (don't remember its name by heart in > Vinod's repo). Right. Found it. Thanks. -Sergey > > -- > With Best Regards, > Andy Shevchenko > >
Re: [PATCH 5/5] dmaengine: dw: Add DMA-channels mask cell support
On Thu, Jul 30, 2020 at 07:41:46PM +0300, Andy Shevchenko wrote: > On Thu, Jul 30, 2020 at 06:45:45PM +0300, Serge Semin wrote: > > DW DMA IP-core provides a way to synthesize the DMA controller with > > channels having different parameters like maximum burst-length, > > multi-block support, maximum data width, etc. Those parameters both > > explicitly and implicitly affect the channels performance. Since DMA slave > > devices might be very demanding to the DMA performance, let's provide a > > functionality for the slaves to be assigned with DW DMA channels, which > > performance according to the platform engineer fulfill their requirements. > > After this patch is applied it can be done by passing the mask of suitable > > DMA-channels either directly in the dw_dma_slave structure instance or as > > a fifth cell of the DMA DT-property. If mask is zero or not provided, then > > there is no limitation on the channels allocation. > > > > For instance Baikal-T1 SoC is equipped with a DW DMAC engine, which first > > two channels are synthesized with max burst length of 16, while the rest > > of the channels have been created with max-burst-len=4. It would seem that > > the first two channels must be faster than the others and should be more > > preferable for the time-critical DMA slave devices. In practice it turned > > out that the situation is quite the opposite. The channels with > > max-burst-len=4 demonstrated a better performance than the channels with > > max-burst-len=16 even when they both had been initialized with the same > > settings. The performance drop of the first two DMA-channels made them > > unsuitable for the DW APB SSI slave device. No matter what settings they > > are configured with, full-duplex SPI transfers occasionally experience the > > Rx FIFO overflow. It means that the DMA-engine doesn't keep up with > > incoming data pace even though the SPI-bus is enabled with speed of 25MHz > > while the DW DMA controller is clocked with 50MHz signal. There is no such > > problem has been noticed for the channels synthesized with > > max-burst-len=4. > > ... > > > + if (dws->channels && !(dws->channels & dwc->mask)) > > You can drop the first check if... See below. > > > + return false; > > ... > > > + if (dma_spec->args_count >= 4) > > + slave.channels = dma_spec->args[3]; > > ...you apply sane default here or somewhere else. Alas I can't because dw_dma_slave structure is defined all over the kernel drivers/spi/spi-dw-dma.c drivers/spi/spi-pxa2xx-pci.c drivers/tty/serial/8250/8250_lpss.c These devices aren't always placed on the OF-based platforms. In that case the corresponding DMA-channels won't be requested by means of the dw_dma_of_xlate() method. So we have to preserve a default behavior if dws->channels is zero. > > ... > > > + fls(slave.channels) > dw->pdata->nr_channels)) > > Does it really make sense? It does to prevent the clients to specify an invalid channels mask, which can't have bits set higher than the number of channels the engine supports. > > I think it can also be simplified to faster op, i.e. > BIT(nr_channels) < slave.channels > (but check for off-by-one errors) Makes sense. Thanks. I'll replace it with the next statement: slave.channels >= BIT(dw->pdata->nr_channels) > > ... > > > + * @channels: mask of the channels permitted for allocation (zero > > + * value means any) > > Perhaps on one line? I don't really care. If you insist on that, I'll make it a single line, but it will be over 80 columns. 85 to be exact. -Sergey > > -- > With Best Regards, > Andy Shevchenko > >
Re: [PATCH 4/5] dmaengine: dw: Ignore burst setting for memory peripherals
On Thu, Jul 30, 2020 at 07:31:22PM +0300, Andy Shevchenko wrote: > On Thu, Jul 30, 2020 at 06:45:44PM +0300, Serge Semin wrote: > > According to the DW DMA controller Databook (page 40 "3.5 Memory > > Which version of it? 2.18b > > > Peripherals") memory peripherals don't have handshaking interface > > connected to the controller, therefore they can never be a flow > > controller. Since the CTLx.SRC_MSIZE and CTLx.DEST_MSIZE are > > properties valid only for peripherals with a handshaking > > interface, we can freely zero these fields out if the memory peripheral > > is selected to be the source or the destination of the DMA transfers. > > > > Note according to the databook, length of burst transfers to memory is > > always equal to the number of data items available in a channel FIFO or > > data items required to complete the block transfer, whichever is smaller; > > length of burst transfers from memory is always equal to the space > > available in a channel FIFO or number of data items required to complete > > the block transfer, whichever is smaller. > > But does it really matter if you program there something or not? For memory peripherals it doesn't. But it's better to remove the redundant settings for consistency with the doc and to get rid of the unneeded stack-variable declaration. -Sergey > > -- > With Best Regards, > Andy Shevchenko > >
Re: [PATCH 2/5] dmaengine: dw: Activate FIFO-mode for memory peripherals only
On Thu, Jul 30, 2020 at 07:24:28PM +0300, Andy Shevchenko wrote: > On Thu, Jul 30, 2020 at 06:45:42PM +0300, Serge Semin wrote: > > CFGx.FIFO_MODE field controls a DMA-controller "FIFO readiness" criterion. > > In other words it determines when to start pushing data out of a DW > > DMAC channel FIFO to a destination peripheral or from a source > > peripheral to the DW DMAC channel FIFO. Currently FIFO-mode is set to one > > for all DW DMAC channels. It means they are tuned to flush data out of > > FIFO (to a memory peripheral or by accepting the burst transaction > > requests) when FIFO is at least half-full (except at the end of the block > > transfer, when FIFO-flush mode is activated) and are configured to get > > data to the FIFO when it's at least half-empty. > > > > Such configuration is a good choice when there is no slave device involved > > in the DMA transfers. In that case the number of bursts per block is less > > than when CFGx.FIFO_MODE = 0 and, hence, the bus utilization will improve. > > But the latency of DMA transfers may increase when CFGx.FIFO_MODE = 1, > > since DW DMAC will wait for the channel FIFO contents to be either > > half-full or half-empty depending on having the destination or the source > > transfers. Such latencies might be dangerous in case if the DMA transfers > > are expected to be performed from/to a slave device. Since normally > > peripheral devices keep data in internal FIFOs, any latency at some > > critical moment may cause one being overflown and consequently losing > > data. This especially concerns a case when either a peripheral device is > > relatively fast or the DW DMAC engine is relatively slow with respect to > > the incoming data pace. > > > > In order to solve problems, which might be caused by the latencies > > described above, let's enable the FIFO half-full/half-empty "FIFO > > readiness" criterion only for DMA transfers with no slave device involved. > > > Thanks to the commit ("dmaengine: dw: Initialize channel > > See below. > > > before each transfer") we can freely do that in the generic > > dw_dma_initialize_chan() method. > > Reviewed-by: Andy Shevchenko > Thanks! > > > Signed-off-by: Serge Semin > > > > --- > > > > Note the DMA-engine repository git.infradead.org/users/vkoul/slave-dma.git > > isn't accessible. So I couldn't find out the Andy' commit hash to use it in > > the log. > > It's dmaengine.git on git.kernel.org. Ah, thanks! I've just found out that the repo address has been changed. But I've also scanned the "next" branch of the repo: https://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine.git Your commit isn't there. Am I missing something? -Sergey > > > --- > > drivers/dma/dw/dw.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/drivers/dma/dw/dw.c b/drivers/dma/dw/dw.c > > index 7a085b3c1854..d9810980920a 100644 > > --- a/drivers/dma/dw/dw.c > > +++ b/drivers/dma/dw/dw.c > > @@ -14,7 +14,7 @@ > > static void dw_dma_initialize_chan(struct dw_dma_chan *dwc) > > { > > struct dw_dma *dw = to_dw_dma(dwc->chan.device); > > - u32 cfghi = DWC_CFGH_FIFO_MODE; > > + u32 cfghi = is_slave_direction(dwc->direction) ? 0 : DWC_CFGH_FIFO_MODE; > > u32 cfglo = DWC_CFGL_CH_PRIOR(dwc->priority); > > bool hs_polarity = dwc->dws.hs_polarity; > > > > -- > > 2.27.0 > > > > -- > With Best Regards, > Andy Shevchenko > >
[PATCH 3/5] dmaengine: dw: Discard dlen from the dev-to-mem xfer width calculation
Indeed in case of the DMA_DEV_TO_MEM DMA transfers it's enough to take the destination memory address and the destination master data width into account to calculate the CTLx.DST_TR_WIDTH setting of the memory peripheral. According to the DW DMAC IP-core Databook (page 66, Example 5) at the and of a DMA transfer when the DMA-channel internal FIFO is left with data less than for a single destination burst transaction, the destination peripheral will enter the Single Transaction Region where the DW DMA controller can complete a block transfer to the destination using single transactions (non-burst transaction of CTLx.DST_TR_WIDTH bytes). If there is no enough data in the DMA-channel internal FIFO for even a single non-burst transaction of CTLx.DST_TR_WIDTH bytes, then the channel enters "FIFO flush mode". That mode is activated to empty the FIFO and flush the leftovers out to the memory peripheral. The flushing procedure is simple. The data is sent to the memory by means of a set of single transaction of CTLx.SRC_TR_WIDTH bytes. To sum up it's redundant to use the LLPs length to find out the CTLx.DST_TR_WIDTH parameter value, since each DMA transfer will be completed with the CTLx.SRC_TR_WIDTH bytes transaction if it is required. In this commit we remove the LLP entry length from the statement which calculates the memory peripheral DMA transaction width since it's redundant due to the feature described above. By doing so we'll improve the memory bus utilization and speed up the DMA-channel performance for DMA_DEV_TO_MEM DMA-transfers. Signed-off-by: Serge Semin --- drivers/dma/dw/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/dma/dw/core.c b/drivers/dma/dw/core.c index 4700f2e87a62..3da0aea9fe25 100644 --- a/drivers/dma/dw/core.c +++ b/drivers/dma/dw/core.c @@ -723,7 +723,7 @@ dwc_prep_slave_sg(struct dma_chan *chan, struct scatterlist *sgl, lli_write(desc, sar, reg); lli_write(desc, dar, mem); lli_write(desc, ctlhi, ctlhi); - mem_width = __ffs(data_width | mem | dlen); + mem_width = __ffs(data_width | mem); lli_write(desc, ctllo, ctllo | DWC_CTLL_DST_WIDTH(mem_width)); desc->len = dlen; -- 2.27.0
[PATCH 2/5] dmaengine: dw: Activate FIFO-mode for memory peripherals only
CFGx.FIFO_MODE field controls a DMA-controller "FIFO readiness" criterion. In other words it determines when to start pushing data out of a DW DMAC channel FIFO to a destination peripheral or from a source peripheral to the DW DMAC channel FIFO. Currently FIFO-mode is set to one for all DW DMAC channels. It means they are tuned to flush data out of FIFO (to a memory peripheral or by accepting the burst transaction requests) when FIFO is at least half-full (except at the end of the block transfer, when FIFO-flush mode is activated) and are configured to get data to the FIFO when it's at least half-empty. Such configuration is a good choice when there is no slave device involved in the DMA transfers. In that case the number of bursts per block is less than when CFGx.FIFO_MODE = 0 and, hence, the bus utilization will improve. But the latency of DMA transfers may increase when CFGx.FIFO_MODE = 1, since DW DMAC will wait for the channel FIFO contents to be either half-full or half-empty depending on having the destination or the source transfers. Such latencies might be dangerous in case if the DMA transfers are expected to be performed from/to a slave device. Since normally peripheral devices keep data in internal FIFOs, any latency at some critical moment may cause one being overflown and consequently losing data. This especially concerns a case when either a peripheral device is relatively fast or the DW DMAC engine is relatively slow with respect to the incoming data pace. In order to solve problems, which might be caused by the latencies described above, let's enable the FIFO half-full/half-empty "FIFO readiness" criterion only for DMA transfers with no slave device involved. Thanks to the commit ("dmaengine: dw: Initialize channel before each transfer") we can freely do that in the generic dw_dma_initialize_chan() method. Signed-off-by: Serge Semin --- Note the DMA-engine repository git.infradead.org/users/vkoul/slave-dma.git isn't accessible. So I couldn't find out the Andy' commit hash to use it in the log. --- drivers/dma/dw/dw.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/dma/dw/dw.c b/drivers/dma/dw/dw.c index 7a085b3c1854..d9810980920a 100644 --- a/drivers/dma/dw/dw.c +++ b/drivers/dma/dw/dw.c @@ -14,7 +14,7 @@ static void dw_dma_initialize_chan(struct dw_dma_chan *dwc) { struct dw_dma *dw = to_dw_dma(dwc->chan.device); - u32 cfghi = DWC_CFGH_FIFO_MODE; + u32 cfghi = is_slave_direction(dwc->direction) ? 0 : DWC_CFGH_FIFO_MODE; u32 cfglo = DWC_CFGL_CH_PRIOR(dwc->priority); bool hs_polarity = dwc->dws.hs_polarity; -- 2.27.0
[PATCH 4/5] dmaengine: dw: Ignore burst setting for memory peripherals
According to the DW DMA controller Databook (page 40 "3.5 Memory Peripherals") memory peripherals don't have handshaking interface connected to the controller, therefore they can never be a flow controller. Since the CTLx.SRC_MSIZE and CTLx.DEST_MSIZE are properties valid only for peripherals with a handshaking interface, we can freely zero these fields out if the memory peripheral is selected to be the source or the destination of the DMA transfers. Note according to the databook, length of burst transfers to memory is always equal to the number of data items available in a channel FIFO or data items required to complete the block transfer, whichever is smaller; length of burst transfers from memory is always equal to the space available in a channel FIFO or number of data items required to complete the block transfer, whichever is smaller. Signed-off-by: Serge Semin --- drivers/dma/dw/dw.c | 5 ++--- drivers/dma/dw/idma32.c | 5 ++--- 2 files changed, 4 insertions(+), 6 deletions(-) diff --git a/drivers/dma/dw/dw.c b/drivers/dma/dw/dw.c index d9810980920a..a4862263ff14 100644 --- a/drivers/dma/dw/dw.c +++ b/drivers/dma/dw/dw.c @@ -67,9 +67,8 @@ static size_t dw_dma_block2bytes(struct dw_dma_chan *dwc, u32 block, u32 width) static u32 dw_dma_prepare_ctllo(struct dw_dma_chan *dwc) { struct dma_slave_config *sconfig = >dma_sconfig; - bool is_slave = is_slave_direction(dwc->direction); - u8 smsize = is_slave ? sconfig->src_maxburst : DW_DMA_MSIZE_16; - u8 dmsize = is_slave ? sconfig->dst_maxburst : DW_DMA_MSIZE_16; + u8 smsize = (dwc->direction == DMA_DEV_TO_MEM) ? sconfig->src_maxburst : 0; + u8 dmsize = (dwc->direction == DMA_MEM_TO_DEV) ? sconfig->dst_maxburst : 0; u8 p_master = dwc->dws.p_master; u8 m_master = dwc->dws.m_master; u8 dms = (dwc->direction == DMA_MEM_TO_DEV) ? p_master : m_master; diff --git a/drivers/dma/dw/idma32.c b/drivers/dma/dw/idma32.c index f00657308811..3ce44de25d33 100644 --- a/drivers/dma/dw/idma32.c +++ b/drivers/dma/dw/idma32.c @@ -73,9 +73,8 @@ static size_t idma32_block2bytes(struct dw_dma_chan *dwc, u32 block, u32 width) static u32 idma32_prepare_ctllo(struct dw_dma_chan *dwc) { struct dma_slave_config *sconfig = >dma_sconfig; - bool is_slave = is_slave_direction(dwc->direction); - u8 smsize = is_slave ? sconfig->src_maxburst : IDMA32_MSIZE_8; - u8 dmsize = is_slave ? sconfig->dst_maxburst : IDMA32_MSIZE_8; + u8 smsize = (dwc->direction == DMA_DEV_TO_MEM) ? sconfig->src_maxburst : 0; + u8 dmsize = (dwc->direction == DMA_MEM_TO_DEV) ? sconfig->dst_maxburst : 0; return DWC_CTLL_LLP_D_EN | DWC_CTLL_LLP_S_EN | DWC_CTLL_DST_MSIZE(dmsize) | DWC_CTLL_SRC_MSIZE(smsize); -- 2.27.0
[PATCH 5/5] dmaengine: dw: Add DMA-channels mask cell support
DW DMA IP-core provides a way to synthesize the DMA controller with channels having different parameters like maximum burst-length, multi-block support, maximum data width, etc. Those parameters both explicitly and implicitly affect the channels performance. Since DMA slave devices might be very demanding to the DMA performance, let's provide a functionality for the slaves to be assigned with DW DMA channels, which performance according to the platform engineer fulfill their requirements. After this patch is applied it can be done by passing the mask of suitable DMA-channels either directly in the dw_dma_slave structure instance or as a fifth cell of the DMA DT-property. If mask is zero or not provided, then there is no limitation on the channels allocation. For instance Baikal-T1 SoC is equipped with a DW DMAC engine, which first two channels are synthesized with max burst length of 16, while the rest of the channels have been created with max-burst-len=4. It would seem that the first two channels must be faster than the others and should be more preferable for the time-critical DMA slave devices. In practice it turned out that the situation is quite the opposite. The channels with max-burst-len=4 demonstrated a better performance than the channels with max-burst-len=16 even when they both had been initialized with the same settings. The performance drop of the first two DMA-channels made them unsuitable for the DW APB SSI slave device. No matter what settings they are configured with, full-duplex SPI transfers occasionally experience the Rx FIFO overflow. It means that the DMA-engine doesn't keep up with incoming data pace even though the SPI-bus is enabled with speed of 25MHz while the DW DMA controller is clocked with 50MHz signal. There is no such problem has been noticed for the channels synthesized with max-burst-len=4. Signed-off-by: Serge Semin --- drivers/dma/dw/core.c| 4 drivers/dma/dw/of.c | 7 +-- include/linux/platform_data/dma-dw.h | 3 +++ 3 files changed, 12 insertions(+), 2 deletions(-) diff --git a/drivers/dma/dw/core.c b/drivers/dma/dw/core.c index 3da0aea9fe25..5f7b9badb965 100644 --- a/drivers/dma/dw/core.c +++ b/drivers/dma/dw/core.c @@ -772,6 +772,10 @@ bool dw_dma_filter(struct dma_chan *chan, void *param) if (dws->dma_dev != chan->device->dev) return false; + /* permit channels in accordance with the channels mask */ + if (dws->channels && !(dws->channels & dwc->mask)) + return false; + /* We have to copy data since dws can be temporary storage */ memcpy(>dws, dws, sizeof(struct dw_dma_slave)); diff --git a/drivers/dma/dw/of.c b/drivers/dma/dw/of.c index 1474b3817ef4..abdf22b269b5 100644 --- a/drivers/dma/dw/of.c +++ b/drivers/dma/dw/of.c @@ -22,18 +22,21 @@ static struct dma_chan *dw_dma_of_xlate(struct of_phandle_args *dma_spec, }; dma_cap_mask_t cap; - if (dma_spec->args_count != 3) + if (dma_spec->args_count < 3 || dma_spec->args_count > 4) return NULL; slave.src_id = dma_spec->args[0]; slave.dst_id = dma_spec->args[0]; slave.m_master = dma_spec->args[1]; slave.p_master = dma_spec->args[2]; + if (dma_spec->args_count >= 4) + slave.channels = dma_spec->args[3]; if (WARN_ON(slave.src_id >= DW_DMA_MAX_NR_REQUESTS || slave.dst_id >= DW_DMA_MAX_NR_REQUESTS || slave.m_master >= dw->pdata->nr_masters || - slave.p_master >= dw->pdata->nr_masters)) + slave.p_master >= dw->pdata->nr_masters || + fls(slave.channels) > dw->pdata->nr_channels)) return NULL; dma_cap_zero(cap); diff --git a/include/linux/platform_data/dma-dw.h b/include/linux/platform_data/dma-dw.h index 4f681df85c27..3bc48451a70c 100644 --- a/include/linux/platform_data/dma-dw.h +++ b/include/linux/platform_data/dma-dw.h @@ -23,6 +23,8 @@ * @dst_id:dst request line * @m_master: memory master for transfers on allocated channel * @p_master: peripheral master for transfers on allocated channel + * @channels: mask of the channels permitted for allocation (zero + * value means any) * @hs_polarity:set active low polarity of handshake interface */ struct dw_dma_slave { @@ -31,6 +33,7 @@ struct dw_dma_slave { u8 dst_id; u8 m_master; u8 p_master; + u8 channels; boolhs_polarity; }; -- 2.27.0
[PATCH 0/5] dmaengine: dw: Introduce non-mem peripherals optimizations
After a lot of tests and thorough DW DMAC databook studying we've discovered that the driver can be optimized especially when it comes to working with non-memory peripherals. First of all we've found out that since each DW DMAC channel can be synthesized with different parameters, then even when two of them are configured to perform the same DMA transactions they may execute them with different performance. Since some DMA client devices might be sensitive to such important parameter as performance, then it is a good idea to let them request only suitable DMA channels. In this patchset we introduce a functionality, which makes it possible by passing the DMA channels mask either over the "dmas" DT property or in the dw_dma_slave platform data descriptor. Secondly FIFO-mode of the "FIFO readiness" criterion is more suitable for the pure memory DMA transfers, since it minimizes the system bus utilization, but causes some performance drop. When it comes to working with non-memory peripherals the DMA engine performance comes to the first place. Since normally DMA client devices keep data in internal FIFOs, any latency at some critical moment may cause a FIFO being overflown and consequently losing data. So in order to minimize a chance of the DW DMAC internal FIFO being a bottle neck during the DMA transfers to and from non-memory peripherals we propose not to use FIFO-mode for them. Thirdly it has been discovered that using a DMA transaction length is redundant when calculating the destination transfer width for the dev-to-mem DMA communications. That shall increase performance of the DMA transfers with unaligned data length. Finally there is a small optimization in the burst length setting. In particular we've found out, that according to the DW DMAC databoot it's pointless to set one for the memory peripherals since they don't have handshaking interface connected to the DMA controller. So we suggest to just ignore the burst length config when it comes to setting the memory peripherals up. Note this patchset is supposed to be applied on top of the series: Link: https://lore.kernel.org/dmaengine/20200723005848.31907-1-sergey.se...@baikalelectronics.ru Signed-off-by: Serge Semin Cc: Alexey Malahov Cc: Pavel Parkhomenko Cc: Peter Ujfalusi Cc: Andy Shevchenko Cc: Rob Herring Cc: dmaeng...@vger.kernel.org Cc: devicet...@vger.kernel.org Cc: linux-kernel@vger.kernel.org Serge Semin (5): dt-bindings: dma: dw: Add optional DMA-channels mask cell support dmaengine: dw: Activate FIFO-mode for memory peripherals only dmaengine: dw: Discard dlen from the dev-to-mem xfer width calculation dmaengine: dw: Ignore burst setting for memory peripherals dmaengine: dw: Add DMA-channels mask cell support .../devicetree/bindings/dma/snps,dma-spear1340.yaml| 7 +-- drivers/dma/dw/core.c | 6 +- drivers/dma/dw/dw.c| 7 +++ drivers/dma/dw/idma32.c| 5 ++--- drivers/dma/dw/of.c| 7 +-- include/linux/platform_data/dma-dw.h | 3 +++ 6 files changed, 23 insertions(+), 12 deletions(-) -- 2.27.0
[PATCH 1/5] dt-bindings: dma: dw: Add optional DMA-channels mask cell support
Each DW DMA controller channel can be synthesized with different parameters like maximum burst-length, multi-block support, maximum data width, etc. Most of these parameters determine the DW DMAC channels performance in its own aspect. On the other hand these parameters can be implicitly responsible for the channels performance degradation (for instance multi-block support is a very useful feature, but having it disabled during the DW DMAC synthesize will provide a more optimized core). Since DMA slave devices may have critical dependency on the DMA engine performance, let's provide a way for the slave devices to have the DMA-channels allocated from a pool of the channels, which according to the system engineer fulfill their performance requirements. The pool is determined by a mask optionally specified in the fifth DMA-cell of the DMA DT-property. If the fifth cell is omitted from the phandle arguments or the mask is zero, then the allocation will be performed from a set of all channels provided by the DMA controller. Signed-off-by: Serge Semin --- .../devicetree/bindings/dma/snps,dma-spear1340.yaml| 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/Documentation/devicetree/bindings/dma/snps,dma-spear1340.yaml b/Documentation/devicetree/bindings/dma/snps,dma-spear1340.yaml index 20870f5c14dd..ef1d6879c158 100644 --- a/Documentation/devicetree/bindings/dma/snps,dma-spear1340.yaml +++ b/Documentation/devicetree/bindings/dma/snps,dma-spear1340.yaml @@ -18,12 +18,15 @@ properties: const: snps,dma-spear1340 "#dma-cells": -const: 3 +minimum: 3 +maximum: 4 description: | First cell is a phandle pointing to the DMA controller. Second one is the DMA request line number. Third cell is the memory master identifier for transfers on dynamically allocated channel. Fourth cell is the - peripheral master identifier for transfers on an allocated channel. + peripheral master identifier for transfers on an allocated channel. Fifth + cell is an optional mask of the DMA channels permitted to be allocated + for the corresponding client device. reg: maxItems: 1 -- 2.27.0
[PATCH v3 00/10] gpio: dwapb: Refactor GPIO resources initialization
This series is about the DW APB GPIO device initialization procedure cleaning up. First of all it has been discovered that having a vendor-specific "snps,nr-gpios" property isn't only redundant but also might be dangerous (see the commit log for details). Instead we suggest to use the generic "ngpios" property to define a number of GPIOs each DW APB GPIO controller port supports. Secondly seeing a tendency of the other GPIO drivers getting converted to using the GPIO-lib-based IRQ-chip interface this series provides a patch, which replaces the DW APB GPIO driver Generic IRQ-chip implementation with the GPIO-lib IRQ-chip one. Finally the DW APB GPIO device probe procedure is simplified by converting the code to be using the device managed resources for the reference clocks initialization, reset control assertion/de-assertion and GPIO-chip registration. Some additional cleanups like replacing a number of GPIOs literal with a corresponding macro and grouping the IRQ handlers up in a single place of the driver are also introduced in this patchset. Link: https://lore.kernel.org/linux-gpio/20200723013858.10766-1-sergey.se...@baikalelectronics.ru/ Changelog v2: - Replace gc->to_irq() with irq_find_mapping() method. - Refactor dwapb_irq_set_type() method to directly set IRQ flow handler instead of using a temporary variable. - Initialize GPIOlib IRQ-chip settings before calling request_irq() method. - Add a notice regarding regression of commit 6a2f4b7dadd5 ("gpio: dwapb: use a second irq chip"). - Move the acpi_gpiochip_{request,free}_interrupts() methods invocation removal to a dedicated patch. - Move GPIO-chip to_irq callback removal to a dedicated patch. - Add a patch which replaces a max number of GPIO literals with a macro. - Introduce dwapb_convert_irqs() method to convert the sparse parental IRQs array into an array of linearly distributed IRQs correctly perceived by GPIO-lib. Link: https://lore.kernel.org/linux-gpio/20200730135536.19747-1-sergey.se...@baikalelectronics.ru Changelog v3: - Fix patches SoB's. - Add Andy' Suggested-by to the patch: "gpio: dwapb: Add max GPIOs macro" - Add blank lines to the IRQ-chip conversion commit message to make it more readable. - Dynamically allocate memory for the IRQ-chip-related data. Signed-off-by: Serge Semin Cc: Andy Shevchenko Cc: Andy Shevchenko Cc: Alexey Malahov Cc: Pavel Parkhomenko Cc: Rob Herring Cc: linux-g...@vger.kernel.org Cc: devicet...@vger.kernel.org Cc: linux-kernel@vger.kernel.org Serge Semin (10): dt-bindings: gpio: dwapb: Add ngpios property support gpio: dwapb: Add ngpios DT-property support gpio: dwapb: Move MFD-specific IRQ handler gpio: dwapb: Add max GPIOs macro gpio: dwapb: Convert driver to using the GPIO-lib-based IRQ-chip gpio: dwapb: Discard GPIO-to-IRQ mapping function gpio: dwapb: Discard ACPI GPIO-chip IRQs request gpio: dwapb: Get reset control by means of resource managed interface gpio: dwapb: Get clocks by means of resource managed interface gpio: dwapb: Use resource managed GPIO-chip add data method .../bindings/gpio/snps,dw-apb-gpio.yaml | 6 + drivers/gpio/Kconfig | 2 +- drivers/gpio/gpio-dwapb.c | 352 +- include/linux/platform_data/gpio-dwapb.h | 4 +- 4 files changed, 191 insertions(+), 173 deletions(-) -- 2.27.0
[PATCH v3 08/10] gpio: dwapb: Get reset control by means of resource managed interface
The reset control interface provides the resource managed version of the reset_control_get() method. The only thing which needs to be also automated is the reset lane assertion on the device removal. It can be implemented by means of the custom action definition. After that the reset control will be purely managed by the device resources interface. Signed-off-by: Serge Semin Reviewed-by: Andy Shevchenko --- drivers/gpio/gpio-dwapb.c | 35 +-- 1 file changed, 29 insertions(+), 6 deletions(-) diff --git a/drivers/gpio/gpio-dwapb.c b/drivers/gpio/gpio-dwapb.c index 707e9087ca59..faac594287b3 100644 --- a/drivers/gpio/gpio-dwapb.c +++ b/drivers/gpio/gpio-dwapb.c @@ -621,6 +621,32 @@ static struct dwapb_platform_data *dwapb_gpio_get_pdata(struct device *dev) return pdata; } +static void dwapb_assert_reset(void *data) +{ + struct dwapb_gpio *gpio = data; + + reset_control_assert(gpio->rst); +} + +static int dwapb_get_reset(struct dwapb_gpio *gpio) +{ + int err; + + gpio->rst = devm_reset_control_get_optional_shared(gpio->dev, NULL); + if (IS_ERR(gpio->rst)) { + dev_err(gpio->dev, "Cannot get reset descriptor\n"); + return PTR_ERR(gpio->rst); + } + + err = reset_control_deassert(gpio->rst); + if (err) { + dev_err(gpio->dev, "Cannot deassert reset lane\n"); + return err; + } + + return devm_add_action_or_reset(gpio->dev, dwapb_assert_reset, gpio); +} + static const struct of_device_id dwapb_of_match[] = { { .compatible = "snps,dw-apb-gpio", .data = (void *)0}, { .compatible = "apm,xgene-gpio-v2", .data = (void *)GPIO_REG_OFFSET_V2}, @@ -660,11 +686,9 @@ static int dwapb_gpio_probe(struct platform_device *pdev) gpio->dev = >dev; gpio->nr_ports = pdata->nports; - gpio->rst = devm_reset_control_get_optional_shared(dev, NULL); - if (IS_ERR(gpio->rst)) - return PTR_ERR(gpio->rst); - - reset_control_deassert(gpio->rst); + err = dwapb_get_reset(gpio); + if (err) + return err; gpio->ports = devm_kcalloc(>dev, gpio->nr_ports, sizeof(*gpio->ports), GFP_KERNEL); @@ -714,7 +738,6 @@ static int dwapb_gpio_remove(struct platform_device *pdev) struct dwapb_gpio *gpio = platform_get_drvdata(pdev); dwapb_gpio_unregister(gpio); - reset_control_assert(gpio->rst); clk_bulk_disable_unprepare(DWAPB_NR_CLOCKS, gpio->clks); return 0; -- 2.27.0
[PATCH v3 04/10] gpio: dwapb: Add max GPIOs macro
Add a new macro DWAPB_MAX_GPIOS which defines the maximum possible number of GPIO lines corresponding to the maximum DW APB GPIO controller port width. Use the new macro instead of number literal 32 where it's applicable. Suggested-by: Andy Shevchenko Signed-off-by: Serge Semin --- drivers/gpio/gpio-dwapb.c| 8 include/linux/platform_data/gpio-dwapb.h | 4 +++- 2 files changed, 7 insertions(+), 5 deletions(-) diff --git a/drivers/gpio/gpio-dwapb.c b/drivers/gpio/gpio-dwapb.c index 3081213247d8..f34001152850 100644 --- a/drivers/gpio/gpio-dwapb.c +++ b/drivers/gpio/gpio-dwapb.c @@ -162,7 +162,7 @@ static struct dwapb_gpio_port *dwapb_offs_to_port(struct dwapb_gpio *gpio, unsig for (i = 0; i < gpio->nr_ports; i++) { port = >ports[i]; - if (port->idx == offs / 32) + if (port->idx == offs / DWAPB_MAX_GPIOS) return port; } @@ -182,7 +182,7 @@ static void dwapb_toggle_trigger(struct dwapb_gpio *gpio, unsigned int offs) pol = dwapb_read(gpio, GPIO_INT_POLARITY); /* Just read the current value right out of the data register */ - val = gc->get(gc, offs % 32); + val = gc->get(gc, offs % DWAPB_MAX_GPIOS); if (val) pol &= ~BIT(offs); else @@ -197,7 +197,7 @@ static u32 dwapb_do_irq(struct dwapb_gpio *gpio) irq_hw_number_t hwirq; irq_status = dwapb_read(gpio, GPIO_INTSTATUS); - for_each_set_bit(hwirq, _status, 32) { + for_each_set_bit(hwirq, _status, DWAPB_MAX_GPIOS) { int gpio_irq = irq_find_mapping(gpio->domain, hwirq); u32 irq_type = irq_get_trigger_type(gpio_irq); @@ -599,7 +599,7 @@ static struct dwapb_platform_data *dwapb_gpio_get_pdata(struct device *dev) dev_info(dev, "failed to get number of gpios for port%d\n", i); - pp->ngpio = 32; + pp->ngpio = DWAPB_MAX_GPIOS; } pp->irq_shared = false; diff --git a/include/linux/platform_data/gpio-dwapb.h b/include/linux/platform_data/gpio-dwapb.h index ff1be737bad6..0aa5c6720259 100644 --- a/include/linux/platform_data/gpio-dwapb.h +++ b/include/linux/platform_data/gpio-dwapb.h @@ -6,12 +6,14 @@ #ifndef GPIO_DW_APB_H #define GPIO_DW_APB_H +#define DWAPB_MAX_GPIOS32 + struct dwapb_port_property { struct fwnode_handle *fwnode; unsigned intidx; unsigned intngpio; unsigned intgpio_base; - int irq[32]; + int irq[DWAPB_MAX_GPIOS]; boolirq_shared; }; -- 2.27.0
[PATCH v3 06/10] gpio: dwapb: Discard GPIO-to-IRQ mapping function
Since GPIOlib-based IRQ-chip interface is now utilized there is no need in setting up a custom GPIO-to-IRQ mapping method. GPIO-lib defines the standard mapping method - gpiochip_to_irq(), which will be used anyway no matter whether the custom to_irq callback is specified or not. Signed-off-by: Serge Semin Reviewed-by: Andy Shevchenko --- Changelog v2: - This is a new patch detached from commit "gpio: dwapb: Convert driver to using the GPIO-lib-based IRQ-chip". --- drivers/gpio/gpio-dwapb.c | 10 -- 1 file changed, 10 deletions(-) diff --git a/drivers/gpio/gpio-dwapb.c b/drivers/gpio/gpio-dwapb.c index eb3a6da453ff..b6affe7161b0 100644 --- a/drivers/gpio/gpio-dwapb.c +++ b/drivers/gpio/gpio-dwapb.c @@ -154,14 +154,6 @@ static inline void dwapb_write(struct dwapb_gpio *gpio, unsigned int offset, gc->write_reg(reg_base + gpio_reg_convert(gpio, offset), val); } -static int dwapb_gpio_to_irq(struct gpio_chip *gc, unsigned offset) -{ - struct dwapb_gpio_port *port = gpiochip_get_data(gc); - struct dwapb_gpio *gpio = port->gpio; - - return irq_find_mapping(gpio->domain, offset); -} - static struct dwapb_gpio_port *dwapb_offs_to_port(struct dwapb_gpio *gpio, unsigned int offs) { struct dwapb_gpio_port *port; @@ -476,8 +468,6 @@ static void dwapb_configure_irqs(struct dwapb_gpio *gpio, girq->chip = >irqchip; - port->gc.to_irq = dwapb_gpio_to_irq; - return; err_kfree_pirq: -- 2.27.0
[PATCH v3 01/10] dt-bindings: gpio: dwapb: Add ngpios property support
It's redundant to have a vendor-specific property describing a number of GPIOS while there is a generic one. Let's mark the former one as deprecated and define the "ngpios" property supported with constraints of being within [1; 32] range. Signed-off-by: Serge Semin Reviewed-by: Rob Herring --- .../devicetree/bindings/gpio/snps,dw-apb-gpio.yaml | 6 ++ 1 file changed, 6 insertions(+) diff --git a/Documentation/devicetree/bindings/gpio/snps,dw-apb-gpio.yaml b/Documentation/devicetree/bindings/gpio/snps,dw-apb-gpio.yaml index 1240f6289249..b391cc1b4590 100644 --- a/Documentation/devicetree/bindings/gpio/snps,dw-apb-gpio.yaml +++ b/Documentation/devicetree/bindings/gpio/snps,dw-apb-gpio.yaml @@ -61,8 +61,14 @@ patternProperties: '#gpio-cells': const: 2 + ngpios: +default: 32 +minimum: 1 +maximum: 32 + snps,nr-gpios: description: The number of GPIO pins exported by the port. +deprecated: true $ref: /schemas/types.yaml#/definitions/uint32 default: 32 minimum: 1 -- 2.27.0
[PATCH v3 09/10] gpio: dwapb: Get clocks by means of resource managed interface
The kernel clock framework provides the resource managed version of the clk_bulk_get() method. The only thing which needs to be also automated is the clocks disable/unprepare procedure executed on the device removal. It can be implemented by means of the custom action definition. After that the clocks acquisition and release will be purely managed by the device resources interface. Signed-off-by: Serge Semin Reviewed-by: Andy Shevchenko --- drivers/gpio/gpio-dwapb.c | 48 ++- 1 file changed, 32 insertions(+), 16 deletions(-) diff --git a/drivers/gpio/gpio-dwapb.c b/drivers/gpio/gpio-dwapb.c index faac594287b3..803e1d1f4b5a 100644 --- a/drivers/gpio/gpio-dwapb.c +++ b/drivers/gpio/gpio-dwapb.c @@ -647,6 +647,36 @@ static int dwapb_get_reset(struct dwapb_gpio *gpio) return devm_add_action_or_reset(gpio->dev, dwapb_assert_reset, gpio); } +static void dwapb_disable_clks(void *data) +{ + struct dwapb_gpio *gpio = data; + + clk_bulk_disable_unprepare(DWAPB_NR_CLOCKS, gpio->clks); +} + +static int dwapb_get_clks(struct dwapb_gpio *gpio) +{ + int err; + + /* Optional bus and debounce clocks */ + gpio->clks[0].id = "bus"; + gpio->clks[1].id = "db"; + err = devm_clk_bulk_get_optional(gpio->dev, DWAPB_NR_CLOCKS, +gpio->clks); + if (err) { + dev_err(gpio->dev, "Cannot get APB/Debounce clocks\n"); + return err; + } + + err = clk_bulk_prepare_enable(DWAPB_NR_CLOCKS, gpio->clks); + if (err) { + dev_err(gpio->dev, "Cannot enable APB/Debounce clocks\n"); + return err; + } + + return devm_add_action_or_reset(gpio->dev, dwapb_disable_clks, gpio); +} + static const struct of_device_id dwapb_of_match[] = { { .compatible = "snps,dw-apb-gpio", .data = (void *)0}, { .compatible = "apm,xgene-gpio-v2", .data = (void *)GPIO_REG_OFFSET_V2}, @@ -699,21 +729,9 @@ static int dwapb_gpio_probe(struct platform_device *pdev) if (IS_ERR(gpio->regs)) return PTR_ERR(gpio->regs); - /* Optional bus and debounce clocks */ - gpio->clks[0].id = "bus"; - gpio->clks[1].id = "db"; - err = devm_clk_bulk_get_optional(>dev, DWAPB_NR_CLOCKS, -gpio->clks); - if (err) { - dev_err(>dev, "Cannot get APB/Debounce clocks\n"); - return err; - } - - err = clk_bulk_prepare_enable(DWAPB_NR_CLOCKS, gpio->clks); - if (err) { - dev_err(>dev, "Cannot enable APB/Debounce clocks\n"); + err = dwapb_get_clks(gpio); + if (err) return err; - } gpio->flags = (uintptr_t)device_get_match_data(dev); @@ -728,7 +746,6 @@ static int dwapb_gpio_probe(struct platform_device *pdev) out_unregister: dwapb_gpio_unregister(gpio); - clk_bulk_disable_unprepare(DWAPB_NR_CLOCKS, gpio->clks); return err; } @@ -738,7 +755,6 @@ static int dwapb_gpio_remove(struct platform_device *pdev) struct dwapb_gpio *gpio = platform_get_drvdata(pdev); dwapb_gpio_unregister(gpio); - clk_bulk_disable_unprepare(DWAPB_NR_CLOCKS, gpio->clks); return 0; } -- 2.27.0
[PATCH v3 07/10] gpio: dwapb: Discard ACPI GPIO-chip IRQs request
Since GPIOlib-based IRQ-chip interface is now utilized there is no need in calling the methods acpi_gpiochip_{request,free}_interrupts() here. They will be called from gpiochip_add_irqchip()/gpiochip_irqchip_remove() anyway. Signed-off-by: Serge Semin Reviewed-by: Andy Shevchenko --- Changelog v2: - This is a new patch detached from commit "gpio: dwapb: Convert driver to using the GPIO-lib-based IRQ-chip". --- drivers/gpio/gpio-dwapb.c | 4 1 file changed, 4 deletions(-) diff --git a/drivers/gpio/gpio-dwapb.c b/drivers/gpio/gpio-dwapb.c index b6affe7161b0..707e9087ca59 100644 --- a/drivers/gpio/gpio-dwapb.c +++ b/drivers/gpio/gpio-dwapb.c @@ -526,9 +526,6 @@ static int dwapb_gpio_add_port(struct dwapb_gpio *gpio, return err; } - /* Add GPIO-signaled ACPI event support */ - acpi_gpiochip_request_interrupts(>gc); - port->is_registered = true; return 0; @@ -544,7 +541,6 @@ static void dwapb_gpio_unregister(struct dwapb_gpio *gpio) if (!port->is_registered) continue; - acpi_gpiochip_free_interrupts(>gc); gpiochip_remove(>gc); } } -- 2.27.0
[PATCH v3 02/10] gpio: dwapb: Add ngpios DT-property support
Indeed generic GPIO DT-schema implies that number of GPIOs should be described by the "ngpios" property located under a GPIO-provider DT node. In that case it's redundant to have a vendor-specific "snps,nr-gpios" property describing the same setting. Moreover it might be errors prone. Since commit 93d2e4322aa7 ("of: platform: Batch fwnode parsing when adding all top level devices") the fwnode parsing is resumed after the vast majority of the platform devices are added. Implicitly that commit activates re-parsing of the whole device tree GPIOs-phandle properties detected having "-gpio/-gpios" suffixes. Since the vendor-specific number of GPIOs property is defined with "-gpios" suffix, then of_link_property() will consider it as a suffix-property with "#gpio-cells" structure, which obviously it doesn't have. As a result for two DW APB GPIO controllers we'll have the next errors printed. OF: /bus@1f059000/gpio@1f044000/gpio-port@0: could not find phandle OF: /bus@1f059000/gpio@1f045000/gpio-port@0: could not get #gpio-cells for /opp-table OF: /bus@1f059000/gpio@1f044000/gpio-port@0: could not find phandle OF: /bus@1f059000/gpio@1f045000/gpio-port@0: could not get #gpio-cells for /opp-table See, the kernel fwnode parsing procedure even tried to resolve the phandle ID, which it thought was the opp-table DT-node, while in fact it was just a number "32". What would happen if that magic number actually referred to a GPIO DT-node with "#gpio-cells" property defined?.. In order to fix the problem let's mark the "snps,nr-gpios" property as deprecated and add the generic "ngpios" property support with the same purpose as the deprecated one. That and the errors log above shall motivate the platform developer to convert the DW APB GPIO DT-nodes to using the standard number of GPIOs property. Signed-off-by: Serge Semin Reviewed-by: Andy Shevchenko --- drivers/gpio/gpio-dwapb.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpio/gpio-dwapb.c b/drivers/gpio/gpio-dwapb.c index 1d8d55bd63aa..ccd80df16062 100644 --- a/drivers/gpio/gpio-dwapb.c +++ b/drivers/gpio/gpio-dwapb.c @@ -594,7 +594,8 @@ static struct dwapb_platform_data *dwapb_gpio_get_pdata(struct device *dev) return ERR_PTR(-EINVAL); } - if (fwnode_property_read_u32(fwnode, "snps,nr-gpios", >ngpio)) { + if (fwnode_property_read_u32(fwnode, "ngpios", >ngpio) && + fwnode_property_read_u32(fwnode, "snps,nr-gpios", >ngpio)) { dev_info(dev, "failed to get number of gpios for port%d\n", i); -- 2.27.0
[PATCH v3 10/10] gpio: dwapb: Use resource managed GPIO-chip add data method
Since the resource managed version of gpiochip_add_data() will handle the GPIO-chip data automated cleanup we can freely remove the DW APB GPIO driver code responsible for that. After doing so the DW APB GPIO driver removal callback can be also fully discarded since there is nothing left to be done for it. All the cleanups are now performed by means of the device managed framework. Signed-off-by: Serge Semin Reviewed-by: Andy Shevchenko --- drivers/gpio/gpio-dwapb.c | 37 ++--- 1 file changed, 2 insertions(+), 35 deletions(-) diff --git a/drivers/gpio/gpio-dwapb.c b/drivers/gpio/gpio-dwapb.c index 803e1d1f4b5a..a5b326754124 100644 --- a/drivers/gpio/gpio-dwapb.c +++ b/drivers/gpio/gpio-dwapb.c @@ -91,7 +91,6 @@ struct dwapb_gpio_port_irqchip { struct dwapb_gpio_port { struct gpio_chipgc; struct dwapb_gpio_port_irqchip *pirq; - boolis_registered; struct dwapb_gpio *gpio; #ifdef CONFIG_PM_SLEEP struct dwapb_context*ctx; @@ -519,32 +518,16 @@ static int dwapb_gpio_add_port(struct dwapb_gpio *gpio, if (pp->idx == 0) dwapb_configure_irqs(gpio, port, pp); - err = gpiochip_add_data(>gc, port); + err = devm_gpiochip_add_data(gpio->dev, >gc, port); if (err) { dev_err(gpio->dev, "failed to register gpiochip for port%d\n", port->idx); return err; } - port->is_registered = true; - return 0; } -static void dwapb_gpio_unregister(struct dwapb_gpio *gpio) -{ - unsigned int m; - - for (m = 0; m < gpio->nr_ports; ++m) { - struct dwapb_gpio_port *port = >ports[m]; - - if (!port->is_registered) - continue; - - gpiochip_remove(>gc); - } -} - static void dwapb_get_irq(struct device *dev, struct fwnode_handle *fwnode, struct dwapb_port_property *pp) { @@ -738,23 +721,8 @@ static int dwapb_gpio_probe(struct platform_device *pdev) for (i = 0; i < gpio->nr_ports; i++) { err = dwapb_gpio_add_port(gpio, >properties[i], i); if (err) - goto out_unregister; + return err; } - platform_set_drvdata(pdev, gpio); - - return 0; - -out_unregister: - dwapb_gpio_unregister(gpio); - - return err; -} - -static int dwapb_gpio_remove(struct platform_device *pdev) -{ - struct dwapb_gpio *gpio = platform_get_drvdata(pdev); - - dwapb_gpio_unregister(gpio); return 0; } @@ -858,7 +826,6 @@ static struct platform_driver dwapb_gpio_driver = { .acpi_match_table = dwapb_acpi_match, }, .probe = dwapb_gpio_probe, - .remove = dwapb_gpio_remove, }; module_platform_driver(dwapb_gpio_driver); -- 2.27.0
[PATCH v3 03/10] gpio: dwapb: Move MFD-specific IRQ handler
For better readability let's group all the IRQ handlers in a single place of the driver instead of having them scatter around all over the file. Signed-off-by: Serge Semin Reviewed-by: Andy Shevchenko --- drivers/gpio/gpio-dwapb.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/gpio/gpio-dwapb.c b/drivers/gpio/gpio-dwapb.c index ccd80df16062..3081213247d8 100644 --- a/drivers/gpio/gpio-dwapb.c +++ b/drivers/gpio/gpio-dwapb.c @@ -220,6 +220,11 @@ static void dwapb_irq_handler(struct irq_desc *desc) chained_irq_exit(chip, desc); } +static irqreturn_t dwapb_irq_handler_mfd(int irq, void *dev_id) +{ + return IRQ_RETVAL(dwapb_do_irq(dev_id)); +} + static void dwapb_irq_enable(struct irq_data *d) { struct irq_chip_generic *igc = irq_data_get_irq_chip_data(d); @@ -349,11 +354,6 @@ static int dwapb_gpio_set_config(struct gpio_chip *gc, unsigned offset, return dwapb_gpio_set_debounce(gc, offset, debounce); } -static irqreturn_t dwapb_irq_handler_mfd(int irq, void *dev_id) -{ - return IRQ_RETVAL(dwapb_do_irq(dev_id)); -} - static void dwapb_configure_irqs(struct dwapb_gpio *gpio, struct dwapb_gpio_port *port, struct dwapb_port_property *pp) -- 2.27.0
[PATCH v3 05/10] gpio: dwapb: Convert driver to using the GPIO-lib-based IRQ-chip
GPIO-lib provides a ready-to-use interface to initialize an IRQ-chip on top of a GPIO chip. It's better from maintainability and readability point of view to use one instead of supporting a hand-written Generic IRQ-chip-based implementation. Moreover the new implementation won't cause much functional overhead but will provide a cleaner driver code. All of that makes the DW APB GPIO driver conversion pretty much justified especially seeing a tendency of the other GPIO drivers getting converted too. Here is what we do in the framework of this commit to convert the driver to using the GPIO-lib-based IRQ-chip interface: 1) IRQ ack, mask and unmask callbacks are locally defined instead of using the Generic IRQ-chip ones. 2) An irq_chip structure instance is embedded into the dwapb_gpio private data. Note we can't have a static instance of that structure since GPIO-lib will add some hooks into it by calling gpiochip_set_irq_hooks(). A warning about that would have been printed by the GPIO-lib code if we used a single irq_chip structure instance for multiple DW APB GPIO controllers. 3) Initialize the gpio_irq_chip structure embedded into the gpio_chip descriptor. By default there is no IRQ enabled so any event raised will be handled by the handle_bad_irq() IRQ flow handler. If DW APB GPIO IP-core is synthesized to have non-shared reference IRQ-lines, then as before the hierarchical and cascaded cases are distinguished by checking how many parental IRQs are defined. (Note irq_set_chained_handler_and_data() won't initialize IRQs, which descriptors couldn't be found.) If DW APB GPIO IP is used on a platform with shared IRQ line, then we simply won't let the GPIO-lib to initialize the parental IRQs, but will handle them locally in the driver. 4) Discard linear IRQ-domain and Generic IRQ-chip initialization, since GPIO-lib IRQ-chip interface will create a new domain and accept a standard IRQ-chip structure pointer based on the setting we provided in the gpio_irq_chip structure. 5) Manually select a proper IRQ flow handler directly in the irq_set_type() callback by calling irq_set_handler_locked() method, since an ordinary (not Generic) irq_chip descriptor is now utilized. Note this shalln't give any regression 6) Alter CONFIG_GPIO_DWAPB kernel config to select CONFIG_GPIOLIB_IRQCHIP instead of CONFIG_GENERIC_IRQ_CHIP. Note neither 4) nor 5) shall cause a regression of commit 6a2f4b7dadd5 ("gpio: dwapb: use a second irq chip"), since the later isn't properly used here anyway. Signed-off-by: Serge Semin --- Note in this patch we omit the next GPIO-lib IRQ-chip settings initialization: gc->irq.map gc->irq.init_valid_mask That's intentionally done in order to keep the driver functionality at the same level as before the conversion: each GPIO line will be registered as the source of IRQs, no matter whether DW APB GPIO IP is configured to generate combined or multiple IRQ signals. If more thorough mapping is needed, the driver should be altered to detect the IRQ signaling mode (for instance by checking a dedicated DT-property). Then based on that it will be able to create a proper hardware GPIO-IRQ to/from parental IRQs mapping and valid hardware GPIO-IRQs mask. Changelog v2: - Replace gc->to_irq() with irq_find_mapping() method. - Refactor dwapb_irq_set_type() method to directly set IRQ flow handler instead of using a temporary variable. - Initialize GPIOlib IRQ-chip settings before calling request_irq() method. - Add a notice regarding regression of commit 6a2f4b7dadd5 ("gpio: dwapb: use a second irq chip"). - Move the acpi_gpiochip_{request,free}_interrupts() methods invocation removal to a dedicated commit. - Move GPIO-chip to_irq callback removal to a dedicated commit. - Introduce dwapb_convert_irqs() method to convert the sparse parental IRQs array into an array of linearly distributed IRQs correctly perceived by GPIO-lib. Changelog v3: - Add blank lines to the commit message to make it more readable. - Dynamically allocate memory for the IRQ-chip-related data. --- drivers/gpio/Kconfig | 2 +- drivers/gpio/gpio-dwapb.c | 203 +- 2 files changed, 111 insertions(+), 94 deletions(-) diff --git a/drivers/gpio/Kconfig b/drivers/gpio/Kconfig index c6b5c65c8405..bcd3e541a52b 100644 --- a/drivers/gpio/Kconfig +++ b/drivers/gpio/Kconfig @@ -202,7 +202,7 @@ config GPIO_DAVINCI config GPIO_DWAPB tristate "Synopsys DesignWare APB GPIO driver" select GPIO_GENERIC - select GENERIC_IRQ_CHIP + select GPIOLIB_IRQCHIP help Say Y or M here to build support for the Synopsys DesignWare APB GPIO block. diff --git a/drivers/gpio/gpio-dwapb.c b/drivers/gpio/gpio-dwapb.c index f34001152850..eb3a6da453ff 100644 --- a/drivers/gpio/gpio-dwapb.c +++ b/drivers/gpio/gpio-dwapb.c @@ -13,7 +13,6 @@ #include #include #include -#include #include #include #include
Re: [PATCH v2 05/10] gpio: dwapb: Convert driver to using the GPIO-lib-based IRQ-chip
On Thu, Jul 30, 2020 at 05:26:18PM +0300, Andy Shevchenko wrote: > On Thu, Jul 30, 2020 at 04:55:31PM +0300, Serge Semin wrote: > > GPIO-lib provides a ready-to-use interface to initialize an IRQ-chip on > > top of a GPIO chip. It's better from maintainability and readability > > point of view to use one instead of supporting a hand-written Generic > > IRQ-chip-based implementation. Moreover the new implementation won't > > cause much functional overhead but will provide a cleaner driver code. > > All of that makes the DW APB GPIO driver conversion pretty much justified > > especially seeing a tendency of the other GPIO drivers getting converted > > too. > > > > Here is what we do in the framework of this commit to convert the driver > > to using the GPIO-lib-based IRQ-chip interface: > > 1) IRQ ack, mask and unmask callbacks are locally defined instead of > > using the Generic IRQ-chip ones. > > Easy to read if you put blank lines in between of items. Ok. > > > 2) An irq_chip structure instance is embedded into the dwapb_gpio > > private data. Note we can't have a static instance of that structure since > > GPIO-lib will add some hooks into it by calling gpiochip_set_irq_hooks(). > > A warning about that would have been printed by the GPIO-lib code if we > > used a single irq_chip structure instance for multiple DW APB GPIO > > controllers. > > 3) Initialize the gpio_irq_chip structure embedded into the gpio_chip > > descriptor. By default there is no IRQ enabled so any event raised will be > > handled by the handle_bad_irq() IRQ flow handler. If DW APB GPIO IP-core > > is synthesized to have non-shared reference IRQ-lines, then as before the > > hierarchical and cascaded cases are distinguished by checking how many > > parental IRQs are defined. (Note irq_set_chained_handler_and_data() won't > > initialize IRQs, which descriptors couldn't be found.) If DW APB GPIO IP > > is used on a platform with shared IRQ line, then we simply won't let the > > GPIO-lib to initialize the parental IRQs, but will handle them locally in > > the driver. > > 4) Discard linear IRQ-domain and Generic IRQ-chip initialization, since > > GPIO-lib IRQ-chip interface will create a new domain and accept a standard > > IRQ-chip structure pointer based on the setting we provided in the > > gpio_irq_chip structure. > > 5) Manually select a proper IRQ flow handler directly in the > > irq_set_type() callback by calling irq_set_handler_locked() method, since > > an ordinary (not Generic) irq_chip descriptor is now utilized. Note this > > shalln't give any regression > > 6) Alter CONFIG_GPIO_DWAPB kernel config to select > > CONFIG_GPIOLIB_IRQCHIP instead of CONFIG_GENERIC_IRQ_CHIP. > > > > Note neither 4) nor 5) shall cause a regression of commit 6a2f4b7dadd5 > > ("gpio: dwapb: use a second irq chip"), since the later isn't properly > > used here anyway. > > ... > > > struct dwapb_gpio_port { > > struct gpio_chipgc; > > + unsigned intnr_irqs; > > + unsigned intirq[DWAPB_MAX_GPIOS]; > > + struct irq_chip irqchip; > > boolis_registered; > > struct dwapb_gpio *gpio; > > Isn't it too much wasted memory (imagine 4 port controller)? > > What if we have it in another structure and allocate dynamically? > > struct dwapb_gpio_port_irqchip { > struct irq_chip irqchip; > unsigned intnr_irqs; > unsigned intirq[DWAPB_MAX_GPIOS]; > }; Agree. I have to send a new revision of the series anyway. I'll do that shortly. -Sergey > > ... > struct dwapb_gpio_port_irqchip *pirq; > ... > > (I agree that IRQ chip is rather property of a port than controller) > > -- > With Best Regards, > Andy Shevchenko > >
Re: [PATCH v2 04/10] gpio: dwapb: Add max GPIOs macro
On Thu, Jul 30, 2020 at 05:05:26PM +0300, Andy Shevchenko wrote: > On Thu, Jul 30, 2020 at 04:55:30PM +0300, Serge Semin wrote: > > Add a new macro DWAPB_MAX_GPIOS which defines the maximum possible number > > of GPIO lines corresponding to the maximum DW APB GPIO controller port > > width. Use the new macro instead of number literal 32 where it's > > applicable. > > Since it's a modified version of what I sent earlier, perhaps Suggested-by? Oh, found it [PATCH v1 6/6] gpio: dwapb: Define magic number for IRQ and GPIO lines Sorry for missing your suggested-by tag. I'll add it if new revision is required. Otherwise, Linus, could you add it when you apply the series? -Sergey > > -- > With Best Regards, > Andy Shevchenko > >
Re: [PATCH v2 00/10] gpio: dwapb: Refactor GPIO resources initialization
Wou, I've confused my SOB tag here. Linus, if no additional patchset revision is required, could you please replace it with: Signed-off-by: Serge Semin ? Alternatively I could resend the series with correct version of the tag. -Sergey On Thu, Jul 30, 2020 at 04:55:26PM +0300, Serge Semin wrote: > This series is about the DW APB GPIO device initialization procedure > cleaning up. First of all it has been discovered that having a > vendor-specific "snps,nr-gpios" property isn't only redundant but also > might be dangerous (see the commit log for details). Instead we suggest to > use the generic "ngpios" property to define a number of GPIOs each DW APB > GPIO controller port supports. Secondly seeing a tendency of the other > GPIO drivers getting converted to using the GPIO-lib-based IRQ-chip > interface this series provides a patch, which replaces the DW APB GPIO > driver Generic IRQ-chip implementation with the GPIO-lib IRQ-chip one. > Finally the DW APB GPIO device probe procedure is simplified by > converting the code to be using the device managed resources for the > reference clocks initialization, reset control assertion/de-assertion > and GPIO-chip registration. > > Some additional cleanups like replacing a number of GPIOs literal with a > corresponding macro and grouping the IRQ handlers up in a single place of > the driver are also introduced in this patchset. > > Link: > https://lore.kernel.org/linux-gpio/20200723013858.10766-1-sergey.se...@baikalelectronics.ru/ > Changelog v2: > - Replace gc->to_irq() with irq_find_mapping() method. > - Refactor dwapb_irq_set_type() method to directly set IRQ flow handler > instead of using a temporary variable. > - Initialize GPIOlib IRQ-chip settings before calling request_irq() > method. > - Add a notice regarding regression of commit 6a2f4b7dadd5 ("gpio: > dwapb: use a second irq chip"). > - Move the acpi_gpiochip_{request,free}_interrupts() methods invocation > removal to a dedicated patch. > - Move GPIO-chip to_irq callback removal to a dedicated patch. > - Add a patch which replaces a max number of GPIO literals with a macro. > - Introduce dwapb_convert_irqs() method to convert the sparse parental > IRQs array into an array of linearly distributed IRQs correctly > perceived by GPIO-lib. > > Signed-off-by: Serge Semin > Cc: Andy Shevchenko > Cc: Andy Shevchenko > Cc: Alexey Malahov > Cc: Pavel Parkhomenko > Cc: Rob Herring > Cc: linux-g...@vger.kernel.org > Cc: devicet...@vger.kernel.org > Cc: linux-kernel@vger.kernel.org > > Serge Semin (10): > dt-bindings: gpio: dwapb: Add ngpios property support > gpio: dwapb: Add ngpios DT-property support > gpio: dwapb: Move MFD-specific IRQ handler > gpio: dwapb: Add max GPIOs macro > gpio: dwapb: Convert driver to using the GPIO-lib-based IRQ-chip > gpio: dwapb: Discard GPIO-to-IRQ mapping function > gpio: dwapb: Discard ACPI GPIO-chip IRQs request > gpio: dwapb: Get reset control by means of resource managed interface > gpio: dwapb: Get clocks by means of resource managed interface > gpio: dwapb: Use resource managed GPIO-chip add data method > > .../bindings/gpio/snps,dw-apb-gpio.yaml | 6 + > drivers/gpio/Kconfig | 2 +- > drivers/gpio/gpio-dwapb.c | 340 +- > include/linux/platform_data/gpio-dwapb.h | 4 +- > 4 files changed, 178 insertions(+), 174 deletions(-) > > -- > 2.27.0 >
Re: [PATCH v2 04/10] gpio: dwapb: Add max GPIOs macro
On Thu, Jul 30, 2020 at 05:05:26PM +0300, Andy Shevchenko wrote: > On Thu, Jul 30, 2020 at 04:55:30PM +0300, Serge Semin wrote: > > Add a new macro DWAPB_MAX_GPIOS which defines the maximum possible number > > of GPIO lines corresponding to the maximum DW APB GPIO controller port > > width. Use the new macro instead of number literal 32 where it's > > applicable. > > Since it's a modified version of what I sent earlier, perhaps Suggested-by? Could you point out to the message with that change? I must have missed that...( -Sergey > > -- > With Best Regards, > Andy Shevchenko > >
[PATCH v2 03/10] gpio: dwapb: Move MFD-specific IRQ handler
For better readability let's group all the IRQ handlers in a single place of the driver instead of having them scatter around all over the file. Signed-off-by: Serge Semin Reviewed-by: Andy Shevchenko --- drivers/gpio/gpio-dwapb.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/gpio/gpio-dwapb.c b/drivers/gpio/gpio-dwapb.c index ccd80df16062..3081213247d8 100644 --- a/drivers/gpio/gpio-dwapb.c +++ b/drivers/gpio/gpio-dwapb.c @@ -220,6 +220,11 @@ static void dwapb_irq_handler(struct irq_desc *desc) chained_irq_exit(chip, desc); } +static irqreturn_t dwapb_irq_handler_mfd(int irq, void *dev_id) +{ + return IRQ_RETVAL(dwapb_do_irq(dev_id)); +} + static void dwapb_irq_enable(struct irq_data *d) { struct irq_chip_generic *igc = irq_data_get_irq_chip_data(d); @@ -349,11 +354,6 @@ static int dwapb_gpio_set_config(struct gpio_chip *gc, unsigned offset, return dwapb_gpio_set_debounce(gc, offset, debounce); } -static irqreturn_t dwapb_irq_handler_mfd(int irq, void *dev_id) -{ - return IRQ_RETVAL(dwapb_do_irq(dev_id)); -} - static void dwapb_configure_irqs(struct dwapb_gpio *gpio, struct dwapb_gpio_port *port, struct dwapb_port_property *pp) -- 2.27.0
[PATCH v2 07/10] gpio: dwapb: Discard ACPI GPIO-chip IRQs request
Since GPIOlib-based IRQ-chip interface is now utilized there is no need in calling the methods acpi_gpiochip_{request,free}_interrupts() here. They will be called from gpiochip_add_irqchip()/gpiochip_irqchip_remove() anyway. Signed-off-by: Serge Semin --- Changelog v2: - This is a new patch detached from commit "gpio: dwapb: Convert driver to using the GPIO-lib-based IRQ-chip". --- drivers/gpio/gpio-dwapb.c | 4 1 file changed, 4 deletions(-) diff --git a/drivers/gpio/gpio-dwapb.c b/drivers/gpio/gpio-dwapb.c index f7acc5abbf5c..226d9c2d9493 100644 --- a/drivers/gpio/gpio-dwapb.c +++ b/drivers/gpio/gpio-dwapb.c @@ -512,9 +512,6 @@ static int dwapb_gpio_add_port(struct dwapb_gpio *gpio, return err; } - /* Add GPIO-signaled ACPI event support */ - acpi_gpiochip_request_interrupts(>gc); - port->is_registered = true; return 0; @@ -530,7 +527,6 @@ static void dwapb_gpio_unregister(struct dwapb_gpio *gpio) if (!port->is_registered) continue; - acpi_gpiochip_free_interrupts(>gc); gpiochip_remove(>gc); } } -- 2.27.0
[PATCH v2 06/10] gpio: dwapb: Discard GPIO-to-IRQ mapping function
Since GPIOlib-based IRQ-chip interface is now utilized there is no need in setting up a custom GPIO-to-IRQ mapping method. GPIO-lib defines the standard mapping method - gpiochip_to_irq(), which will be used anyway no matter whether the custom to_irq callback is specified or not. Signed-off-by: Serge Semin --- Changelog v2: - This is a new patch detached from commit "gpio: dwapb: Convert driver to using the GPIO-lib-based IRQ-chip". --- drivers/gpio/gpio-dwapb.c | 10 -- 1 file changed, 10 deletions(-) diff --git a/drivers/gpio/gpio-dwapb.c b/drivers/gpio/gpio-dwapb.c index 327333fbc750..f7acc5abbf5c 100644 --- a/drivers/gpio/gpio-dwapb.c +++ b/drivers/gpio/gpio-dwapb.c @@ -150,14 +150,6 @@ static inline void dwapb_write(struct dwapb_gpio *gpio, unsigned int offset, gc->write_reg(reg_base + gpio_reg_convert(gpio, offset), val); } -static int dwapb_gpio_to_irq(struct gpio_chip *gc, unsigned offset) -{ - struct dwapb_gpio_port *port = gpiochip_get_data(gc); - struct dwapb_gpio *gpio = port->gpio; - - return irq_find_mapping(gpio->domain, offset); -} - static struct dwapb_gpio_port *dwapb_offs_to_port(struct dwapb_gpio *gpio, unsigned int offs) { struct dwapb_gpio_port *port; @@ -466,8 +458,6 @@ static void dwapb_configure_irqs(struct dwapb_gpio *gpio, } girq->chip = >irqchip; - - port->gc.to_irq = dwapb_gpio_to_irq; } static int dwapb_gpio_add_port(struct dwapb_gpio *gpio, -- 2.27.0
[PATCH v2 04/10] gpio: dwapb: Add max GPIOs macro
Add a new macro DWAPB_MAX_GPIOS which defines the maximum possible number of GPIO lines corresponding to the maximum DW APB GPIO controller port width. Use the new macro instead of number literal 32 where it's applicable. Signed-off-by: Serge Semin --- drivers/gpio/gpio-dwapb.c| 8 include/linux/platform_data/gpio-dwapb.h | 4 +++- 2 files changed, 7 insertions(+), 5 deletions(-) diff --git a/drivers/gpio/gpio-dwapb.c b/drivers/gpio/gpio-dwapb.c index 3081213247d8..f34001152850 100644 --- a/drivers/gpio/gpio-dwapb.c +++ b/drivers/gpio/gpio-dwapb.c @@ -162,7 +162,7 @@ static struct dwapb_gpio_port *dwapb_offs_to_port(struct dwapb_gpio *gpio, unsig for (i = 0; i < gpio->nr_ports; i++) { port = >ports[i]; - if (port->idx == offs / 32) + if (port->idx == offs / DWAPB_MAX_GPIOS) return port; } @@ -182,7 +182,7 @@ static void dwapb_toggle_trigger(struct dwapb_gpio *gpio, unsigned int offs) pol = dwapb_read(gpio, GPIO_INT_POLARITY); /* Just read the current value right out of the data register */ - val = gc->get(gc, offs % 32); + val = gc->get(gc, offs % DWAPB_MAX_GPIOS); if (val) pol &= ~BIT(offs); else @@ -197,7 +197,7 @@ static u32 dwapb_do_irq(struct dwapb_gpio *gpio) irq_hw_number_t hwirq; irq_status = dwapb_read(gpio, GPIO_INTSTATUS); - for_each_set_bit(hwirq, _status, 32) { + for_each_set_bit(hwirq, _status, DWAPB_MAX_GPIOS) { int gpio_irq = irq_find_mapping(gpio->domain, hwirq); u32 irq_type = irq_get_trigger_type(gpio_irq); @@ -599,7 +599,7 @@ static struct dwapb_platform_data *dwapb_gpio_get_pdata(struct device *dev) dev_info(dev, "failed to get number of gpios for port%d\n", i); - pp->ngpio = 32; + pp->ngpio = DWAPB_MAX_GPIOS; } pp->irq_shared = false; diff --git a/include/linux/platform_data/gpio-dwapb.h b/include/linux/platform_data/gpio-dwapb.h index ff1be737bad6..0aa5c6720259 100644 --- a/include/linux/platform_data/gpio-dwapb.h +++ b/include/linux/platform_data/gpio-dwapb.h @@ -6,12 +6,14 @@ #ifndef GPIO_DW_APB_H #define GPIO_DW_APB_H +#define DWAPB_MAX_GPIOS32 + struct dwapb_port_property { struct fwnode_handle *fwnode; unsigned intidx; unsigned intngpio; unsigned intgpio_base; - int irq[32]; + int irq[DWAPB_MAX_GPIOS]; boolirq_shared; }; -- 2.27.0
[PATCH v2 01/10] dt-bindings: gpio: dwapb: Add ngpios property support
It's redundant to have a vendor-specific property describing a number of GPIOS while there is a generic one. Let's mark the former one as deprecated and define the "ngpios" property supported with constraints of being within [1; 32] range. Signed-off-by: Serge Semin Reviewed-by: Rob Herring --- .../devicetree/bindings/gpio/snps,dw-apb-gpio.yaml | 6 ++ 1 file changed, 6 insertions(+) diff --git a/Documentation/devicetree/bindings/gpio/snps,dw-apb-gpio.yaml b/Documentation/devicetree/bindings/gpio/snps,dw-apb-gpio.yaml index 1240f6289249..b391cc1b4590 100644 --- a/Documentation/devicetree/bindings/gpio/snps,dw-apb-gpio.yaml +++ b/Documentation/devicetree/bindings/gpio/snps,dw-apb-gpio.yaml @@ -61,8 +61,14 @@ patternProperties: '#gpio-cells': const: 2 + ngpios: +default: 32 +minimum: 1 +maximum: 32 + snps,nr-gpios: description: The number of GPIO pins exported by the port. +deprecated: true $ref: /schemas/types.yaml#/definitions/uint32 default: 32 minimum: 1 -- 2.27.0
[PATCH v2 10/10] gpio: dwapb: Use resource managed GPIO-chip add data method
Since the resource managed version of gpiochip_add_data() will handle the GPIO-chip data automated cleanup we can freely remove the DW APB GPIO driver code responsible for that. After doing so the DW APB GPIO driver removal callback can be also fully discarded since there is nothing left to be done for it. All the cleanups are now performed by means of the device managed framework. Signed-off-by: Serge Semin Reviewed-by: Andy Shevchenko --- drivers/gpio/gpio-dwapb.c | 37 ++--- 1 file changed, 2 insertions(+), 35 deletions(-) diff --git a/drivers/gpio/gpio-dwapb.c b/drivers/gpio/gpio-dwapb.c index 9b3065812de2..d3d555b3d492 100644 --- a/drivers/gpio/gpio-dwapb.c +++ b/drivers/gpio/gpio-dwapb.c @@ -87,7 +87,6 @@ struct dwapb_gpio_port { unsigned intnr_irqs; unsigned intirq[DWAPB_MAX_GPIOS]; struct irq_chip irqchip; - boolis_registered; struct dwapb_gpio *gpio; #ifdef CONFIG_PM_SLEEP struct dwapb_context*ctx; @@ -505,32 +504,16 @@ static int dwapb_gpio_add_port(struct dwapb_gpio *gpio, if (pp->idx == 0) dwapb_configure_irqs(gpio, port, pp); - err = gpiochip_add_data(>gc, port); + err = devm_gpiochip_add_data(gpio->dev, >gc, port); if (err) { dev_err(gpio->dev, "failed to register gpiochip for port%d\n", port->idx); return err; } - port->is_registered = true; - return 0; } -static void dwapb_gpio_unregister(struct dwapb_gpio *gpio) -{ - unsigned int m; - - for (m = 0; m < gpio->nr_ports; ++m) { - struct dwapb_gpio_port *port = >ports[m]; - - if (!port->is_registered) - continue; - - gpiochip_remove(>gc); - } -} - static void dwapb_get_irq(struct device *dev, struct fwnode_handle *fwnode, struct dwapb_port_property *pp) { @@ -724,23 +707,8 @@ static int dwapb_gpio_probe(struct platform_device *pdev) for (i = 0; i < gpio->nr_ports; i++) { err = dwapb_gpio_add_port(gpio, >properties[i], i); if (err) - goto out_unregister; + return err; } - platform_set_drvdata(pdev, gpio); - - return 0; - -out_unregister: - dwapb_gpio_unregister(gpio); - - return err; -} - -static int dwapb_gpio_remove(struct platform_device *pdev) -{ - struct dwapb_gpio *gpio = platform_get_drvdata(pdev); - - dwapb_gpio_unregister(gpio); return 0; } @@ -844,7 +812,6 @@ static struct platform_driver dwapb_gpio_driver = { .acpi_match_table = dwapb_acpi_match, }, .probe = dwapb_gpio_probe, - .remove = dwapb_gpio_remove, }; module_platform_driver(dwapb_gpio_driver); -- 2.27.0
[PATCH v2 09/10] gpio: dwapb: Get clocks by means of resource managed interface
The kernel clock framework provides the resource managed version of the clk_bulk_get() method. The only thing which needs to be also automated is the clocks disable/unprepare procedure executed on the device removal. It can be implemented by means of the custom action definition. After that the clocks acquisition and release will be purely managed by the device resources interface. Signed-off-by: Serge Semin Reviewed-by: Andy Shevchenko --- drivers/gpio/gpio-dwapb.c | 48 ++- 1 file changed, 32 insertions(+), 16 deletions(-) diff --git a/drivers/gpio/gpio-dwapb.c b/drivers/gpio/gpio-dwapb.c index 9aa681c79c82..9b3065812de2 100644 --- a/drivers/gpio/gpio-dwapb.c +++ b/drivers/gpio/gpio-dwapb.c @@ -633,6 +633,36 @@ static int dwapb_get_reset(struct dwapb_gpio *gpio) return devm_add_action_or_reset(gpio->dev, dwapb_assert_reset, gpio); } +static void dwapb_disable_clks(void *data) +{ + struct dwapb_gpio *gpio = data; + + clk_bulk_disable_unprepare(DWAPB_NR_CLOCKS, gpio->clks); +} + +static int dwapb_get_clks(struct dwapb_gpio *gpio) +{ + int err; + + /* Optional bus and debounce clocks */ + gpio->clks[0].id = "bus"; + gpio->clks[1].id = "db"; + err = devm_clk_bulk_get_optional(gpio->dev, DWAPB_NR_CLOCKS, +gpio->clks); + if (err) { + dev_err(gpio->dev, "Cannot get APB/Debounce clocks\n"); + return err; + } + + err = clk_bulk_prepare_enable(DWAPB_NR_CLOCKS, gpio->clks); + if (err) { + dev_err(gpio->dev, "Cannot enable APB/Debounce clocks\n"); + return err; + } + + return devm_add_action_or_reset(gpio->dev, dwapb_disable_clks, gpio); +} + static const struct of_device_id dwapb_of_match[] = { { .compatible = "snps,dw-apb-gpio", .data = (void *)0}, { .compatible = "apm,xgene-gpio-v2", .data = (void *)GPIO_REG_OFFSET_V2}, @@ -685,21 +715,9 @@ static int dwapb_gpio_probe(struct platform_device *pdev) if (IS_ERR(gpio->regs)) return PTR_ERR(gpio->regs); - /* Optional bus and debounce clocks */ - gpio->clks[0].id = "bus"; - gpio->clks[1].id = "db"; - err = devm_clk_bulk_get_optional(>dev, DWAPB_NR_CLOCKS, -gpio->clks); - if (err) { - dev_err(>dev, "Cannot get APB/Debounce clocks\n"); - return err; - } - - err = clk_bulk_prepare_enable(DWAPB_NR_CLOCKS, gpio->clks); - if (err) { - dev_err(>dev, "Cannot enable APB/Debounce clocks\n"); + err = dwapb_get_clks(gpio); + if (err) return err; - } gpio->flags = (uintptr_t)device_get_match_data(dev); @@ -714,7 +732,6 @@ static int dwapb_gpio_probe(struct platform_device *pdev) out_unregister: dwapb_gpio_unregister(gpio); - clk_bulk_disable_unprepare(DWAPB_NR_CLOCKS, gpio->clks); return err; } @@ -724,7 +741,6 @@ static int dwapb_gpio_remove(struct platform_device *pdev) struct dwapb_gpio *gpio = platform_get_drvdata(pdev); dwapb_gpio_unregister(gpio); - clk_bulk_disable_unprepare(DWAPB_NR_CLOCKS, gpio->clks); return 0; } -- 2.27.0
[PATCH v2 05/10] gpio: dwapb: Convert driver to using the GPIO-lib-based IRQ-chip
GPIO-lib provides a ready-to-use interface to initialize an IRQ-chip on top of a GPIO chip. It's better from maintainability and readability point of view to use one instead of supporting a hand-written Generic IRQ-chip-based implementation. Moreover the new implementation won't cause much functional overhead but will provide a cleaner driver code. All of that makes the DW APB GPIO driver conversion pretty much justified especially seeing a tendency of the other GPIO drivers getting converted too. Here is what we do in the framework of this commit to convert the driver to using the GPIO-lib-based IRQ-chip interface: 1) IRQ ack, mask and unmask callbacks are locally defined instead of using the Generic IRQ-chip ones. 2) An irq_chip structure instance is embedded into the dwapb_gpio private data. Note we can't have a static instance of that structure since GPIO-lib will add some hooks into it by calling gpiochip_set_irq_hooks(). A warning about that would have been printed by the GPIO-lib code if we used a single irq_chip structure instance for multiple DW APB GPIO controllers. 3) Initialize the gpio_irq_chip structure embedded into the gpio_chip descriptor. By default there is no IRQ enabled so any event raised will be handled by the handle_bad_irq() IRQ flow handler. If DW APB GPIO IP-core is synthesized to have non-shared reference IRQ-lines, then as before the hierarchical and cascaded cases are distinguished by checking how many parental IRQs are defined. (Note irq_set_chained_handler_and_data() won't initialize IRQs, which descriptors couldn't be found.) If DW APB GPIO IP is used on a platform with shared IRQ line, then we simply won't let the GPIO-lib to initialize the parental IRQs, but will handle them locally in the driver. 4) Discard linear IRQ-domain and Generic IRQ-chip initialization, since GPIO-lib IRQ-chip interface will create a new domain and accept a standard IRQ-chip structure pointer based on the setting we provided in the gpio_irq_chip structure. 5) Manually select a proper IRQ flow handler directly in the irq_set_type() callback by calling irq_set_handler_locked() method, since an ordinary (not Generic) irq_chip descriptor is now utilized. Note this shalln't give any regression 6) Alter CONFIG_GPIO_DWAPB kernel config to select CONFIG_GPIOLIB_IRQCHIP instead of CONFIG_GENERIC_IRQ_CHIP. Note neither 4) nor 5) shall cause a regression of commit 6a2f4b7dadd5 ("gpio: dwapb: use a second irq chip"), since the later isn't properly used here anyway. Signed-off-by: Serge Semin --- Note in this patch we omit the next GPIO-lib IRQ-chip settings initialization: gc->irq.map gc->irq.init_valid_mask That's intentionally done in order to keep the driver functionality at the same level as before the conversion: each GPIO line will be registered as the source of IRQs, no matter whether DW APB GPIO IP is configured to generate combined or multiple IRQ signals. If more thorough mapping is needed, the driver should be altered to detect the IRQ signaling mode (for instance by checking a dedicated DT-property). Then based on that it will be able to create a proper hardware GPIO-IRQ to/from parental IRQs mapping and valid hardware GPIO-IRQs mask. Changelog v2: - Replace gc->to_irq() with irq_find_mapping() method. - Refactor dwapb_irq_set_type() method to directly set IRQ flow handler instead of using a temporary variable. - Initialize GPIOlib IRQ-chip settings before calling request_irq() method. - Add a notice regarding regression of commit 6a2f4b7dadd5 ("gpio: dwapb: use a second irq chip"). - Move the acpi_gpiochip_{request,free}_interrupts() methods invocation removal to a dedicated commit. - Move GPIO-chip to_irq callback removal to a dedicated commit. - Introduce dwapb_convert_irqs() method to convert the sparse parental IRQs array into an array of linearly distributed IRQs correctly perceived by GPIO-lib. --- drivers/gpio/Kconfig | 2 +- drivers/gpio/gpio-dwapb.c | 191 +++--- 2 files changed, 98 insertions(+), 95 deletions(-) diff --git a/drivers/gpio/Kconfig b/drivers/gpio/Kconfig index c6b5c65c8405..bcd3e541a52b 100644 --- a/drivers/gpio/Kconfig +++ b/drivers/gpio/Kconfig @@ -202,7 +202,7 @@ config GPIO_DAVINCI config GPIO_DWAPB tristate "Synopsys DesignWare APB GPIO driver" select GPIO_GENERIC - select GENERIC_IRQ_CHIP + select GPIOLIB_IRQCHIP help Say Y or M here to build support for the Synopsys DesignWare APB GPIO block. diff --git a/drivers/gpio/gpio-dwapb.c b/drivers/gpio/gpio-dwapb.c index f34001152850..327333fbc750 100644 --- a/drivers/gpio/gpio-dwapb.c +++ b/drivers/gpio/gpio-dwapb.c @@ -13,7 +13,6 @@ #include #include #include -#include #include #include #include @@ -85,6 +84,9 @@ struct dwapb_context { struct dwapb_gpio_port { struct gpio_chipgc; + unsigned intnr_irqs; + unsi
[PATCH v2 08/10] gpio: dwapb: Get reset control by means of resource managed interface
The reset control interface provides the resource managed version of the reset_control_get() method. The only thing which needs to be also automated is the reset lane assertion on the device removal. It can be implemented by means of the custom action definition. After that the reset control will be purely managed by the device resources interface. Signed-off-by: Serge Semin Reviewed-by: Andy Shevchenko --- drivers/gpio/gpio-dwapb.c | 35 +-- 1 file changed, 29 insertions(+), 6 deletions(-) diff --git a/drivers/gpio/gpio-dwapb.c b/drivers/gpio/gpio-dwapb.c index 226d9c2d9493..9aa681c79c82 100644 --- a/drivers/gpio/gpio-dwapb.c +++ b/drivers/gpio/gpio-dwapb.c @@ -607,6 +607,32 @@ static struct dwapb_platform_data *dwapb_gpio_get_pdata(struct device *dev) return pdata; } +static void dwapb_assert_reset(void *data) +{ + struct dwapb_gpio *gpio = data; + + reset_control_assert(gpio->rst); +} + +static int dwapb_get_reset(struct dwapb_gpio *gpio) +{ + int err; + + gpio->rst = devm_reset_control_get_optional_shared(gpio->dev, NULL); + if (IS_ERR(gpio->rst)) { + dev_err(gpio->dev, "Cannot get reset descriptor\n"); + return PTR_ERR(gpio->rst); + } + + err = reset_control_deassert(gpio->rst); + if (err) { + dev_err(gpio->dev, "Cannot deassert reset lane\n"); + return err; + } + + return devm_add_action_or_reset(gpio->dev, dwapb_assert_reset, gpio); +} + static const struct of_device_id dwapb_of_match[] = { { .compatible = "snps,dw-apb-gpio", .data = (void *)0}, { .compatible = "apm,xgene-gpio-v2", .data = (void *)GPIO_REG_OFFSET_V2}, @@ -646,11 +672,9 @@ static int dwapb_gpio_probe(struct platform_device *pdev) gpio->dev = >dev; gpio->nr_ports = pdata->nports; - gpio->rst = devm_reset_control_get_optional_shared(dev, NULL); - if (IS_ERR(gpio->rst)) - return PTR_ERR(gpio->rst); - - reset_control_deassert(gpio->rst); + err = dwapb_get_reset(gpio); + if (err) + return err; gpio->ports = devm_kcalloc(>dev, gpio->nr_ports, sizeof(*gpio->ports), GFP_KERNEL); @@ -700,7 +724,6 @@ static int dwapb_gpio_remove(struct platform_device *pdev) struct dwapb_gpio *gpio = platform_get_drvdata(pdev); dwapb_gpio_unregister(gpio); - reset_control_assert(gpio->rst); clk_bulk_disable_unprepare(DWAPB_NR_CLOCKS, gpio->clks); return 0; -- 2.27.0
[PATCH v2 00/10] gpio: dwapb: Refactor GPIO resources initialization
This series is about the DW APB GPIO device initialization procedure cleaning up. First of all it has been discovered that having a vendor-specific "snps,nr-gpios" property isn't only redundant but also might be dangerous (see the commit log for details). Instead we suggest to use the generic "ngpios" property to define a number of GPIOs each DW APB GPIO controller port supports. Secondly seeing a tendency of the other GPIO drivers getting converted to using the GPIO-lib-based IRQ-chip interface this series provides a patch, which replaces the DW APB GPIO driver Generic IRQ-chip implementation with the GPIO-lib IRQ-chip one. Finally the DW APB GPIO device probe procedure is simplified by converting the code to be using the device managed resources for the reference clocks initialization, reset control assertion/de-assertion and GPIO-chip registration. Some additional cleanups like replacing a number of GPIOs literal with a corresponding macro and grouping the IRQ handlers up in a single place of the driver are also introduced in this patchset. Link: https://lore.kernel.org/linux-gpio/20200723013858.10766-1-sergey.se...@baikalelectronics.ru/ Changelog v2: - Replace gc->to_irq() with irq_find_mapping() method. - Refactor dwapb_irq_set_type() method to directly set IRQ flow handler instead of using a temporary variable. - Initialize GPIOlib IRQ-chip settings before calling request_irq() method. - Add a notice regarding regression of commit 6a2f4b7dadd5 ("gpio: dwapb: use a second irq chip"). - Move the acpi_gpiochip_{request,free}_interrupts() methods invocation removal to a dedicated patch. - Move GPIO-chip to_irq callback removal to a dedicated patch. - Add a patch which replaces a max number of GPIO literals with a macro. - Introduce dwapb_convert_irqs() method to convert the sparse parental IRQs array into an array of linearly distributed IRQs correctly perceived by GPIO-lib. Signed-off-by: Serge Semin Cc: Andy Shevchenko Cc: Andy Shevchenko Cc: Alexey Malahov Cc: Pavel Parkhomenko Cc: Rob Herring Cc: linux-g...@vger.kernel.org Cc: devicet...@vger.kernel.org Cc: linux-kernel@vger.kernel.org Serge Semin (10): dt-bindings: gpio: dwapb: Add ngpios property support gpio: dwapb: Add ngpios DT-property support gpio: dwapb: Move MFD-specific IRQ handler gpio: dwapb: Add max GPIOs macro gpio: dwapb: Convert driver to using the GPIO-lib-based IRQ-chip gpio: dwapb: Discard GPIO-to-IRQ mapping function gpio: dwapb: Discard ACPI GPIO-chip IRQs request gpio: dwapb: Get reset control by means of resource managed interface gpio: dwapb: Get clocks by means of resource managed interface gpio: dwapb: Use resource managed GPIO-chip add data method .../bindings/gpio/snps,dw-apb-gpio.yaml | 6 + drivers/gpio/Kconfig | 2 +- drivers/gpio/gpio-dwapb.c | 340 +- include/linux/platform_data/gpio-dwapb.h | 4 +- 4 files changed, 178 insertions(+), 174 deletions(-) -- 2.27.0
[PATCH v2 02/10] gpio: dwapb: Add ngpios DT-property support
Indeed generic GPIO DT-schema implies that number of GPIOs should be described by the "ngpios" property located under a GPIO-provider DT node. In that case it's redundant to have a vendor-specific "snps,nr-gpios" property describing the same setting. Moreover it might be errors prone. Since commit 93d2e4322aa7 ("of: platform: Batch fwnode parsing when adding all top level devices") the fwnode parsing is resumed after the vast majority of the platform devices are added. Implicitly that commit activates re-parsing of the whole device tree GPIOs-phandle properties detected having "-gpio/-gpios" suffixes. Since the vendor-specific number of GPIOs property is defined with "-gpios" suffix, then of_link_property() will consider it as a suffix-property with "#gpio-cells" structure, which obviously it doesn't have. As a result for two DW APB GPIO controllers we'll have the next errors printed. OF: /bus@1f059000/gpio@1f044000/gpio-port@0: could not find phandle OF: /bus@1f059000/gpio@1f045000/gpio-port@0: could not get #gpio-cells for /opp-table OF: /bus@1f059000/gpio@1f044000/gpio-port@0: could not find phandle OF: /bus@1f059000/gpio@1f045000/gpio-port@0: could not get #gpio-cells for /opp-table See, the kernel fwnode parsing procedure even tried to resolve the phandle ID, which it thought was the opp-table DT-node, while in fact it was just a number "32". What would happen if that magic number actually referred to a GPIO DT-node with "#gpio-cells" property defined?.. In order to fix the problem let's mark the "snps,nr-gpios" property as deprecated and add the generic "ngpios" property support with the same purpose as the deprecated one. That and the errors log above shall motivate the platform developer to convert the DW APB GPIO DT-nodes to using the standard number of GPIOs property. Signed-off-by: Serge Semin Reviewed-by: Andy Shevchenko --- drivers/gpio/gpio-dwapb.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpio/gpio-dwapb.c b/drivers/gpio/gpio-dwapb.c index 1d8d55bd63aa..ccd80df16062 100644 --- a/drivers/gpio/gpio-dwapb.c +++ b/drivers/gpio/gpio-dwapb.c @@ -594,7 +594,8 @@ static struct dwapb_platform_data *dwapb_gpio_get_pdata(struct device *dev) return ERR_PTR(-EINVAL); } - if (fwnode_property_read_u32(fwnode, "snps,nr-gpios", >ngpio)) { + if (fwnode_property_read_u32(fwnode, "ngpios", >ngpio) && + fwnode_property_read_u32(fwnode, "snps,nr-gpios", >ngpio)) { dev_info(dev, "failed to get number of gpios for port%d\n", i); -- 2.27.0
Re: [PATCH 4/7] gpio: dwapb: Convert driver to using the GPIO-lib-based IRQ-chip
On Wed, Jul 29, 2020 at 06:10:24PM +0300, Andy Shevchenko wrote: > On Wed, Jul 29, 2020 at 3:58 PM Serge Semin > wrote: > > On Mon, Jul 27, 2020 at 12:22:28AM +0200, Linus Walleij wrote: > > ... > > > Sorry for a delay with a response to this issue. I had to give it a more > > thorough > > thought since the problem is a bit more complex than it seemed originally. > > As I > > see it now It might be wrong to implement the cases 2) and 3), but 1) is > > more > > appropriate. > > > > First of all we need to note that GPIOlib framework provides the next > > parameters > > to describe the IRQ-chip: > > gc->irq.num_parents - number of parental IRQ numbers. > > gc->irq.parents[] - array of parental IRQ numbers. > > *gc->irq.valid_mask - a mask of IRQ/GPIO lines describing a valid IRQ. > > *gc->irq.map - mapping of hw IRQ/GPIO ID -> parental IRQ numbers. > > > > Using that set we can handle any case of linear and sparse parental IRQs. > > Here > > is how it can be implemented in the framework of DW APB GPIO controller. > > > > DW APB GPIO can be synthesized with two configs: > > 1) Combined IRQ line (GPIO_INTR_IO == True), > > 2) Multiple interrupt signals for each GPIO (GPIO_INTR_IO == False). > > > > Obviously the former case is trivial: > > > > IRQ_combined > > __^ > >/_ _ _ _ _ ___ _\ > >|_|_|_|_|_|...|_| - GPIOs > > > > In that case > > gc->irq.num_parents = 1; > > gc->irq.parents[0] = IRQ_combined; > > *gc->irq.valid_mask = GENMASK(ngpio - 1, 0); // This is done by the GPIOlib > > core itself. > > > > The later one (when multiple interrupt signals are involved) can be a bit > > more > > complicated. It can be also split up into two cases: > > 2a) One-on-one GPIO-IRQ mapping. > > 2b) Sparse GPIO-IRQ mapping. > > > > It's straightforward to implement 2a): > > > >i1i2i3i4i5 ... iN > > _ _ _ _ _ ___ _ > >|_|_|_|_|_|...|_| - GPIOs > > > > In that case > > gc->irq.num_parents = ngpio; > > gc->irq.parents[] = {i1, i2, i3, i4, i5, ... iN}; > > gc->irq.map = {i1, i2, i3, i4, i5, ... iN}; > > *gc->irq.valid_mask = GENMASK(ngpio - 1, 0); > > > > This case puzzles me. Why is it not NULL and 0 and actually you handle > everything as a nested case? The code provided above is a sketch. Don't consider it literally. In reality of course valid_mask will be not NULL, memory for which will be allocated by the GPIOlib itself. The mask will be initialized by means of the gc->irq.init_valid_mask() callback. > > > The complication starts when we get to implementing 2b): > > > >i1 xi3i4 x ... iN > > _ _ _ _ _ ___ _ > >|_|_|_|_|_|...|_| - GPIOs > > So does this. > > Valid mask will define exactly GPIOs that are IRQs. So, we will handle > only nested IRQs which are valid. Right. > > > In order to cover this case we need to answer on two question. > > Firstly how to get such platform config? I am not sure about ACPI, but > > aside from straightforward platform_data-based setup such configuration > > can be reached by setting up the "interrupts-extended" DT-property with > > zeroed phandle. > > > > Ok, since it's possible to meet such platform config, we need to think > > how to handle it and here is the second question. How to describe such > > case in the framework of GPIOlib-IRQchip? > > > > So from my side it was wrong to set the sparse IRQs array to > > gc->irq.parents. Instead I should have scanned the sparse IRQs array, > > calculated the number of non-empty parental IRQs, created an array of linear > > (non-sparse) IRQs, initialized *gc->irq.valid_mask in accordance with the > > sparse parental IRQs array. In other words it was wrong to assume, that > > each gc->irq.parents entry corresponds to the IRQ/GPIO line. The > > gc->irq.parents > > array just describes the parental IRQs and nothing else. > > > > Shortly speaking here is how the GPIOlib IRQchip parameters should be > > initialized in this case: > > gc->irq.num_parents - number of valid parental IRQs. > > gc->irq.parents - non-sparse, linear array of valid IRQs. > > *gc->irq.valid_mask - mask initialized by means of the > > gc->irq.init_valid_mask() > > callback, which indicates valid IRQ/GPIO IDs. > > *gc->irq.map - sparse array of parental IRQ numbers (which I mistakenly > > tried to > > pass through the gc->irq.parents pointer). > > >
Re: [PATCH 4/7] gpio: dwapb: Convert driver to using the GPIO-lib-based IRQ-chip
On Mon, Jul 27, 2020 at 12:22:28AM +0200, Linus Walleij wrote: > On Sat, Jul 25, 2020 at 1:03 AM Serge Semin > wrote: > > > According to the DW APB GPIO databook it can be configured to provide > > either a > > combined IRQ line or multiple interrupt signals for each GPIO. It's up to > > the platform which of those signals are connected to an embedded IRQ > > controller. So I guess theoretically the array values can be sparse. > > > > Anyway now I see it's rather problematic. I didn't forget about the sparse > > IRQs > > array case. I just thought it would work out-of-box. Before getting your > > comment > > and digging deeper into the IRQ subsystem I had thought that it wasn't a > > problem > > passing invalid IRQ numbers to the irq_set_chained_handler_and_data() > > especially > > seeing zero IRQ number was supposed to be considered as invalid. That > > method shall > > just ignore the invalid IRQs since the method irq_to_desc() calling > > radix_tree_lookup() > > will fail to find a descriptor with invalid IRQ value and return NULL. So > > after > > getting a NULL irq_desc the method irq_set_chained_handler_and_data() would > > have stopped setting the handler. But turns out it may work only for > > CONFIG_SPARSE_IRQ. If that config isn't enabled, then a very first IRQ > > descriptor will be returned for zero IRQ number. That descriptor will be > > initialized with the passed parent_handler callback, which isn't what we > > want. > > Ouch but different beahviour on the outside of the irqchip API depending > on whether IRQs are sparse or not on some particular system seems to > be a problem with irqchip reallty, if we wanna get to the bottom of things. > (paging Marc) > > > So in order to fix the problem we could follow either of the next paths: > > 1) Just make sure the passed IRQs array is not sparse for instance by > > remapping > >it to be linear. > > 2) Move "if (gc->irq.parents[i]) irq_set_chained_handler_and_data()" > > statement to the > >gpiochip_add_irqchip() method. > > > > What to you think? Linus? > > What about (3) fixing irqchip? > > Else (2), making the code inside gpiolib be careful and skip over > invalid IRQs. Sorry for a delay with a response to this issue. I had to give it a more thorough thought since the problem is a bit more complex than it seemed originally. As I see it now It might be wrong to implement the cases 2) and 3), but 1) is more appropriate. First of all we need to note that GPIOlib framework provides the next parameters to describe the IRQ-chip: gc->irq.num_parents - number of parental IRQ numbers. gc->irq.parents[] - array of parental IRQ numbers. *gc->irq.valid_mask - a mask of IRQ/GPIO lines describing a valid IRQ. *gc->irq.map - mapping of hw IRQ/GPIO ID -> parental IRQ numbers. Using that set we can handle any case of linear and sparse parental IRQs. Here is how it can be implemented in the framework of DW APB GPIO controller. DW APB GPIO can be synthesized with two configs: 1) Combined IRQ line (GPIO_INTR_IO == True), 2) Multiple interrupt signals for each GPIO (GPIO_INTR_IO == False). Obviously the former case is trivial: IRQ_combined __^ /_ _ _ _ _ ___ _\ |_|_|_|_|_|...|_| - GPIOs In that case gc->irq.num_parents = 1; gc->irq.parents[0] = IRQ_combined; *gc->irq.valid_mask = GENMASK(ngpio - 1, 0); // This is done by the GPIOlib core itself. The later one (when multiple interrupt signals are involved) can be a bit more complicated. It can be also split up into two cases: 2a) One-on-one GPIO-IRQ mapping. 2b) Sparse GPIO-IRQ mapping. It's straightforward to implement 2a): i1i2i3i4i5 ... iN _ _ _ _ _ ___ _ |_|_|_|_|_|...|_| - GPIOs In that case gc->irq.num_parents = ngpio; gc->irq.parents[] = {i1, i2, i3, i4, i5, ... iN}; gc->irq.map = {i1, i2, i3, i4, i5, ... iN}; *gc->irq.valid_mask = GENMASK(ngpio - 1, 0); The complication starts when we get to implementing 2b): i1 xi3i4 x ... iN _ _ _ _ _ ___ _ |_|_|_|_|_|...|_| - GPIOs In order to cover this case we need to answer on two question. Firstly how to get such platform config? I am not sure about ACPI, but aside from straightforward platform_data-based setup such configuration can be reached by setting up the "interrupts-extended" DT-property with zeroed phandle. Ok, since it's possible to meet such platform config, we need to think how to handle it and here is the second question. How to describe such case in the framework of GPIOlib-IRQchip? So from my side it was wrong to set the sparse IRQs array to gc->irq.parents. Instead I should have scanned the sparse IRQs array, calculated the number of non-empty parental I
Re: [PATCH 4/7] gpio: dwapb: Convert driver to using the GPIO-lib-based IRQ-chip
On Sat, Jul 25, 2020 at 03:12:49PM +0300, Andy Shevchenko wrote: > On Sat, Jul 25, 2020 at 2:03 AM Serge Semin > wrote: > > On Thu, Jul 23, 2020 at 01:03:17PM +0300, Andy Shevchenko wrote: > > > On Thu, Jul 23, 2020 at 04:38:55AM +0300, Serge Semin wrote: ... > > > > + /* This will let us handle the parent IRQ in the driver */ > > > > + girq->parents = NULL; > > > > + girq->num_parents = 0; > > > > + girq->parent_handler = NULL; > > > > Shan't we do this before request_irq() call (at least for consistency > > > with the > > > rest of the drivers)? > > > > Technically we shan't. Please elaborate which drivers you are referring to? > > All of them? Recent patches for IRQ chip template do something like > > girq = &...; > girq->foo = bar; > ... > ret = request_irq(...); > > ...and here no more girq->baz = gaz; lines. > > > Even the recent Linus' series "Use irqchip template" mostly does it in the > > same order. > > Funny, that's what I;m referring to. It turns out my "mostly" was wrong in this matter. It's 4 out of 17 patches, which make the initialization in the same order as mine: drivers/gpio/gpio-max732x.c drivers/gpio/gpio-pca953x.c drivers/gpio/gpio-pcf857x.c drivers/gpio/gpio-adp5588.c while the rest of them does it in the order suggested by you: drivers/gpio/gpio-pci-idio-16.c drivers/gpio/gpio-pcie-idio-24.c drivers/gpio/gpio-104-idio-16.c drivers/gpio/gpio-104-dio-48e.c drivers/gpio/gpio-ws16c48.c drivers/gpio/gpio-rcar.c drivers/gpio/gpio-wcove.c drivers/pinctrl/pinctrl-amd.c drivers/gpio/gpio-crystalcove.c drivers/pinctrl/pinctrl-mcp23s08.c drivers/pinctrl/pinctrl-sx150x.c drivers/pinctrl/pinctrl-stmfx.c drivers/gpio/gpio-tc3589x.c Then, let's use the same order here as the most of the drivers do just for consistency. -Sergey > > -- > With Best Regards, > Andy Shevchenko
Re: [PATCH v8 00/10] dmaengine: dw: Take Baikal-T1 SoC DW DMAC peculiarities into account
On Mon, Jul 27, 2020 at 02:31:14PM +0530, Vinod Koul wrote: > On 23-07-20, 03:58, Serge Semin wrote: > > In the previous patchset I've written the next message: > > > > > Folks, note I've removed the next patches from the series: > > > [PATCH v7 04/11] dmaengine: Introduce max SG list entries capability > > > [PATCH v7 11/11] dmaengine: dw: Initialize max_sg_nents capability > > > It turns out the problem with the asynchronous handling of Tx- and Rx- > > > SPI transfers interrupts is more complex than I expected. So in order to > > > solve the problem it isn't enough to split the SG list entries submission > > > up based on the max_sg_nents capability setting (though the synchronous > > > one-by-one SG list entries handling does fix a part of the problem). So > > > if and when I get to find a comprehensive solution for it I'll submit a > > > new series with fixups. Until then please consider to merge the patchset > > > in without those patches. > > > > Those patches are returned back to the series. I've found a solution, which > > fixes the problem for our hardware. A new patchset with several fixes for > > the > > DW DMAC driver will be sent shortly after this one is merged in. Note the > > same > > concerns the DW APB SPI driver. So please review and merge in as soon as > > possible. > > > > Regarding the patchset. Baikal-T1 SoC has an DW DMAC on-board to provide a > > Mem-to-Mem, low-speed peripherals Dev-to-Mem and Mem-to-Dev functionality. > > Mostly it's compatible with currently implemented in the kernel DW DMAC > > driver, but there are some peculiarities which must be taken into account > > in order to have the device fully supported. > > > > First of all traditionally we replaced the legacy plain text-based > > dt-binding > > file with yaml-based one. Secondly Baikal-T1 DW DMA Controller provides > > eight > > channels, which alas have different max burst length configuration. > > In particular first two channels may burst up to 128 bits (16 bytes) at a > > time > > while the rest of them just up to 32 bits. We must make sure that the DMA > > subsystem doesn't set values exceeding these limitations otherwise the > > controller will hang up. In third currently we discovered the problem in > > using > > the DW APB SPI driver together with DW DMAC. The problem happens if there > > is no > > natively implemented multi-block LLP transfers support and the SPI-transfer > > length exceeds the max lock size. In this case due to asynchronous handling > > of > > Tx- and Rx- SPI transfers interrupt we might end up with DW APB SSI Rx FIFO > > overflow. So if DW APB SSI (or any other DMAC service consumer) intends to > > use > > the DMAC to asynchronously execute the transfers we'd have to at least warn > > the user of the possible errors. In forth it's worth to set the DMA device > > max > > segment size with max block size config specific to the DW DMA controller. > > It > > shall help the DMA clients to create size-optimized SG-list items for the > > controller. This in turn will cause less dw_desc allocations, less LLP > > reinitializations, better DMA device performance. > > > > Finally there is a bug in the algorithm of the nollp flag detection. > > In particular even if DW DMAC parameters state the multi-block transfers > > support there is still HC_LLP (hardcode LLP) flag, which if set makes > > expected > > by the driver true multi-block LLP functionality unusable. This happens > > cause' > > if HC_LLP flag is set the LLP registers will be hardcoded to zero so the > > contiguous multi-block transfers will be only supported. We must take the > > flag into account when detecting the LLP support otherwise the driver just > > won't work correctly. > > Applied all, thanks Great! Thank you very much. Now I can send out another set of patches for review. -Sergey > -- > ~Vinod
Re: [PATCH 4/7] gpio: dwapb: Convert driver to using the GPIO-lib-based IRQ-chip
On Thu, Jul 23, 2020 at 05:08:15PM +0300, Andy Shevchenko wrote: > On Thu, Jul 23, 2020 at 04:38:55AM +0300, Serge Semin wrote: > > GPIO-lib provides a ready-to-use interface to initialize an IRQ-chip on > > top of a GPIO chip. It's better from maintainability and readability > > point of view to use one instead of supporting a hand-written Generic > > IRQ-chip-based implementation. Moreover the new implementation won't > > cause much functional overhead but will provide a cleaner driver code. > > All of that makes the DW APB GPIO driver conversion pretty much justified > > especially seeing a tendency of the other GPIO drivers getting converted > > too. > > > > Here is what we do in the framework of this commit to convert the driver > > to using the GPIO-lib-based IRQ-chip interface: > > 1) IRQ ack, mask and unmask callbacks are locally defined instead of > > using the Generic IRQ-chip ones. > > 2) An irq_chip structure instance is embedded into the dwapb_gpio > > private data. Note we can't have a static instance of that structure since > > GPIO-lib will add some hooks into it by calling gpiochip_set_irq_hooks(). > > A warning about that would have been printed by the GPIO-lib code if we > > used a single irq_chip structure instance for multiple DW APB GPIO > > controllers. > > 3) Initialize the gpio_irq_chip structure embedded into the gpio_chip > > descriptor. By default there is no IRQ enabled so any event raised will be > > handled by the handle_bad_irq() IRQ flow handler. If DW APB GPIO IP-core > > is synthesized to have non-shared reference IRQ-lines, then as before the > > hierarchical and cascaded cases are distinguished by checking how many > > parental IRQs are defined. (Note irq_set_chained_handler_and_data() won't > > initialize IRQs, which descriptors couldn't be found.) If DW APB GPIO IP > > is used on a platform with shared IRQ line, then we simply won't let the > > GPIO-lib to initialize the parental IRQs, but will handle them locally in > > the driver. > > 4) Discard linear IRQ-domain and Generic IRQ-chip initialization, since > > GPIO-lib IRQ-chip interface will create a new domain and accept a standard > > IRQ-chip structure pointer based on the setting we provided in the > > gpio_irq_chip structure. > > 5) Manually select a proper IRQ flow handler directly in the > > irq_set_type() callback by calling irq_set_handler_locked() method, since > > an ordinary (not Generic) irq_chip descriptor is now utilized. > > 6) Discard the custom GPIO-to-IRQ mapping function since GPIO-lib defines > > the standard method gpiochip_to_irq(), which will be used anyway no matter > > whether the custom to_irq callback is specified or not. > > 7) Discard the acpi_gpiochip_{request,free}_interrupts() > > invocations, since they will be called from > > gpiochip_add_irqchip()/gpiochip_irqchip_remove() anyway. > > 8) Alter CONFIG_GPIO_DWAPB kernel config to select > > CONFIG_GPIOLIB_IRQCHIP instead of CONFIG_GENERIC_IRQ_CHIP. > > > ... > > One more thing... > > > static u32 dwapb_do_irq(struct dwapb_gpio *gpio) > > { > > + struct gpio_chip *gc = >ports[0].gc; > > unsigned long irq_status; > > irq_hw_number_t hwirq; > > > > irq_status = dwapb_read(gpio, GPIO_INTSTATUS); > > for_each_set_bit(hwirq, _status, 32) { > > - int gpio_irq = irq_find_mapping(gpio->domain, hwirq); > > + int gpio_irq = gc->to_irq(gc, hwirq); > > Very, very few do this. > Can we stick with the original one? > (See plenty of other examples in the GPIO / pin control subsystems. You are right. After more thorough studying the IRQ-domain code I've found out that irq_domain_add_simple() provides the on-the-fly mapping creation. We don't have to call irq_create_mapping() first before using irq_find_mapping(). So the irq_find_mapping(gc->irq.domain, hwirq) method can be freely called here the same way as the most of the GPIO drivers do. Thanks for noticing this. -Sergey > > > u32 irq_type = irq_get_trigger_type(gpio_irq); > > > > generic_handle_irq(gpio_irq); > > > > } > > -- > With Best Regards, > Andy Shevchenko > >