Re: [Letux-kernel] [Bug]: mtd: onenand: omap2plus: kernel panic with OneNAND on OMAP3 (DM3730) device GTA04A5

2018-04-12 Thread Boris Brezillon
On Wed, 11 Apr 2018 11:12:24 +0200
Ladislav Michl  wrote:

> On Wed, Apr 11, 2018 at 10:52:01AM +0200, Boris Brezillon wrote:
> > On Wed, 11 Apr 2018 10:27:46 +0200
> > Ladislav Michl  wrote:
> >   
> > > On Wed, Apr 11, 2018 at 10:08:06AM +0200, Boris Brezillon wrote:  
> > > > On Wed, 11 Apr 2018 09:36:56 +0200
> > > > Ladislav Michl  wrote:
> > > > 
> > > > > Hi Boris,
> > > > > 
> > > > > On Wed, Apr 11, 2018 at 09:15:28AM +0200, Boris Brezillon wrote:
> > > [...]  
> > > > > > Not sure this approach is safe on all archs: if the cache is VIVT or
> > > > > > VIPT, you may have several entries pointing to the same phys page, 
> > > > > > and
> > > > > > then, when dma_map_page() does its cache maintenance operations, 
> > > > > > it's
> > > > > > only taking one of these entries into account.  
> > > > > 
> > > > > Hmm, I used the same approach Samsung OneNAND driver does since commit
> > > > > dcf08227e964a53a2cb39130b74842c7dcb6adde.
> > > > > Both TI OMAP3630 and Samsung S5PC110 are using Cortex-A8 which
> > > > > is VIPT. In that case samsung's driver code has the same problem.
> > > > > 
> > > > > > In other parts of the MTD subsystem, we tend to not do DMA on 
> > > > > > buffers
> > > > > > that have been vmalloc-ed.
> > > > > > 
> > > > > > You can do something like
> > > > > > 
> > > > > > if (virt_addr_valid(buf))
> > > > > > /* Use DMA */
> > > > > > else
> > > > > > /*
> > > > > >  * Do not use DMA, or use a bounce buffer
> > > > > >  * allocated with kmalloc
> > > > > >  */  
> > > > > 
> > > > > Okay, I'll use this approach then, but first I'd like to be sure 
> > > > > above is
> > > > > correct. Anyone?
> > > > 
> > > > See this discussion [1]. The problem came up a few times already, so
> > > > might find other threads describing why it's not safe.
> > > > 
> > > > [1]https://lists.linuxfoundation.org/pipermail/iommu/2016-March/016240.html
> > > > 
> > > 
> > > Question was more likely whenever there might exist more that one mapping
> > > of the same page.  
> > 
> > I'm clearly not an expert, so I might be wrong, but I think a physical
> > page can be both in the identity mapping and mapped in the vmalloc
> > space. Now, is there a real risk that both ends are accessed in
> > parallel thus making different cache entries pointing to the same phys
> > page dirty, I'm not sure. Or maybe there are other corner case, but
> > you'll have to ask Russell (or any other ARM expert) to get a clearer
> > answer.  
> 
> Thank you anyway. As DMA was disabled completely for all DT enabled boards
> until 4.16 let's play safe and disable it for HIGHMEM case as you suggested.
> Later, we might eventually use the same {read,write}_bufferram for both
> OMAP and S5PC110.
> 
> > > But okay, I'll disable DMA for highmem. Patch will follow.
> > > 
> > >   ladis  
> 
> Andreas, Nikolaus, could you please test patch bellow? It is against
> linux.git and should apply also against 4.16 once you modify path.
> 

Looks like it fixes the problem. Could you please rebase on top of
v4.17-rc1 when it's out and send a new version? Note that onenand code
has moved to drivers/mtd/nand/onenand, so we'll have to manually
backport the fix to 4.16.

Regards,

Boris

> Thank you,
>   ladis
> 
> --- >8 ---  
> 
> From: Ladislav Michl 
> Subject: [PATCH] mtd: onenand: omap2: Disable DMA for HIGHMEM buffers
> 
> dma_map_single doesn't get the proper DMA address for vmalloced area,
> so disable DMA in this case.
> 
> Signed-off-by: Ladislav Michl 
> Reported-by: "H. Nikolaus Schaller" 
> ---
>  drivers/mtd/nand/onenand/omap2.c | 105 +++
>  1 file changed, 38 insertions(+), 67 deletions(-)
> 
> diff --git a/drivers/mtd/nand/onenand/omap2.c 
> b/drivers/mtd/nand/onenand/omap2.c
> index 9c159f0dd9a6..321137158ff3 100644
> --- a/drivers/mtd/nand/onenand/omap2.c
> +++ b/drivers/mtd/nand/onenand/omap2.c
> @@ -375,56 +375,42 @@ static int omap2_onenand_read_bufferram(struct mtd_info 
> *mtd, int area,
>  {
>   struct omap2_onenand *c = container_of(mtd, struct omap2_onenand, mtd);
>   struct onenand_chip *this = mtd->priv;
> - dma_addr_t dma_src, dma_dst;
> - int bram_offset;
> + struct device *dev = >pdev->dev;
>   void *buf = (void *)buffer;
> + dma_addr_t dma_src, dma_dst;
> + int bram_offset, err;
>   size_t xtra;
> - int ret;
>  
>   bram_offset = omap2_onenand_bufferram_offset(mtd, area) + area + offset;
> - if (bram_offset & 3 || (size_t)buf & 3 || count < 384)
> - goto out_copy;
> -
> - /* panic_write() may be in an interrupt context */
> - if (in_interrupt() || oops_in_progress)
> + /*
> +  * If the buffer address is not DMA-able, len is not 

Re: [Letux-kernel] [Bug]: mtd: onenand: omap2plus: kernel panic with OneNAND on OMAP3 (DM3730) device GTA04A5

2018-04-11 Thread H. Nikolaus Schaller

> Am 11.04.2018 um 11:12 schrieb Ladislav Michl :
> 
> On Wed, Apr 11, 2018 at 10:52:01AM +0200, Boris Brezillon wrote:
>> On Wed, 11 Apr 2018 10:27:46 +0200
>> Ladislav Michl  wrote:
>> 
>>> On Wed, Apr 11, 2018 at 10:08:06AM +0200, Boris Brezillon wrote:
 On Wed, 11 Apr 2018 09:36:56 +0200
 Ladislav Michl  wrote:
 
> Hi Boris,
> 
> On Wed, Apr 11, 2018 at 09:15:28AM +0200, Boris Brezillon wrote:  
>>> [...]
>> Not sure this approach is safe on all archs: if the cache is VIVT or
>> VIPT, you may have several entries pointing to the same phys page, and
>> then, when dma_map_page() does its cache maintenance operations, it's
>> only taking one of these entries into account.
> 
> Hmm, I used the same approach Samsung OneNAND driver does since commit
> dcf08227e964a53a2cb39130b74842c7dcb6adde.
> Both TI OMAP3630 and Samsung S5PC110 are using Cortex-A8 which
> is VIPT. In that case samsung's driver code has the same problem.
> 
>> In other parts of the MTD subsystem, we tend to not do DMA on buffers
>> that have been vmalloc-ed.
>> 
>> You can do something like
>> 
>>  if (virt_addr_valid(buf))
>>  /* Use DMA */
>>  else
>>  /*
>>   * Do not use DMA, or use a bounce buffer
>>   * allocated with kmalloc
>>   */
> 
> Okay, I'll use this approach then, but first I'd like to be sure above is
> correct. Anyone?  
 
 See this discussion [1]. The problem came up a few times already, so
 might find other threads describing why it's not safe.
 
 [1]https://lists.linuxfoundation.org/pipermail/iommu/2016-March/016240.html
   
>>> 
>>> Question was more likely whenever there might exist more that one mapping
>>> of the same page.
>> 
>> I'm clearly not an expert, so I might be wrong, but I think a physical
>> page can be both in the identity mapping and mapped in the vmalloc
>> space. Now, is there a real risk that both ends are accessed in
>> parallel thus making different cache entries pointing to the same phys
>> page dirty, I'm not sure. Or maybe there are other corner case, but
>> you'll have to ask Russell (or any other ARM expert) to get a clearer
>> answer.
> 
> Thank you anyway. As DMA was disabled completely for all DT enabled boards
> until 4.16 let's play safe and disable it for HIGHMEM case as you suggested.
> Later, we might eventually use the same {read,write}_bufferram for both
> OMAP and S5PC110.
> 
>>> But okay, I'll disable DMA for highmem. Patch will follow.
>>> 
>>> ladis
> 
> Andreas, Nikolaus, could you please test patch bellow? It is against
> linux.git and should apply also against 4.16 once you modify path.

Works with v4.16!

I have done "tar cf /dev/null /mnt/nand" and have only
got some fixable bit flips. Writing a test file to OneNAND
also worked and doing a sync did move the weak block to a new PEB.

So I think read & write works again.

> 
> Thank you,
>   ladis

BR and thanks,
Nikolaus

> 
> --- >8 ---
> 
> From: Ladislav Michl 
> Subject: [PATCH] mtd: onenand: omap2: Disable DMA for HIGHMEM buffers
> 
> dma_map_single doesn't get the proper DMA address for vmalloced area,
> so disable DMA in this case.
> 
> Signed-off-by: Ladislav Michl 
> Reported-by: "H. Nikolaus Schaller" 
> ---
> drivers/mtd/nand/onenand/omap2.c | 105 +++
> 1 file changed, 38 insertions(+), 67 deletions(-)
> 
> diff --git a/drivers/mtd/nand/onenand/omap2.c 
> b/drivers/mtd/nand/onenand/omap2.c
> index 9c159f0dd9a6..321137158ff3 100644
> --- a/drivers/mtd/nand/onenand/omap2.c
> +++ b/drivers/mtd/nand/onenand/omap2.c
> @@ -375,56 +375,42 @@ static int omap2_onenand_read_bufferram(struct mtd_info 
> *mtd, int area,
> {
>   struct omap2_onenand *c = container_of(mtd, struct omap2_onenand, mtd);
>   struct onenand_chip *this = mtd->priv;
> - dma_addr_t dma_src, dma_dst;
> - int bram_offset;
> + struct device *dev = >pdev->dev;
>   void *buf = (void *)buffer;
> + dma_addr_t dma_src, dma_dst;
> + int bram_offset, err;
>   size_t xtra;
> - int ret;
> 
>   bram_offset = omap2_onenand_bufferram_offset(mtd, area) + area + offset;
> - if (bram_offset & 3 || (size_t)buf & 3 || count < 384)
> - goto out_copy;
> -
> - /* panic_write() may be in an interrupt context */
> - if (in_interrupt() || oops_in_progress)
> + /*
> +  * If the buffer address is not DMA-able, len is not long enough to make
> +  * DMA transfers profitable or panic_write() may be in an interrupt
> +  * context fallback to PIO mode.
> +  */
> + if (!virt_addr_valid(buf) || bram_offset & 3 || (size_t)buf & 3 ||
> + count < 384 || in_interrupt() || 

Re: [Letux-kernel] [Bug]: mtd: onenand: omap2plus: kernel panic with OneNAND on OMAP3 (DM3730) device GTA04A5

2018-04-11 Thread Ladislav Michl
On Wed, Apr 11, 2018 at 10:52:01AM +0200, Boris Brezillon wrote:
> On Wed, 11 Apr 2018 10:27:46 +0200
> Ladislav Michl  wrote:
> 
> > On Wed, Apr 11, 2018 at 10:08:06AM +0200, Boris Brezillon wrote:
> > > On Wed, 11 Apr 2018 09:36:56 +0200
> > > Ladislav Michl  wrote:
> > >   
> > > > Hi Boris,
> > > > 
> > > > On Wed, Apr 11, 2018 at 09:15:28AM +0200, Boris Brezillon wrote:  
> > [...]
> > > > > Not sure this approach is safe on all archs: if the cache is VIVT or
> > > > > VIPT, you may have several entries pointing to the same phys page, and
> > > > > then, when dma_map_page() does its cache maintenance operations, it's
> > > > > only taking one of these entries into account.
> > > > 
> > > > Hmm, I used the same approach Samsung OneNAND driver does since commit
> > > > dcf08227e964a53a2cb39130b74842c7dcb6adde.
> > > > Both TI OMAP3630 and Samsung S5PC110 are using Cortex-A8 which
> > > > is VIPT. In that case samsung's driver code has the same problem.
> > > >   
> > > > > In other parts of the MTD subsystem, we tend to not do DMA on buffers
> > > > > that have been vmalloc-ed.
> > > > > 
> > > > > You can do something like
> > > > > 
> > > > >   if (virt_addr_valid(buf))
> > > > >   /* Use DMA */
> > > > >   else
> > > > >   /*
> > > > >* Do not use DMA, or use a bounce buffer
> > > > >* allocated with kmalloc
> > > > >*/
> > > > 
> > > > Okay, I'll use this approach then, but first I'd like to be sure above 
> > > > is
> > > > correct. Anyone?  
> > > 
> > > See this discussion [1]. The problem came up a few times already, so
> > > might find other threads describing why it's not safe.
> > > 
> > > [1]https://lists.linuxfoundation.org/pipermail/iommu/2016-March/016240.html
> > >   
> > 
> > Question was more likely whenever there might exist more that one mapping
> > of the same page.
> 
> I'm clearly not an expert, so I might be wrong, but I think a physical
> page can be both in the identity mapping and mapped in the vmalloc
> space. Now, is there a real risk that both ends are accessed in
> parallel thus making different cache entries pointing to the same phys
> page dirty, I'm not sure. Or maybe there are other corner case, but
> you'll have to ask Russell (or any other ARM expert) to get a clearer
> answer.

Thank you anyway. As DMA was disabled completely for all DT enabled boards
until 4.16 let's play safe and disable it for HIGHMEM case as you suggested.
Later, we might eventually use the same {read,write}_bufferram for both
OMAP and S5PC110.

> > But okay, I'll disable DMA for highmem. Patch will follow.
> > 
> > ladis

Andreas, Nikolaus, could you please test patch bellow? It is against
linux.git and should apply also against 4.16 once you modify path.

Thank you,
ladis

--- >8 ---

From: Ladislav Michl 
Subject: [PATCH] mtd: onenand: omap2: Disable DMA for HIGHMEM buffers

dma_map_single doesn't get the proper DMA address for vmalloced area,
so disable DMA in this case.

Signed-off-by: Ladislav Michl 
Reported-by: "H. Nikolaus Schaller" 
---
 drivers/mtd/nand/onenand/omap2.c | 105 +++
 1 file changed, 38 insertions(+), 67 deletions(-)

diff --git a/drivers/mtd/nand/onenand/omap2.c b/drivers/mtd/nand/onenand/omap2.c
index 9c159f0dd9a6..321137158ff3 100644
--- a/drivers/mtd/nand/onenand/omap2.c
+++ b/drivers/mtd/nand/onenand/omap2.c
@@ -375,56 +375,42 @@ static int omap2_onenand_read_bufferram(struct mtd_info 
*mtd, int area,
 {
struct omap2_onenand *c = container_of(mtd, struct omap2_onenand, mtd);
struct onenand_chip *this = mtd->priv;
-   dma_addr_t dma_src, dma_dst;
-   int bram_offset;
+   struct device *dev = >pdev->dev;
void *buf = (void *)buffer;
+   dma_addr_t dma_src, dma_dst;
+   int bram_offset, err;
size_t xtra;
-   int ret;
 
bram_offset = omap2_onenand_bufferram_offset(mtd, area) + area + offset;
-   if (bram_offset & 3 || (size_t)buf & 3 || count < 384)
-   goto out_copy;
-
-   /* panic_write() may be in an interrupt context */
-   if (in_interrupt() || oops_in_progress)
+   /*
+* If the buffer address is not DMA-able, len is not long enough to make
+* DMA transfers profitable or panic_write() may be in an interrupt
+* context fallback to PIO mode.
+*/
+   if (!virt_addr_valid(buf) || bram_offset & 3 || (size_t)buf & 3 ||
+   count < 384 || in_interrupt() || oops_in_progress )
goto out_copy;
 
-   if (buf >= high_memory) {
-   struct page *p1;
-
-   if (((size_t)buf & PAGE_MASK) !=
-   ((size_t)(buf + count - 1) & PAGE_MASK))
-   goto out_copy;
-   p1 = 

Re: [Letux-kernel] [Bug]: mtd: onenand: omap2plus: kernel panic with OneNAND on OMAP3 (DM3730) device GTA04A5

2018-04-11 Thread Boris Brezillon
On Wed, 11 Apr 2018 10:27:46 +0200
Ladislav Michl  wrote:

> On Wed, Apr 11, 2018 at 10:08:06AM +0200, Boris Brezillon wrote:
> > On Wed, 11 Apr 2018 09:36:56 +0200
> > Ladislav Michl  wrote:
> >   
> > > Hi Boris,
> > > 
> > > On Wed, Apr 11, 2018 at 09:15:28AM +0200, Boris Brezillon wrote:  
> [...]
> > > > Not sure this approach is safe on all archs: if the cache is VIVT or
> > > > VIPT, you may have several entries pointing to the same phys page, and
> > > > then, when dma_map_page() does its cache maintenance operations, it's
> > > > only taking one of these entries into account.
> > > 
> > > Hmm, I used the same approach Samsung OneNAND driver does since commit
> > > dcf08227e964a53a2cb39130b74842c7dcb6adde.
> > > Both TI OMAP3630 and Samsung S5PC110 are using Cortex-A8 which
> > > is VIPT. In that case samsung's driver code has the same problem.
> > >   
> > > > In other parts of the MTD subsystem, we tend to not do DMA on buffers
> > > > that have been vmalloc-ed.
> > > > 
> > > > You can do something like
> > > > 
> > > > if (virt_addr_valid(buf))
> > > > /* Use DMA */
> > > > else
> > > > /*
> > > >  * Do not use DMA, or use a bounce buffer
> > > >  * allocated with kmalloc
> > > >  */
> > > 
> > > Okay, I'll use this approach then, but first I'd like to be sure above is
> > > correct. Anyone?  
> > 
> > See this discussion [1]. The problem came up a few times already, so
> > might find other threads describing why it's not safe.
> > 
> > [1]https://lists.linuxfoundation.org/pipermail/iommu/2016-March/016240.html 
> >  
> 
> Question was more likely whenever there might exist more that one mapping
> of the same page.

I'm clearly not an expert, so I might be wrong, but I think a physical
page can be both in the identity mapping and mapped in the vmalloc
space. Now, is there a real risk that both ends are accessed in
parallel thus making different cache entries pointing to the same phys
page dirty, I'm not sure. Or maybe there are other corner case, but
you'll have to ask Russell (or any other ARM expert) to get a clearer
answer.

> But okay, I'll disable DMA for highmem. Patch will follow.
> 
>   ladis



Re: [Letux-kernel] [Bug]: mtd: onenand: omap2plus: kernel panic with OneNAND on OMAP3 (DM3730) device GTA04A5

2018-04-11 Thread Ladislav Michl
On Wed, Apr 11, 2018 at 10:08:06AM +0200, Boris Brezillon wrote:
> On Wed, 11 Apr 2018 09:36:56 +0200
> Ladislav Michl  wrote:
> 
> > Hi Boris,
> > 
> > On Wed, Apr 11, 2018 at 09:15:28AM +0200, Boris Brezillon wrote:
[...]
> > > Not sure this approach is safe on all archs: if the cache is VIVT or
> > > VIPT, you may have several entries pointing to the same phys page, and
> > > then, when dma_map_page() does its cache maintenance operations, it's
> > > only taking one of these entries into account.  
> > 
> > Hmm, I used the same approach Samsung OneNAND driver does since commit
> > dcf08227e964a53a2cb39130b74842c7dcb6adde.
> > Both TI OMAP3630 and Samsung S5PC110 are using Cortex-A8 which
> > is VIPT. In that case samsung's driver code has the same problem.
> > 
> > > In other parts of the MTD subsystem, we tend to not do DMA on buffers
> > > that have been vmalloc-ed.
> > > 
> > > You can do something like
> > > 
> > >   if (virt_addr_valid(buf))
> > >   /* Use DMA */
> > >   else
> > >   /*
> > >* Do not use DMA, or use a bounce buffer
> > >* allocated with kmalloc
> > >*/  
> > 
> > Okay, I'll use this approach then, but first I'd like to be sure above is
> > correct. Anyone?
> 
> See this discussion [1]. The problem came up a few times already, so
> might find other threads describing why it's not safe.
> 
> [1]https://lists.linuxfoundation.org/pipermail/iommu/2016-March/016240.html

Question was more likely whenever there might exist more that one mapping
of the same page. But okay, I'll disable DMA for highmem. Patch will follow.

ladis


Re: [Letux-kernel] [Bug]: mtd: onenand: omap2plus: kernel panic with OneNAND on OMAP3 (DM3730) device GTA04A5

2018-04-11 Thread Boris Brezillon
On Wed, 11 Apr 2018 09:36:56 +0200
Ladislav Michl  wrote:

> Hi Boris,
> 
> On Wed, Apr 11, 2018 at 09:15:28AM +0200, Boris Brezillon wrote:
> > Hi Ladislav,
> > 
> > On Wed, 11 Apr 2018 08:26:07 +0200
> > Ladislav Michl  wrote:
> >   
> > > Hi Andreas,
> > > 
> > > On Wed, Apr 11, 2018 at 06:59:03AM +0200, Andreas Kemnade wrote:  
> > > > Hi Ladis,
> > > > 
> > > > On Tue, 10 Apr 2018 22:56:43 +0200
> > > > Ladislav Michl  wrote:
> > > > 
> > > > > Hi Nikolaus,
> > > > > 
> > > > > On Tue, Apr 10, 2018 at 06:25:17PM +0200, H. Nikolaus Schaller wrote: 
> > > > >
> > > > > > Hi,
> > > > > > we just started testing the v4.16 kernel and found the
> > > > > > device no longer bootable (works with v4.15). It turned
> > > > > > out that there was a harmful modification somewhere between
> > > > > > v4.15.0 and v4.16-rc1.
> > > > > > 
> > > > > > A git bisect points to this patch:  
> > > > > 
> > > > > Well, that's a shame... However, this code is in production for 
> > > > > several
> > > > > months now, so could you, please put 'goto out_copy' if 'buf >= 
> > > > > high_memory'
> > > > > condition is met, ie:
> > > > > --- a/drivers/mtd/nand/onenand/omap2.c
> > > > > +++ b/drivers/mtd/nand/onenand/omap2.c
> > > > > @@ -392,6 +392,7 @@ static int omap2_onenand_read_bufferram(struct 
> > > > > mtd_info *mtd, int area,
> > > > >   if (buf >= high_memory) {
> > > > >   struct page *p1;
> > > > >  
> > > > > + goto out_copy;
> > > > >   if (((size_t)buf & PAGE_MASK) !=
> > > > >   ((size_t)(buf + count - 1) & PAGE_MASK))
> > > > >   goto out_copy;
> > > > 
> > > > I had the same problem here, and that snippet  helps here. ubiattach
> > > > -p /dev/mtdX does not cause kernel oopses here anymore
> > > 
> > > It seems reviving old code always comes at a price :-) Could you try
> > > following patch, so far compile tested only?
> > > (we'll need to do the same for omap2_onenand_write_bufferram, but
> > > it sould be enough for testing purposes now)
> > > 
> > > diff --git a/drivers/mtd/nand/onenand/omap2.c 
> > > b/drivers/mtd/nand/onenand/omap2.c
> > > index 9c159f0dd9a6..04cefd7a6487 100644
> > > --- a/drivers/mtd/nand/onenand/omap2.c
> > > +++ b/drivers/mtd/nand/onenand/omap2.c
> > > @@ -375,11 +375,12 @@ static int omap2_onenand_read_bufferram(struct 
> > > mtd_info *mtd, int area,
> > >  {
> > >   struct omap2_onenand *c = container_of(mtd, struct omap2_onenand, mtd);
> > >   struct onenand_chip *this = mtd->priv;
> > > + struct device *dev = >pdev->dev;
> > >   dma_addr_t dma_src, dma_dst;
> > >   int bram_offset;
> > >   void *buf = (void *)buffer;
> > >   size_t xtra;
> > > - int ret;
> > > + int ret, page_dma = 0;
> > >  
> > >   bram_offset = omap2_onenand_bufferram_offset(mtd, area) + area + offset;
> > >   if (bram_offset & 3 || (size_t)buf & 3 || count < 384)
> > > @@ -389,38 +390,43 @@ static int omap2_onenand_read_bufferram(struct 
> > > mtd_info *mtd, int area,
> > >   if (in_interrupt() || oops_in_progress)
> > >   goto out_copy;
> > >  
> > > + xtra = count & 3;
> > > + if (xtra) {
> > > + count -= xtra;
> > > + memcpy(buf + count, this->base + bram_offset + count, xtra);
> > > + }
> > > +
> > > + /* Handle vmalloc address */
> > >   if (buf >= high_memory) {
> > > - struct page *p1;
> > > + struct page *page;
> > >  
> > >   if (((size_t)buf & PAGE_MASK) !=
> > >   ((size_t)(buf + count - 1) & PAGE_MASK))
> > >   goto out_copy;
> > > - p1 = vmalloc_to_page(buf);
> > > - if (!p1)
> > > + page = vmalloc_to_page(buf);  
> > 
> > Not sure this approach is safe on all archs: if the cache is VIVT or
> > VIPT, you may have several entries pointing to the same phys page, and
> > then, when dma_map_page() does its cache maintenance operations, it's
> > only taking one of these entries into account.  
> 
> Hmm, I used the same approach Samsung OneNAND driver does since commit
> dcf08227e964a53a2cb39130b74842c7dcb6adde.
> Both TI OMAP3630 and Samsung S5PC110 are using Cortex-A8 which
> is VIPT. In that case samsung's driver code has the same problem.
> 
> > In other parts of the MTD subsystem, we tend to not do DMA on buffers
> > that have been vmalloc-ed.
> > 
> > You can do something like
> > 
> > if (virt_addr_valid(buf))
> > /* Use DMA */
> > else
> > /*
> >  * Do not use DMA, or use a bounce buffer
> >  * allocated with kmalloc
> >  */  
> 
> Okay, I'll use this approach then, but first I'd like to be sure above is
> correct. Anyone?

See this discussion [1]. The problem came up a few times already, so
might find other threads describing why it's not safe.


Re: [Letux-kernel] [Bug]: mtd: onenand: omap2plus: kernel panic with OneNAND on OMAP3 (DM3730) device GTA04A5

2018-04-11 Thread Ladislav Michl
Hi Boris,

On Wed, Apr 11, 2018 at 09:15:28AM +0200, Boris Brezillon wrote:
> Hi Ladislav,
> 
> On Wed, 11 Apr 2018 08:26:07 +0200
> Ladislav Michl  wrote:
> 
> > Hi Andreas,
> > 
> > On Wed, Apr 11, 2018 at 06:59:03AM +0200, Andreas Kemnade wrote:
> > > Hi Ladis,
> > > 
> > > On Tue, 10 Apr 2018 22:56:43 +0200
> > > Ladislav Michl  wrote:
> > >   
> > > > Hi Nikolaus,
> > > > 
> > > > On Tue, Apr 10, 2018 at 06:25:17PM +0200, H. Nikolaus Schaller wrote:  
> > > > > Hi,
> > > > > we just started testing the v4.16 kernel and found the
> > > > > device no longer bootable (works with v4.15). It turned
> > > > > out that there was a harmful modification somewhere between
> > > > > v4.15.0 and v4.16-rc1.
> > > > > 
> > > > > A git bisect points to this patch:
> > > > 
> > > > Well, that's a shame... However, this code is in production for several
> > > > months now, so could you, please put 'goto out_copy' if 'buf >= 
> > > > high_memory'
> > > > condition is met, ie:
> > > > --- a/drivers/mtd/nand/onenand/omap2.c
> > > > +++ b/drivers/mtd/nand/onenand/omap2.c
> > > > @@ -392,6 +392,7 @@ static int omap2_onenand_read_bufferram(struct 
> > > > mtd_info *mtd, int area,
> > > > if (buf >= high_memory) {
> > > > struct page *p1;
> > > >  
> > > > +   goto out_copy;
> > > > if (((size_t)buf & PAGE_MASK) !=
> > > > ((size_t)(buf + count - 1) & PAGE_MASK))
> > > > goto out_copy;  
> > > 
> > > I had the same problem here, and that snippet  helps here. ubiattach
> > > -p /dev/mtdX does not cause kernel oopses here anymore  
> > 
> > It seems reviving old code always comes at a price :-) Could you try
> > following patch, so far compile tested only?
> > (we'll need to do the same for omap2_onenand_write_bufferram, but
> > it sould be enough for testing purposes now)
> > 
> > diff --git a/drivers/mtd/nand/onenand/omap2.c 
> > b/drivers/mtd/nand/onenand/omap2.c
> > index 9c159f0dd9a6..04cefd7a6487 100644
> > --- a/drivers/mtd/nand/onenand/omap2.c
> > +++ b/drivers/mtd/nand/onenand/omap2.c
> > @@ -375,11 +375,12 @@ static int omap2_onenand_read_bufferram(struct 
> > mtd_info *mtd, int area,
> >  {
> > struct omap2_onenand *c = container_of(mtd, struct omap2_onenand, mtd);
> > struct onenand_chip *this = mtd->priv;
> > +   struct device *dev = >pdev->dev;
> > dma_addr_t dma_src, dma_dst;
> > int bram_offset;
> > void *buf = (void *)buffer;
> > size_t xtra;
> > -   int ret;
> > +   int ret, page_dma = 0;
> >  
> > bram_offset = omap2_onenand_bufferram_offset(mtd, area) + area + offset;
> > if (bram_offset & 3 || (size_t)buf & 3 || count < 384)
> > @@ -389,38 +390,43 @@ static int omap2_onenand_read_bufferram(struct 
> > mtd_info *mtd, int area,
> > if (in_interrupt() || oops_in_progress)
> > goto out_copy;
> >  
> > +   xtra = count & 3;
> > +   if (xtra) {
> > +   count -= xtra;
> > +   memcpy(buf + count, this->base + bram_offset + count, xtra);
> > +   }
> > +
> > +   /* Handle vmalloc address */
> > if (buf >= high_memory) {
> > -   struct page *p1;
> > +   struct page *page;
> >  
> > if (((size_t)buf & PAGE_MASK) !=
> > ((size_t)(buf + count - 1) & PAGE_MASK))
> > goto out_copy;
> > -   p1 = vmalloc_to_page(buf);
> > -   if (!p1)
> > +   page = vmalloc_to_page(buf);
> 
> Not sure this approach is safe on all archs: if the cache is VIVT or
> VIPT, you may have several entries pointing to the same phys page, and
> then, when dma_map_page() does its cache maintenance operations, it's
> only taking one of these entries into account.

Hmm, I used the same approach Samsung OneNAND driver does since commit
dcf08227e964a53a2cb39130b74842c7dcb6adde.
Both TI OMAP3630 and Samsung S5PC110 are using Cortex-A8 which
is VIPT. In that case samsung's driver code has the same problem.

> In other parts of the MTD subsystem, we tend to not do DMA on buffers
> that have been vmalloc-ed.
> 
> You can do something like
> 
>   if (virt_addr_valid(buf))
>   /* Use DMA */
>   else
>   /*
>* Do not use DMA, or use a bounce buffer
>* allocated with kmalloc
>*/

Okay, I'll use this approach then, but first I'd like to be sure above is
correct. Anyone?

> Regards,
> 
> Boris

Thank you,
ladis

> > +   if (!page)
> > goto out_copy;
> > -   buf = page_address(p1) + ((size_t)buf & ~PAGE_MASK);
> > -   }
> > -
> > -   xtra = count & 3;
> > -   if (xtra) {
> > -   count -= xtra;
> > -   memcpy(buf + count, this->base + bram_offset + count, xtra);
> > +   page_dma = 1;
> > +   dma_dst = dma_map_page(dev, page, (size_t) buf & ~PAGE_MASK,
> > + 

Re: [Letux-kernel] [Bug]: mtd: onenand: omap2plus: kernel panic with OneNAND on OMAP3 (DM3730) device GTA04A5

2018-04-11 Thread Boris Brezillon
Hi Ladislav,

On Wed, 11 Apr 2018 08:26:07 +0200
Ladislav Michl  wrote:

> Hi Andreas,
> 
> On Wed, Apr 11, 2018 at 06:59:03AM +0200, Andreas Kemnade wrote:
> > Hi Ladis,
> > 
> > On Tue, 10 Apr 2018 22:56:43 +0200
> > Ladislav Michl  wrote:
> >   
> > > Hi Nikolaus,
> > > 
> > > On Tue, Apr 10, 2018 at 06:25:17PM +0200, H. Nikolaus Schaller wrote:  
> > > > Hi,
> > > > we just started testing the v4.16 kernel and found the
> > > > device no longer bootable (works with v4.15). It turned
> > > > out that there was a harmful modification somewhere between
> > > > v4.15.0 and v4.16-rc1.
> > > > 
> > > > A git bisect points to this patch:
> > > 
> > > Well, that's a shame... However, this code is in production for several
> > > months now, so could you, please put 'goto out_copy' if 'buf >= 
> > > high_memory'
> > > condition is met, ie:
> > > --- a/drivers/mtd/nand/onenand/omap2.c
> > > +++ b/drivers/mtd/nand/onenand/omap2.c
> > > @@ -392,6 +392,7 @@ static int omap2_onenand_read_bufferram(struct 
> > > mtd_info *mtd, int area,
> > >   if (buf >= high_memory) {
> > >   struct page *p1;
> > >  
> > > + goto out_copy;
> > >   if (((size_t)buf & PAGE_MASK) !=
> > >   ((size_t)(buf + count - 1) & PAGE_MASK))
> > >   goto out_copy;  
> > 
> > I had the same problem here, and that snippet  helps here. ubiattach
> > -p /dev/mtdX does not cause kernel oopses here anymore  
> 
> It seems reviving old code always comes at a price :-) Could you try
> following patch, so far compile tested only?
> (we'll need to do the same for omap2_onenand_write_bufferram, but
> it sould be enough for testing purposes now)
> 
> diff --git a/drivers/mtd/nand/onenand/omap2.c 
> b/drivers/mtd/nand/onenand/omap2.c
> index 9c159f0dd9a6..04cefd7a6487 100644
> --- a/drivers/mtd/nand/onenand/omap2.c
> +++ b/drivers/mtd/nand/onenand/omap2.c
> @@ -375,11 +375,12 @@ static int omap2_onenand_read_bufferram(struct mtd_info 
> *mtd, int area,
>  {
>   struct omap2_onenand *c = container_of(mtd, struct omap2_onenand, mtd);
>   struct onenand_chip *this = mtd->priv;
> + struct device *dev = >pdev->dev;
>   dma_addr_t dma_src, dma_dst;
>   int bram_offset;
>   void *buf = (void *)buffer;
>   size_t xtra;
> - int ret;
> + int ret, page_dma = 0;
>  
>   bram_offset = omap2_onenand_bufferram_offset(mtd, area) + area + offset;
>   if (bram_offset & 3 || (size_t)buf & 3 || count < 384)
> @@ -389,38 +390,43 @@ static int omap2_onenand_read_bufferram(struct mtd_info 
> *mtd, int area,
>   if (in_interrupt() || oops_in_progress)
>   goto out_copy;
>  
> + xtra = count & 3;
> + if (xtra) {
> + count -= xtra;
> + memcpy(buf + count, this->base + bram_offset + count, xtra);
> + }
> +
> + /* Handle vmalloc address */
>   if (buf >= high_memory) {
> - struct page *p1;
> + struct page *page;
>  
>   if (((size_t)buf & PAGE_MASK) !=
>   ((size_t)(buf + count - 1) & PAGE_MASK))
>   goto out_copy;
> - p1 = vmalloc_to_page(buf);
> - if (!p1)
> + page = vmalloc_to_page(buf);

Not sure this approach is safe on all archs: if the cache is VIVT or
VIPT, you may have several entries pointing to the same phys page, and
then, when dma_map_page() does its cache maintenance operations, it's
only taking one of these entries into account.

In other parts of the MTD subsystem, we tend to not do DMA on buffers
that have been vmalloc-ed.

You can do something like

if (virt_addr_valid(buf))
/* Use DMA */
else
/*
 * Do not use DMA, or use a bounce buffer
 * allocated with kmalloc
 */

Regards,

Boris

> + if (!page)
>   goto out_copy;
> - buf = page_address(p1) + ((size_t)buf & ~PAGE_MASK);
> - }
> -
> - xtra = count & 3;
> - if (xtra) {
> - count -= xtra;
> - memcpy(buf + count, this->base + bram_offset + count, xtra);
> + page_dma = 1;
> + dma_dst = dma_map_page(dev, page, (size_t) buf & ~PAGE_MASK,
> +count, DMA_FROM_DEVICE);
> + } else {
> + dma_dst = dma_map_single(dev, buf, count, DMA_FROM_DEVICE);
>   }
> -
>   dma_src = c->phys_base + bram_offset;
> - dma_dst = dma_map_single(>pdev->dev, buf, count, DMA_FROM_DEVICE);
> - if (dma_mapping_error(>pdev->dev, dma_dst)) {
> - dev_err(>pdev->dev,
> - "Couldn't DMA map a %d byte buffer\n",
> - count);
> +
> + if (dma_mapping_error(dev, dma_dst)) {
> + dev_err(dev, "Couldn't DMA map a %d byte buffer\n", count);
>   goto out_copy;
>   

Re: [Letux-kernel] [Bug]: mtd: onenand: omap2plus: kernel panic with OneNAND on OMAP3 (DM3730) device GTA04A5

2018-04-11 Thread Ladislav Michl
Hi Andreas,

On Wed, Apr 11, 2018 at 06:59:03AM +0200, Andreas Kemnade wrote:
> Hi Ladis,
> 
> On Tue, 10 Apr 2018 22:56:43 +0200
> Ladislav Michl  wrote:
> 
> > Hi Nikolaus,
> > 
> > On Tue, Apr 10, 2018 at 06:25:17PM +0200, H. Nikolaus Schaller wrote:
> > > Hi,
> > > we just started testing the v4.16 kernel and found the
> > > device no longer bootable (works with v4.15). It turned
> > > out that there was a harmful modification somewhere between
> > > v4.15.0 and v4.16-rc1.
> > > 
> > > A git bisect points to this patch:  
> > 
> > Well, that's a shame... However, this code is in production for several
> > months now, so could you, please put 'goto out_copy' if 'buf >= high_memory'
> > condition is met, ie:
> > --- a/drivers/mtd/nand/onenand/omap2.c
> > +++ b/drivers/mtd/nand/onenand/omap2.c
> > @@ -392,6 +392,7 @@ static int omap2_onenand_read_bufferram(struct mtd_info 
> > *mtd, int area,
> > if (buf >= high_memory) {
> > struct page *p1;
> >  
> > +   goto out_copy;
> > if (((size_t)buf & PAGE_MASK) !=
> > ((size_t)(buf + count - 1) & PAGE_MASK))
> > goto out_copy;
> 
> I had the same problem here, and that snippet  helps here. ubiattach
> -p /dev/mtdX does not cause kernel oopses here anymore

It seems reviving old code always comes at a price :-) Could you try
following patch, so far compile tested only?
(we'll need to do the same for omap2_onenand_write_bufferram, but
it sould be enough for testing purposes now)

diff --git a/drivers/mtd/nand/onenand/omap2.c b/drivers/mtd/nand/onenand/omap2.c
index 9c159f0dd9a6..04cefd7a6487 100644
--- a/drivers/mtd/nand/onenand/omap2.c
+++ b/drivers/mtd/nand/onenand/omap2.c
@@ -375,11 +375,12 @@ static int omap2_onenand_read_bufferram(struct mtd_info 
*mtd, int area,
 {
struct omap2_onenand *c = container_of(mtd, struct omap2_onenand, mtd);
struct onenand_chip *this = mtd->priv;
+   struct device *dev = >pdev->dev;
dma_addr_t dma_src, dma_dst;
int bram_offset;
void *buf = (void *)buffer;
size_t xtra;
-   int ret;
+   int ret, page_dma = 0;
 
bram_offset = omap2_onenand_bufferram_offset(mtd, area) + area + offset;
if (bram_offset & 3 || (size_t)buf & 3 || count < 384)
@@ -389,38 +390,43 @@ static int omap2_onenand_read_bufferram(struct mtd_info 
*mtd, int area,
if (in_interrupt() || oops_in_progress)
goto out_copy;
 
+   xtra = count & 3;
+   if (xtra) {
+   count -= xtra;
+   memcpy(buf + count, this->base + bram_offset + count, xtra);
+   }
+
+   /* Handle vmalloc address */
if (buf >= high_memory) {
-   struct page *p1;
+   struct page *page;
 
if (((size_t)buf & PAGE_MASK) !=
((size_t)(buf + count - 1) & PAGE_MASK))
goto out_copy;
-   p1 = vmalloc_to_page(buf);
-   if (!p1)
+   page = vmalloc_to_page(buf);
+   if (!page)
goto out_copy;
-   buf = page_address(p1) + ((size_t)buf & ~PAGE_MASK);
-   }
-
-   xtra = count & 3;
-   if (xtra) {
-   count -= xtra;
-   memcpy(buf + count, this->base + bram_offset + count, xtra);
+   page_dma = 1;
+   dma_dst = dma_map_page(dev, page, (size_t) buf & ~PAGE_MASK,
+  count, DMA_FROM_DEVICE);
+   } else {
+   dma_dst = dma_map_single(dev, buf, count, DMA_FROM_DEVICE);
}
-
dma_src = c->phys_base + bram_offset;
-   dma_dst = dma_map_single(>pdev->dev, buf, count, DMA_FROM_DEVICE);
-   if (dma_mapping_error(>pdev->dev, dma_dst)) {
-   dev_err(>pdev->dev,
-   "Couldn't DMA map a %d byte buffer\n",
-   count);
+
+   if (dma_mapping_error(dev, dma_dst)) {
+   dev_err(dev, "Couldn't DMA map a %d byte buffer\n", count);
goto out_copy;
}
 
ret = omap2_onenand_dma_transfer(c, dma_src, dma_dst, count);
-   dma_unmap_single(>pdev->dev, dma_dst, count, DMA_FROM_DEVICE);
+   if (page_dma)
+   dma_unmap_page(dev, dma_dst, count, DMA_FROM_DEVICE);
+   else
+   dma_unmap_single(dev, dma_dst, count, DMA_FROM_DEVICE);
 
if (ret) {
-   dev_err(>pdev->dev, "timeout waiting for DMA\n");
+   dev_err(dev, "timeout waiting for DMA\n");
goto out_copy;
}
 


Re: [Letux-kernel] [Bug]: mtd: onenand: omap2plus: kernel panic with OneNAND on OMAP3 (DM3730) device GTA04A5

2018-04-10 Thread Andreas Kemnade
Hi Ladis,

On Tue, 10 Apr 2018 22:56:43 +0200
Ladislav Michl  wrote:

> Hi Nikolaus,
> 
> On Tue, Apr 10, 2018 at 06:25:17PM +0200, H. Nikolaus Schaller wrote:
> > Hi,
> > we just started testing the v4.16 kernel and found the
> > device no longer bootable (works with v4.15). It turned
> > out that there was a harmful modification somewhere between
> > v4.15.0 and v4.16-rc1.
> > 
> > A git bisect points to this patch:  
> 
> Well, that's a shame... However, this code is in production for several
> months now, so could you, please put 'goto out_copy' if 'buf >= high_memory'
> condition is met, ie:
> --- a/drivers/mtd/nand/onenand/omap2.c
> +++ b/drivers/mtd/nand/onenand/omap2.c
> @@ -392,6 +392,7 @@ static int omap2_onenand_read_bufferram(struct mtd_info 
> *mtd, int area,
>   if (buf >= high_memory) {
>   struct page *p1;
>  
> + goto out_copy;
>   if (((size_t)buf & PAGE_MASK) !=
>   ((size_t)(buf + count - 1) & PAGE_MASK))
>   goto out_copy;

I had the same problem here, and that snippet  helps here. ubiattach
-p /dev/mtdX does not cause kernel oopses here anymore


Regards,
Andreas


pgpMtvtkaSvQW.pgp
Description: OpenPGP digital signature