Re: 32-bit Amlogic (ARM) SoC: kernel BUG in kfree()
Hi Liang,

On Thu, Apr 11, 2019 at 5:00 AM Liang Yang wrote:
>
> Hi Martin,
> On 2019/4/11 1:54, Martin Blumenstingl wrote:
[..]
> > Before I spend time changing the code to use the FIFO register I would
> > like to wait for an answer from your VLSI designer.
> > Setting the "correct" info buffer length for NFC_CMD_N2M on the 32-bit
> > SoCs seems like an easier solution compared to switching to the FIFO
> > register. Keeping NFC_CMD_N2M on the 32-bit SoCs also allows us to
> > have only one code-path for 32 and 64 bit SoCs, meaning we don't have
> > to maintain two separate code-paths for basically the same
> > functionality (assuming that NFC_CMD_N2M is not completely broken on
> > the 32-bit SoCs, we just don't know how to use it yet).
> >
> All right. I am also waiting for the answer.

do you have any update on this?

Martin
Re: 32-bit Amlogic (ARM) SoC: kernel BUG in kfree()
Hi Martin,

On 2019/4/11 1:54, Martin Blumenstingl wrote:
> Hi Liang,
>
> On Wed, Apr 10, 2019 at 1:08 PM Liang Yang wrote:
[..]
> > Now it seems that we can't use command NFC_CMD_N2M on nand
> > initialization for m8/b chips and use *read byte from NFC fifo register*
> > instead.
> thank you for the status update!
>
> I am trying to understand your suggestion not to use NFC_CMD_N2M:
> the documentation (public S922X datasheet from Hardkernel: [0]) states
> that P_NAND_BUF (NFC_REG_BUF in the meson_nand driver) can hold up to
> four bytes of data.
> is this the "read byte from NFC FIFO register" you mentioned?

You are right. Take the early meson NFC driver V2 from the previous mail as
a reference.

> Before I spend time changing the code to use the FIFO register I would
> like to wait for an answer from your VLSI designer.
> Setting the "correct" info buffer length for NFC_CMD_N2M on the 32-bit
> SoCs seems like an easier solution compared to switching to the FIFO
> register. Keeping NFC_CMD_N2M on the 32-bit SoCs also allows us to
> have only one code-path for 32 and 64 bit SoCs, meaning we don't have
> to maintain two separate code-paths for basically the same
> functionality (assuming that NFC_CMD_N2M is not completely broken on
> the 32-bit SoCs, we just don't know how to use it yet).

All right. I am also waiting for the answer.

> Regards
> Martin
>
> [0] https://dn.odroid.com/S922X/ODROID-N2/Datasheet/S922X_Public_Datasheet_V0.2.pdf
Re: 32-bit Amlogic (ARM) SoC: kernel BUG in kfree()
Hi Liang,

On Wed, Apr 10, 2019 at 1:08 PM Liang Yang wrote:
>
> Hi Martin,
>
> On 2019/4/5 12:30, Martin Blumenstingl wrote:
[..]
> > do you have any update on this?
> Sorry. I haven't got reply from VLSI designer yet. We tried to improve
> priority yesterday, but i still can't estimate the time. There is no
> document or change list showing the difference between m8/b and gxl/axg
> serial chips. Now it seems that we can't use command NFC_CMD_N2M on nand
> initialization for m8/b chips and use *read byte from NFC fifo register*
> instead.
thank you for the status update!

I am trying to understand your suggestion not to use NFC_CMD_N2M:
the documentation (public S922X datasheet from Hardkernel: [0]) states
that P_NAND_BUF (NFC_REG_BUF in the meson_nand driver) can hold up to
four bytes of data.
is this the "read byte from NFC FIFO register" you mentioned?

Before I spend time changing the code to use the FIFO register I would
like to wait for an answer from your VLSI designer.
Setting the "correct" info buffer length for NFC_CMD_N2M on the 32-bit
SoCs seems like an easier solution compared to switching to the FIFO
register. Keeping NFC_CMD_N2M on the 32-bit SoCs also allows us to
have only one code-path for 32 and 64 bit SoCs, meaning we don't have
to maintain two separate code-paths for basically the same
functionality (assuming that NFC_CMD_N2M is not completely broken on
the 32-bit SoCs, we just don't know how to use it yet).


Regards
Martin


[0] https://dn.odroid.com/S922X/ODROID-N2/Datasheet/S922X_Public_Datasheet_V0.2.pdf
Re: 32-bit Amlogic (ARM) SoC: kernel BUG in kfree()
Hi Martin,

On 2019/4/5 12:30, Martin Blumenstingl wrote:
> Hi Liang,
>
> On Fri, Mar 29, 2019 at 8:44 AM Liang Yang wrote:
[..]
> > I have asked our VLSI designer for explanation or simulation result by
> > an e-mail. Thanks.
> do you have any update on this?

Sorry, I haven't got a reply from the VLSI designer yet. We tried to raise
the priority yesterday, but I still can't estimate the time. There is no
document or change list showing the difference between the m8/b and gxl/axg
series chips. For now it seems that we can't use the NFC_CMD_N2M command
during NAND initialization on m8/b chips and have to use *read byte from NFC
fifo register* instead.
Re: 32-bit Amlogic (ARM) SoC: kernel BUG in kfree()
Hi Liang,

On Fri, Mar 29, 2019 at 8:44 AM Liang Yang wrote:
>
> Hi Martin,
>
> On 2019/3/29 2:03, Martin Blumenstingl wrote:
> > Hi Liang,
> [..]
> > > I don't think it is caused by a different NAND type, but i have followed
> > > the some test on my GXL platform. we can see the result from the
> > > attachment. By the way, i don't find any information about this on meson
> > > NFC datasheet, so i will ask our VLSI.
> > > Martin, May you reproduce it with the new patch on meson8b platform ? I
> > > need a more clear and easier compared log like gxl.txt. Thanks.
> > your gxl.txt is great, finally I can also compare my own results with
> > something that works for you!
> > in my results (see attachment) the "DATA_IN [256 B, force 8-bit]"
> > instructions result in a different info buffer output.
> > does this make any sense to you?
> >
> I have asked our VLSI designer for explanation or simulation result by
> an e-mail. Thanks.
do you have any update on this?


Martin
Re: 32-bit Amlogic (ARM) SoC: kernel BUG in kfree()
Hi Martin,

On 2019/3/29 2:03, Martin Blumenstingl wrote:
> Hi Liang,
>
> [..]
> > I don't think it is caused by a different NAND type, but i have followed
> > the some test on my GXL platform. we can see the result from the
> > attachment. By the way, i don't find any information about this on meson
> > NFC datasheet, so i will ask our VLSI.
> > Martin, May you reproduce it with the new patch on meson8b platform ? I
> > need a more clear and easier compared log like gxl.txt. Thanks.
> your gxl.txt is great, finally I can also compare my own results with
> something that works for you!
> in my results (see attachment) the "DATA_IN [256 B, force 8-bit]"
> instructions result in a different info buffer output.
> does this make any sense to you?

I have asked our VLSI designer by e-mail for an explanation or a simulation
result. Thanks.
Re: 32-bit Amlogic (ARM) SoC: kernel BUG in kfree()
Hi Liang,

On Wed, Mar 27, 2019 at 9:52 AM Liang Yang wrote:
>
> Hi Martin,
>
> Thanks a lot.
> On 2019/3/26 2:31, Martin Blumenstingl wrote:
[..]
> > thank you for your debug patch! on my board 2 * PER_INFO_BYTE is not enough.
> > I took the idea from your patch and adapted it so I could print a
> > buffer with 256 bytes (which seems to be "big enough" for my board).
> it only needs PER_INFO_BYTE (= 8) bytes, because NFC_CMD_N2M don't set
> *Pages*, that is not like CMDRWGEN which needs Pages*PER_INFO_BYTE (= 8)
> bytes when setting *Pages* parameter. I have been thinking that
> NFC_CMD_N2M only occupis PER_INFO_BYTE (= 8) bytes. And i have tried to
> not set the info address, the machine would crash.
thank you for the explanation. the command is built using:
  cmd = NFC_CMD_N2M | (len & GENMASK(5, 0));

> > see the attached, modified patch
> >
> > in the output I see that sometimes the first 32 bytes are not touched
> > by the controller, but everything beyond 32 bytes is modified in the
> > info buffer.
> >
> it really makes sense that the controller sometimes fills the space
> beyond the first 8 bytes. However i expect the controller should only
> take the first 8 bytes when using NFC_CMD_N2M.
in my tests (see the attached log output) it seems that the info
buffer size has the following constraints:
- use the "len" which is passed to meson_nfc_read_buf
- if "len" is smaller than PER_INFO_BYTE then use PER_INFO_BYTE (= 8)

> > I also tried to increase the buffer size to 512, but that didn't make
> > a difference (I never saw any info buffer modification beyond 256
> > bytes).
> >
> > also I just noticed that I didn
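The command construction and the info buffer constraints described in this message can be sketched in plain userspace C. This is only an illustration, not driver code: `GENMASK32`, `FAKE_NFC_CMD_N2M`, `build_n2m_cmd()`, `cmd_len_field()` and `info_buf_len()` are hypothetical names, and the opcode value is a placeholder that should be checked against the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PER_INFO_BYTE 8u

/* Userspace stand-in for the kernel's GENMASK(h, l), 32-bit values only. */
#define GENMASK32(h, l) \
	((uint32_t)(((~UINT32_C(0)) << (l)) & ((~UINT32_C(0)) >> (31 - (h)))))

/* Placeholder opcode for illustration; see the driver for the real value. */
#define FAKE_NFC_CMD_N2M UINT32_C(0x00220000)

/* Build the command word the way the quoted line does: opcode | 6-bit len. */
static uint32_t build_n2m_cmd(uint32_t len)
{
	/* The opcode bits must not overlap the length field. */
	assert((FAKE_NFC_CMD_N2M & GENMASK32(5, 0)) == 0);
	return FAKE_NFC_CMD_N2M | (len & GENMASK32(5, 0));
}

/* Extract the length field that actually ends up in the command word. */
static uint32_t cmd_len_field(uint32_t cmd)
{
	return cmd & GENMASK32(5, 0);
}

/*
 * Info buffer size following the two constraints observed in the tests:
 * use "len", but never less than PER_INFO_BYTE.
 */
static size_t info_buf_len(size_t len)
{
	return len < PER_INFO_BYTE ? PER_INFO_BYTE : len;
}
```

Because the length field is only six bits wide, any len above 63 is truncated by the mask; a len of 256 leaves 0 in the field. Whether that matters for the 256-byte DATA_IN transfers is not something this sketch can answer.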
Re: 32-bit Amlogic (ARM) SoC: kernel BUG in kfree()
Hi Martin,

Thanks a lot.

On 2019/3/26 2:31, Martin Blumenstingl wrote:
> Hi Liang,
>
> On Mon, Mar 25, 2019 at 11:03 AM Liang Yang wrote:
[..]
> thank you for your debug patch! on my board 2 * PER_INFO_BYTE is not enough.
> I took the idea from your patch and adapted it so I could print a
> buffer with 256 bytes (which seems to be "big enough" for my board).

It should only need PER_INFO_BYTE (= 8) bytes, because NFC_CMD_N2M doesn't
set *Pages*, unlike CMDRWGEN, which needs Pages * PER_INFO_BYTE (= 8) bytes
when the *Pages* parameter is set. I have been assuming that NFC_CMD_N2M only
occupies PER_INFO_BYTE (= 8) bytes. I have also tried not setting the info
address; the machine then crashes.

> see the attached, modified patch
>
> in the output I see that sometimes the first 32 bytes are not touched
> by the controller, but everything beyond 32 bytes is modified in the
> info buffer.

It makes sense that the controller sometimes fills the space beyond the
first 8 bytes. However, I expected the controller to use only the first
8 bytes with NFC_CMD_N2M.

> I also tried to increase the buffer size to 512, but that didn't make
> a difference (I never saw any info buffer modification beyond 256
> bytes).
>
> also I just noticed that I didn't give you much details on my NAND chip yet.
[..]

I don't think it is caused by a different NAND type, but I have run the same
test on my GXL platform; the result is in the attachment. By the way, I can't
find any information about this in the meson NFC datasheet, so I will ask our
VLSI designer.
Martin, could you reproduce it with the new patch on the meson8b platform? I
need a clearer log that is easier to compare, like gxl.txt. Thanks.
Re: 32-bit Amlogic (ARM) SoC: kernel BUG in kfree()
Hi Liang,

On Mon, Mar 25, 2019 at 11:03 AM Liang Yang wrote:
>
> Hi Martin,
>
> On 2019/3/23 5:07, Martin Blumenstingl wrote:
[..]
> > Liang, how does the NAND controller know that it only has to send
> > PER_INFO_BYTE (= 8) bytes when called from meson_nfc_read_buf? all
> > other callers of meson_nfc_dma_buffer_setup (which passes the info
> > buffer to the hardware) are using (nand->ecc.steps * PER_INFO_BYTE)
> > bytes?
> >
> NFC_CMD_N2M and CMDRWGEN are different commands. CMDRWGEN needs to set
> the ecc page size (1KB or 512B) and Pages(2, 4, 8, ...), so
> PER_INFO_BYTE(= 8) bytes for each ecc page.
> I have never used NFC_CMD_N2M to transfer data before, because it is
> very low efficient. And I do a experiment with the attachment and find
> on overwritten on my meson axg platform.
>
> Martin, I would appreciate it very much if you would try the attachment
> on your meson m8b platform.
thank you for your debug patch! on my board 2 * PER_INFO_BYTE is not enough.
I took the idea from your patch and adapted it so I could print a
buffer with 256 bytes (which seems to be "big enough" for my board).
see the attached, modified patch

in the output I see that sometimes the first 32 bytes are not touched
by the controller, but everything beyond 32 bytes is modified in the
info buffer.

I also tried to increase the buffer size to 512, but that didn't make
a difference (I never saw any info buffer modification beyond 256
bytes).

also I just noticed that I didn't give you much details on my NAND chip yet.
from Amlogic vendor u-boot on Meson8m2 (all my Meson8b boards have
eMMC flash, but I believe the NAND controller on Meson8 to GXBB is
identical):
  m8m2_n200_v1#amlnf chipinfo
  flash info
  name:B revision 20nm NAND 8GiB H27UCG8T2B, id:ad de 94 eb 74 44 0 0
  pagesize:0x4000, blocksize:0x40, oobsize:0x500, chipsize:0x2000,
  option:0x8, T_REA:16, T_RHOH:15
  hw controller info
  chip_num:1, onfi_mode:0, page_shift:14, block_shift:22, option:0xc2
  ecc_unit:1024, ecc_bytes:70, ecc_steps:16, ecc_max:40
  bch_mode:5, user_mode:2, oobavail:32, oobtail:64384


Regards
Martin

...
[2.716885] : 8005 2800 2945 fdfd fdfd fdfd fdfd fdfd fdfd fdfd fdfd fdfd fdfd fdfd fdfd
[2.720464] 0020: fdfd fdfd fdfd fdfd fdfd fdfd fdfd fdfd fdfd fdfd fdfd fdfd fdfd fdfd fdfd fdfd
[2.729689] 0040: fdfd fdfd fdfd fdfd fdfd fdfd fdfd fdfd fdfd fdfd fdfd fdfd fdfd fdfd fdfd fdfd
[2.738847] 0060: fdfd fdfd fdfd fdfd fdfd fdfd fdfd fdfd fdfd fdfd fdfd fdfd fdfd fdfd fdfd fdfd
[2.748065] 0080: fdfd fdfd fdfd fdfd fdfd fdfd fdfd fdfd fdfd fdfd fdfd fdfd fdfd fdfd fdfd fdfd
[2.757228] 00a0: fdfd fdfd fdfd fdfd fdfd fdfd fdfd fdfd fdfd fdfd fdfd fdf
Re: 32-bit Amlogic (ARM) SoC: kernel BUG in kfree()
Hi Martin,

On 2019/3/23 5:07, Martin Blumenstingl wrote:
> Hi Matthew,
>
> On Thu, Mar 21, 2019 at 10:44 PM Matthew Wilcox wrote:
[..]
> I'm starting to wonder if the NAND controller (hardware) writes more
> than 8 bytes.
> some context: the "info" buffer allocated in meson_nfc_read_buf is
> then passed to the NAND controller IP (after using dma_map_single).
>
> Liang, how does the NAND controller know that it only has to send
> PER_INFO_BYTE (= 8) bytes when called from meson_nfc_read_buf? all
> other callers of meson_nfc_dma_buffer_setup (which passes the info
> buffer to the hardware) are using (nand->ecc.steps * PER_INFO_BYTE)
> bytes?

NFC_CMD_N2M and CMDRWGEN are different commands. CMDRWGEN needs to set
the ecc page size (1KB or 512B) and Pages (2, 4, 8, ...), so it uses
PER_INFO_BYTE (= 8) bytes for each ecc page.
I have never used NFC_CMD_N2M to transfer data before, because it is
very inefficient. I did an experiment with the attached patch and found
no overwrite on my meson axg platform.

Martin, I would appreciate it very much if you would try the attachment
on your meson m8b platform.

> Regards
> Martin
>
> [0] https://lore.kernel.org/patchwork/cover/913212/
> [1] https://github.com/xdarklight/linux/tree/arm-kasan-hack-v5.1-rc1

diff --git a/drivers/mtd/nand/raw/meson_nand.c b/drivers/mtd/nand/raw/meson_nand.c
old mode 100644
new mode 100755
index e858d58..905ef39
--- a/drivers/mtd/nand/raw/meson_nand.c
+++ b/drivers/mtd/nand/raw/meson_nand.c
@@ -527,11 +527,12 @@ static void meson_nfc_dma_buffer_release(struct nand_chip *nand,
 static int meson_nfc_read_buf(struct nand_chip *nand, u8 *buf, int len)
 {
 	struct meson_nfc *nfc = nand_get_controller_data(nand);
-	int ret = 0;
+	int ret = 0, i;
 	u32 cmd;
 	u8 *info;
 
-	info = kzalloc(PER_INFO_BYTE, GFP_KERNEL);
+	info = kzalloc(2 * PER_INFO_BYTE, GFP_KERNEL);
+	memset(info, 0xFD, 2 * PER_INFO_BYTE);
 	ret = meson_nfc_dma_buffer_setup(nand, buf, len, info,
 					 PER_INFO_BYTE, DMA_FROM_DEVICE);
 	if (ret)
@@ -543,6 +544,12 @@ static int meson_nfc_read_buf(struct nand_chip *nand, u8 *buf, int len)
 	meson_nfc_drain_cmd(nfc);
 	meson_nfc_wait_cmd_finish(nfc, 1000);
 	meson_nfc_dma_buffer_release(nand, len, PER_INFO_BYTE, DMA_FROM_DEVICE);
+
+	for (i = 0; i < 2 * PER_INFO_BYTE; i++){
+		printk("0x%x ", info[i]);
+	}
+	printk("\n");
+
 	kfree(info);
 
 	return ret;
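The debug patch above poisons the info buffer with 0xFD and dumps it after the transfer. A userspace sketch of the same idea, which reports how far into the buffer the writes reached instead of printing every byte, might look like this. `poison_buffer()` and `touched_extent()` are hypothetical helper names and nothing here touches real hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POISON 0xFD

/* Fill buf with the poison pattern before handing it to the device. */
static void poison_buffer(uint8_t *buf, size_t size)
{
	assert(buf != NULL);
	memset(buf, POISON, size);
}

/*
 * Return one past the offset of the last byte that no longer holds the
 * poison value, or 0 if the whole buffer is still poisoned.
 * Limitation: a device write of the value 0xFD itself is not detected.
 */
static size_t touched_extent(const uint8_t *buf, size_t size)
{
	size_t i;

	assert(buf != NULL);
	for (i = size; i > 0; i--) {
		if (buf[i - 1] != POISON)
			return i;
	}
	return 0;
}
```

Allocating the buffer much larger than the expected PER_INFO_BYTE and checking `touched_extent()` after each transfer gives a lower bound on how many bytes the controller wrote, which is the number the buffer size has to cover.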
Re: 32-bit Amlogic (ARM) SoC: kernel BUG in kfree()
Hi Matthew, On Thu, Mar 21, 2019 at 10:44 PM Matthew Wilcox wrote: > > On Thu, Mar 21, 2019 at 09:17:34PM +0100, Martin Blumenstingl wrote: > > Hello, > > > > I am experiencing the following crash: > > [ cut here ] > > kernel BUG at mm/slub.c:3950! > > if (unlikely(!PageSlab(page))) { > BUG_ON(!PageCompound(page)); > > You called kfree() on the address of a page which wasn't allocated by slab. > > > I have traced this crash to the kfree() in meson_nfc_read_buf(). > > my observation is as follows: > > - meson_nfc_read_buf() is called 7 times without any crash, the > > kzalloc() call returns 0xe9e6c600 (virtual address) / 0x29e6c600 > > (physical address) > > - the eight time meson_nfc_read_buf() is called kzalloc() call returns > > 0xee39a38b (virtual address) / 0x2e39a38b (physical address) and the > > final kfree() crashes > > - changing the size in the kzalloc() call from PER_INFO_BYTE (= 8) to > > PAGE_SIZE works around that crash > > I suspect you're doing something which corrupts memory. Overrunning > the end of your allocation or something similar. Have you tried KASAN > or even the various slab debugging (eg redzones)? KASAN is not available on 32-bit ARM. there was some progress last year [0] but it didn't make it into mainline. I tried to make the patches apply again and got it to compile (and my kernel is still booting) but I have no idea if it's still working. for anyone interested, my patches are here: [1] (I consider this a HACK because I don't know anything about the code which is being touched in the patches, I only made it compile) SLAB debugging (redzones) were a great hint, thank you very much for that Matthew! I enabled: CONFIG_SLUB_DEBUG=y CONFIG_SLUB_DEBUG_ON=y and with that I now get "BUG kmalloc-64 (Not tainted): Redzone overwritten" (a larger kernel log extract is attached). I'm starting to wonder if the NAND controller (hardware) writes more than 8 bytes. 
some context: the "info" buffer allocated in meson_nfc_read_buf is then
passed to the NAND controller IP (after using dma_map_single).

Liang, how does the NAND controller know that it only has to send
PER_INFO_BYTE (= 8) bytes when called from meson_nfc_read_buf?
all other callers of meson_nfc_dma_buffer_setup (which passes the info
buffer to the hardware) are using (nand->ecc.steps * PER_INFO_BYTE)
bytes?

Regards
Martin

[0] https://lore.kernel.org/patchwork/cover/913212/
[1] https://github.com/xdarklight/linux/tree/arm-kasan-hack-v5.1-rc1

[2.742070] meson_nfc_read_buf e95e7d00 0x295e7d00
[2.742155] meson_nfc_read_buf e95e7d00 0x295e7d00
[2.746056] meson_nfc_read_buf e95e62c0 0x295e62c0
[2.750947] meson_nfc_read_buf e95e7d00 0x295e7d00
[2.755530] =
[2.763673] BUG kmalloc-64 (Not tainted): Redzone overwritten
[2.769392] -
[2.769392]
[2.779013] Disabling lock debugging due to kernel taint
[2.784303] INFO: 0x(ptrval)-0x(ptrval). First byte 0xff instead of 0xcc
[2.790982] INFO: Allocated in 0x age=4294937574 cpu=4294967295 pid=-1
[2.798171] 0x
[2.800598] 0x
[2.803024] 0x
[2.805451] 0x
[2.807879] 0x
[2.810306] 0x
[2.812733] 0x
[2.815160] 0x
[2.817587] 0x
[2.820014] 0x
[2.822441] 0x
[2.824869] 0x
[2.827296] 0x
[2.829722] 0x
[2.832150] 0x
[2.834577] 0x
[2.837006] INFO: Freed in 0x age=4294937574 cpu=4294967295 pid=-1
[2.843852] 0x
[2.846279] 0x
[2.848706] 0x
[2.851133] 0x
[2.853560] 0x
[2.855987] 0x
[2.858414] 0x
[2.860842] 0x
[2.863269] 0x
[2.865696] 0x
[2.868123] 0x
[2.870550] 0x
[2.872977] 0x
[2.875404] 0x
[2.877831] 0x
[2.880258] 0x
[2.882687] INFO: Slab 0x(ptrval) objects=25 used=4 fp=0x(ptrval) flags=0x10201
[2.889968] INFO: Object 0x(ptrval) @offset=7424 fp=0x(ptrval)
[2.889968]
[2.897251] Redzone (ptrval): cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
[2.905917] Redzone (ptrval): cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
[2.914585] Redzone (ptrval): cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
[2.923253] Redzone (ptrval): cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
[2.931922] Object (ptrval): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[2.940503] Object (ptrval): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[2.949085] Object (ptrval): ff ff ff ff ff ff ff ff ff ff f
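
The "Redzone overwritten" report above can be illustrated with a small user-space sketch. It is not the SLUB implementation: the struct layout, the guard size and the helper names (guarded_init, fake_dma_write, guarded_check) are all hypothetical and chosen only for illustration. The 0xcc guard pattern and the 0xff payload are taken from the log, and the 8-byte payload mirrors PER_INFO_BYTE. The idea is simply that a guard area filled with a known byte follows the object, and any writer that goes past the object's end changes the first guard byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OBJ_SIZE     8     /* mirrors PER_INFO_BYTE from the thread */
#define REDZONE_SIZE 8     /* hypothetical guard size, illustration only */
#define REDZONE_BYTE 0xcc  /* guard pattern seen in the log above */

/* One toy "slab object": the payload followed directly by a guard area. */
struct guarded_obj {
	uint8_t data[OBJ_SIZE];
	uint8_t redzone[REDZONE_SIZE];
};

/* Zero the payload (like kzalloc) and fill the guard with the pattern. */
static void guarded_init(struct guarded_obj *o)
{
	memset(o->data, 0, sizeof(o->data));
	memset(o->redzone, REDZONE_BYTE, sizeof(o->redzone));
}

/* Stand-in for a device that writes len bytes of 0xff from the start of
 * the object, without knowing how large the object really is. */
static void fake_dma_write(struct guarded_obj *o, size_t len)
{
	assert(len <= sizeof(*o)); /* keep the toy itself inside its memory */
	memset((uint8_t *)o, 0xff, len);
}

/* Returns the index of the first damaged guard byte, or -1 if intact. */
static int guarded_check(const struct guarded_obj *o)
{
	for (size_t i = 0; i < sizeof(o->redzone); i++)
		if (o->redzone[i] != REDZONE_BYTE)
			return (int)i;
	return -1;
}
```

Writing exactly 8 bytes leaves guarded_check() at -1, while writing 12 bytes makes it report index 0, which corresponds to the "First byte 0xff instead of 0xcc" line in the log.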
Re: 32-bit Amlogic (ARM) SoC: kernel BUG in kfree()
On Thu, Mar 21, 2019 at 09:17:34PM +0100, Martin Blumenstingl wrote:
> Hello,
>
> I am experiencing the following crash:
> [ cut here ]
> kernel BUG at mm/slub.c:3950!

        if (unlikely(!PageSlab(page))) {
                BUG_ON(!PageCompound(page));

You called kfree() on the address of a page which wasn't allocated by slab.

> I have traced this crash to the kfree() in meson_nfc_read_buf().
> my observation is as follows:
> - meson_nfc_read_buf() is called 7 times without any crash, the
>   kzalloc() call returns 0xe9e6c600 (virtual address) / 0x29e6c600
>   (physical address)
> - the eight time meson_nfc_read_buf() is called kzalloc() call returns
>   0xee39a38b (virtual address) / 0x2e39a38b (physical address) and the
>   final kfree() crashes
> - changing the size in the kzalloc() call from PER_INFO_BYTE (= 8) to
>   PAGE_SIZE works around that crash

I suspect you're doing something which corrupts memory. Overrunning
the end of your allocation or something similar. Have you tried KASAN
or even the various slab debugging (eg redzones)?
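
One way an overrun could lead to a bogus pointer coming back from a later allocation is sketched below. This is only an assumption about the mechanism, not something the thread confirms. SLUB links free objects through a pointer stored inside the free objects themselves, so a write past the end of one object can damage the link held in its neighbour. The toy allocator here (toy_slab, toy_init, toy_alloc, toy_owns are hypothetical names) stores addresses as uintptr_t and, unlike a real allocator, refuses to follow a link that does not point into its own memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OBJ_SIZE 16 /* toy object size, must hold one link */
#define NUM_OBJS 4

struct toy_slab {
	uint8_t mem[OBJ_SIZE * NUM_OBJS];
	uintptr_t freelist; /* address of the first free object, 0 if none */
};

/* True if addr is the start of an object inside this toy slab. */
static int toy_owns(const struct toy_slab *s, uintptr_t addr)
{
	uintptr_t base = (uintptr_t)s->mem;

	return addr >= base && addr < base + sizeof(s->mem) &&
	       (addr - base) % OBJ_SIZE == 0;
}

/* Chain all objects: each free object stores the address of the next. */
static void toy_init(struct toy_slab *s)
{
	assert(sizeof(uintptr_t) <= OBJ_SIZE);
	s->freelist = 0;
	for (int i = NUM_OBJS - 1; i >= 0; i--) {
		uint8_t *obj = &s->mem[i * OBJ_SIZE];

		memcpy(obj, &s->freelist, sizeof(s->freelist));
		s->freelist = (uintptr_t)obj;
	}
}

/* Pop the head of the free list and return its address as-is. */
static uintptr_t toy_alloc(struct toy_slab *s)
{
	uintptr_t addr = s->freelist;

	if (toy_owns(s, addr))
		memcpy(&s->freelist, (const void *)addr, sizeof(s->freelist));
	else
		s->freelist = 0; /* empty or corrupted: do not dereference */
	return addr;
}
```

If the first object is overrun by sizeof(uintptr_t) bytes of 0xff, the next toy_alloc() still returns a valid object, and only the allocation after that hands back the garbage address. That delayed failure resembles the reported pattern of several good calls followed by one bad address.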
32-bit Amlogic (ARM) SoC: kernel BUG in kfree()
Hello,

I am experiencing the following crash:
  [ cut here ]
  kernel BUG at mm/slub.c:3950!
  Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
  Modules linked in:
  CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.1.0-rc1-00080-g37b8cb064293-dirty #4252
  Hardware name: Amlogic Meson platform
  PC is at kfree+0x250/0x274
  LR is at meson_nfc_exec_op+0x3b0/0x408
  ...

my goal is to add support for the 32-bit Amlogic Meson SoCs (ARM
Cortex-A5 / Cortex-A9 cores) in the meson-nand driver.

I have traced this crash to the kfree() in meson_nfc_read_buf().
my observation is as follows:
- meson_nfc_read_buf() is called 7 times without any crash, the
  kzalloc() call returns 0xe9e6c600 (virtual address) / 0x29e6c600
  (physical address)
- the eighth time meson_nfc_read_buf() is called the kzalloc() call
  returns 0xee39a38b (virtual address) / 0x2e39a38b (physical address)
  and the final kfree() crashes
- changing the size in the kzalloc() call from PER_INFO_BYTE (= 8) to
  PAGE_SIZE works around that crash
- disabling the meson-nand driver makes my board boot just fine
- Liang has tested the unmodified code on a 64-bit Amlogic SoC (ARM
  Cortex-A53 cores) and he doesn't see the crash there

in case the selected SLAB allocator is relevant:
  CONFIG_SLUB=y

the following printk statement is used to print the addresses returned
by the kzalloc() call in meson_nfc_read_buf():
  printk("%s 0x%px 0x%08x\n", __func__, info, virt_to_phys(info));

my questions are:
- why does kzalloc() return an unaligned address 0xee39a38b (virtual
  address) / 0x2e39a38b (physical address)?
- how can I further analyze this issue?
- (I don't know where to start analyzing: in mm/, arch/arm/mm, the
  meson-nand driver seems to work fine on the 64-bit SoCs but that
  doesn't fully rule it out, ...)

Regards
Martin
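
The two address pairs quoted above can be checked with a few lines of plain C. This is a user-space sketch with hypothetical helper names (is_aligned, virt_phys_delta), and it assumes that kmalloc-family allocations are aligned to at least 8 bytes. The actual minimum on this platform may well be larger. Both pairs differ by the same constant, so the virt_to_phys() translation looks consistent. What stands out is the low bits of the eighth address.

```c
#include <assert.h>
#include <stdint.h>

#define ASSUMED_MIN_ALIGN 8u /* assumption: lower bound for kmalloc alignment */

/* True if addr is a multiple of align; align must be a power of two. */
static int is_aligned(uint32_t addr, uint32_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr & (align - 1)) == 0;
}

/* Difference between a virtual address and its physical address. */
static uint32_t virt_phys_delta(uint32_t virt, uint32_t phys)
{
	return virt - phys;
}
```

With the values from the report, is_aligned(0xe9e6c600, ASSUMED_MIN_ALIGN) holds and is_aligned(0xee39a38b, ASSUMED_MIN_ALIGN) does not, while virt_phys_delta() yields 0xc0000000 for both pairs. An odd address like 0xee39a38b is therefore unlikely to be a legitimate slab object and points towards a damaged allocator state rather than a translation problem.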