Re: armv7 cache flushing: don't take shortcuts
On 16.08.2016 at 03:03, Daniel Bolgheroni wrote:
> On Mon, Aug 15, 2016 at 09:56:09PM +0200, Mark Kettenis wrote:
>> The functions that clean/invalidate the caches by virtual address
>> bail out after cleaning 32k worth of data. The 32k matches the L1
>> cache of most of the CPUs we currently run on. But the Cortex-A7 has
>> an integrated L2 cache that is larger. And if you only flush it
>> partially you may get into trouble. And now that we actually use the
>> cache, that matters. Many of the more recent ARMv7 CPUs include such
>> an L2 cache. And some of them even have L1 caches that are larger
>> than 32k. So drop the shortcut and simply clean/invalidate what we
>> were asked to clean/invalidate. Most of the calls should be covering
>> a single page or less anyway.
>>
>> This fixes the core dumps and illegal instructions that I see when
>> booting from a SATA disk.
>
> Just saw this committed. It makes Cubieboard2 fully usable so far.
> Kernel rebuild with fs on ahci:

Just did a complete system build on ahci on a cubieboard2 without any issues:

  335m47.17s real   266m37.66s user    54m34.81s system

Thank you very much!
Re: armv7 cache flushing: don't take shortcuts
On Mon, Aug 15, 2016 at 09:56:09PM +0200, Mark Kettenis wrote:
> The functions that clean/invalidate the caches by virtual address
> bail out after cleaning 32k worth of data. The 32k matches the L1
> cache of most of the CPUs we currently run on. But the Cortex-A7 has
> an integrated L2 cache that is larger. And if you only flush it
> partially you may get into trouble. And now that we actually use the
> cache, that matters. Many of the more recent ARMv7 CPUs include such
> an L2 cache. And some of them even have L1 caches that are larger
> than 32k. So drop the shortcut and simply clean/invalidate what we
> were asked to clean/invalidate. Most of the calls should be covering
> a single page or less anyway.
>
> This fixes the core dumps and illegal instructions that I see when
> booting from a SATA disk.

Just saw this committed. It makes Cubieboard2 fully usable so far.

Kernel rebuild with fs on ahci:

(...)
ld -T ldscript --warn-common -nopie -S -o bsd ${SYSTEM_HEAD} vers.o ${OBJS}
text     data     bss      dec      hex
3744040  139412   479308   4362760  429208
   25m50.10s real    17m18.26s user     1m28.06s system

Just as a comparison, it takes around 20 min on Wandboard with fs on nfs
and around 23 min on BeagleBone Black with fs also on nfs.

Thank you.

--
U-Boot SPL 2016.07 (Aug 05 2016 - 23:44:57)
DRAM: 1024 MiB
CPU: 91200Hz, AXI/AHB/APB: 3/2/2
Trying to boot from MMC1

U-Boot 2016.07 (Aug 05 2016 - 23:44:57 -0600) Allwinner Technology

CPU:   Allwinner A20 (SUN7I)
Model: Cubietech Cubieboard2
I2C:   ready
DRAM:  1 GiB
MMC:   SUNXI SD/MMC: 0
*** Warning - bad CRC, using default environment

In:    serial
Out:   serial
Err:   serial
SCSI:  Target spinup took 0 ms.
AHCI 0001.0100 32 slots 1 ports 3 Gbps 0x1 impl SATA mode
flags: ncq stag pm led clo only pmp pio slum part ccc apst
Net:   eth0: ethernet@01c5
starting USB...
USB0:   USB EHCI 1.00
USB1:   USB OHCI 1.0
USB2:   USB EHCI 1.00
USB3:   USB OHCI 1.0
scanning bus 0 for devices... 1 USB Device(s) found
scanning bus 2 for devices...
1 USB Device(s) found
Hit any key to stop autoboot:  0
=>
=> setenv devnum 0
=> run scsi_boot

scanning bus for devices...
  Device 0: (0:0) Vendor: ATA Prod.: TOSHIBA MK1235GS Rev: PV01
            Type: Hard Disk
            Capacity: 114473.4 MB = 111.7 GB (234441648 x 512)
Found 1 device(s).

Device 0: (0:0) Vendor: ATA Prod.: TOSHIBA MK1235GS Rev: PV01
            Type: Hard Disk
            Capacity: 114473.4 MB = 111.7 GB (234441648 x 512)
... is now current device
Scanning scsi 0:1...
Found EFI removable media binary efi/boot/bootarm.efi
reading efi/boot/bootarm.efi
65276 bytes read in 23 ms (2.7 MiB/s)
libfdt fdt_check_header(): FDT_ERR_BADMAGIC
## Starting EFI application at 0x4200 ...
Scanning disks on scsi...
Scanning disks on usb...
Scanning disks on mmc...
MMC Device 1 not found
MMC Device 2 not found
MMC Device 3 not found
Found 6 disks
>> OpenBSD/armv7 BOOTARM 0.1
boot>
booting sd0a:/bsd: 3743840+139408+479308 [64+501824+238352]=0x4e3de0

OpenBSD/armv7 booting ...
arg0 0x4000 arg1 0x10bb arg2 0x4800
Allocating page tables
freestart = 0x407e4000, free_pages = 260124 (0x0003f81c)
IRQ stack: p0x40812000 v0xc0812000
ABT stack: p0x40813000 v0xc0813000
UND stack: p0x40814000 v0xc0814000
SVC stack: p0x40815000 v0xc0815000
Creating L1 page table at 0x407e4000
Mapping kernel
Constructing L2 page tables
undefined page pmap [ using 740612 bytes of bsd ELF symbol table ]
board type: 4283
Copyright (c) 1982, 1986, 1989, 1991, 1993
        The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2016 OpenBSD. All rights reserved.
http://www.OpenBSD.org

OpenBSD 6.0-current (GENERIC) #1: Mon Aug 15 19:34:05 BRT 2016
    dbolgher...@wbs.my.domain:/usr/src/sys/arch/armv7/compile/GENERIC
real mem  = 1073741824 (1024MB)
avail mem = 104448 (996MB)
mainbus0 at root: Cubietech Cubieboard2
cpu0 at mainbus0: ARM Cortex A7 rev 4 (ARMv7 core)
cpu0: DC enabled IC enabled WB disabled EABT branch prediction enabled
cpu0: 32KB(32b/l,2way) I-cache, 32KB(64b/l,4way) wr-back D-cache
cortex0 at mainbus0
sunxi0 at mainbus0
sxipio0 at sunxi0: 175 pins
sxiccmu0 at sunxi0
gpio0 at sxipio0: 18 pins
gpio1 at sxipio0: 24 pins
gpio2 at sxipio0: 25 pins
gpio3 at sxipio0: 28 pins
gpio4 at sxipio0: 12 pins
gpio5 at sxipio0: 6 pins
gpio6 at sxipio0: 12 pins
gpio7 at sxipio0: 28 pins
gpio8 at sxipio0: 22 pins
agtimer0 at mainbus0: tick rate 24000 KHz
simplebus0 at mainbus0: "soc"
ehci0 at simplebus0
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 "Allwinner EHCI root hub" rev 2.00/1.00 addr 1
sxiahci0 at simplebus0: AHCI 1.1
sxiahci0: port 0: 3.0Gb/s
scsibus0 at sxiahci0: 32 targets
sd0 at scsibus0 targ 0 lun 0: SCSI3 0/direct fixed naa.5391d4f841be
sd0: 114473MB, 512 bytes/sector, 234441648 sectors
ehci1 at simplebus0
usb1 at ehci1: USB revision 2.0
uhub1 at usb1 "Allwinner EHCI root hub" rev 2.00/1.00 addr 1
sxidog0 at simplebus0
sxirtc0 at simplebus0
sxiuart0 at simplebus0: console dw
armv7 cache flushing: don't take shortcuts
The functions that clean/invalidate the caches by virtual address
bail out after cleaning 32k worth of data. The 32k matches the L1
cache of most of the CPUs we currently run on. But the Cortex-A7 has
an integrated L2 cache that is larger. And if you only flush it
partially you may get into trouble. And now that we actually use the
cache, that matters. Many of the more recent ARMv7 CPUs include such
an L2 cache. And some of them even have L1 caches that are larger
than 32k. So drop the shortcut and simply clean/invalidate what we
were asked to clean/invalidate. Most of the calls should be covering
a single page or less anyway.

This fixes the core dumps and illegal instructions that I see when
booting from a SATA disk.

ok?


Index: arch/arm/arm/cpufunc_asm_armv7.S
===
RCS file: /cvs/src/sys/arch/arm/arm/cpufunc_asm_armv7.S,v
retrieving revision 1.13
diff -u -p -r1.13 cpufunc_asm_armv7.S
--- arch/arm/arm/cpufunc_asm_armv7.S	6 Aug 2016 16:46:25 -	1.13
+++ arch/arm/arm/cpufunc_asm_armv7.S	15 Aug 2016 19:45:53 -
@@ -103,8 +103,6 @@ ENTRY(armv7_tlb_flushD)
 	i_inc	.req	r3
 ENTRY(armv7_icache_sync_range)
 	ldr	ip, .Larmv7_icache_line_size
-	cmp	r1, #0x8000
-	movcs	r1, #0x8000		/* XXX needs to match cache size... */
 	ldr	ip, [ip]
 	sub	r1, r1, #1		/* Don't overrun */
 	sub	r3, ip, #1
@@ -136,8 +134,6 @@ ENTRY(armv7_icache_sync_all)
 ENTRY(armv7_dcache_wb_range)
 	ldr	ip, .Larmv7_dcache_line_size
-	cmp	r1, #0x8000
-	movcs	r1, #0x8000		/* XXX needs to match cache size... */
 	ldr	ip, [ip]
 	sub	r1, r1, #1		/* Don't overrun */
 	sub	r3, ip, #1
@@ -155,8 +151,6 @@ ENTRY(armv7_dcache_wb_range)
 ENTRY(armv7_idcache_wbinv_range)
 	ldr	ip, .Larmv7_idcache_line_size
-	cmp	r1, #0x8000
-	movcs	r1, #0x8000		/* XXX needs to match cache size... */
 	ldr	ip, [ip]
 	sub	r1, r1, #1		/* Don't overrun */
 	sub	r3, ip, #1
@@ -177,8 +171,6 @@ ENTRY(armv7_idcache_wbinv_range)
 ENTRY(armv7_dcache_wbinv_range)
 	ldr	ip, .Larmv7_dcache_line_size
-	cmp	r1, #0x8000
-	movcs	r1, #0x8000		/* XXX needs to match cache size... */
 	ldr	ip, [ip]
 	sub	r1, r1, #1		/* Don't overrun */
 	sub	r3, ip, #1
@@ -198,8 +190,6 @@ ENTRY(armv7_dcache_wbinv_range)
 ENTRY(armv7_dcache_inv_range)
 	ldr	ip, .Larmv7_dcache_line_size
-	cmp	r1, #0x8000
-	movcs	r1, #0x8000		/* XXX needs to match cache size... */
 	ldr	ip, [ip]
 	sub	r1, r1, #1		/* Don't overrun */
 	sub	r3, ip, #1
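The effect of the removed cmp/movcs pair can be modelled in a few lines of C. This is a sketch under stated assumptions, not the kernel code: model_wb_range, covers, struct range_result and OLD_CLAMP are hypothetical names, and the loop assumes the usual "align the start down to a line, then step one line at a time" pattern suggested by the sub r1, r1, #1 and sub r3, ip, #1 context lines, rather than reproducing cpufunc_asm_armv7.S instruction for instruction. No cache is touched; the model only records which line addresses would be operated on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The cap the diff removes: "cmp r1, #0x8000; movcs r1, #0x8000". */
#define OLD_CLAMP 0x8000u

/* Hypothetical record of what a by-VA maintenance loop would touch. */
struct range_result {
	uintptr_t first_line;	/* address of the first line operated on */
	uintptr_t last_line;	/* address of the last line operated on */
	size_t	  nlines;	/* number of per-line operations issued */
};

/*
 * Hypothetical model of a clean-by-VA loop: align the start down to a
 * line boundary, then step one line at a time until the last byte of
 * [va, va + len) has been covered.  With clamp != 0 the length is first
 * capped at 32k, as in the pre-patch code.
 */
static struct range_result
model_wb_range(uintptr_t va, size_t len, size_t line, int clamp)
{
	struct range_result r = { 0, 0, 0 };

	/* Line sizes are powers of two, so line - 1 is an alignment mask. */
	assert(line != 0 && (line & (line - 1)) == 0);
	if (len == 0)
		return r;		/* this model treats 0 as a no-op */
	if (clamp && len > OLD_CLAMP)
		len = OLD_CLAMP;	/* the shortcut: silently drop the rest */

	uintptr_t start = va & ~(uintptr_t)(line - 1);
	uintptr_t end = va + len - 1;	/* last byte, inclusive */

	r.first_line = start;
	for (uintptr_t p = start; p <= end; p += line) {
		/* A real loop would issue one cache op (e.g. DCCMVAC) here. */
		r.last_line = p;
		r.nlines++;
	}
	return r;
}

/* Nonzero if byte addr falls in one of the contiguous lines visited. */
static int
covers(struct range_result r, size_t line, uintptr_t addr)
{
	return r.nlines != 0 && addr >= r.first_line &&
	    addr < r.last_line + line;
}
```

For example, for a 64 KiB buffer at 0x10000 with 64-byte lines (the D-cache line size reported in the dmesg above), the clamped model issues 512 line operations and stops at line 0x17fc0, so the second 32 KiB is never cleaned; the unclamped model issues 1024 and reaches 0x1ffc0. A single 4 KiB page needs only 64 operations (65 if the start is not line-aligned), which is consistent with the point that dropping the cap costs little when most calls cover a page or less.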