In the course of reviewing Rob's new keystore architecture (ksng branches in the libhal, pkcs11, and stm32 repos), I've been looking at performance of the SPI flash chip. In short, most of the poor performance is self-inflicted.
As originally coded, there were 1ms delays after every SPI operation (both transmit and receive), and an additional 10ms delay in the _wait_while_wip() loop. I was able to significantly speed up flash operations just by removing these delays, with no loss of stability or functionality.

I'm not an expert, and I haven't fully re-read the 84-page data sheet for the chip, but looking at the timing diagrams, there doesn't seem to be any need for a delay between transmitting the command and receiving the result, or for a delay before de-selecting the chip, especially since the low-level SPI code has its own timeout mechanism. Maybe Fredrik can comment?

For my test program, I read the entire flash chip, erase it in different ways, read again to verify erasure, write out a pattern, and read again to verify the write. Results are shown below.

Before:

    read page         131.072 sec for 65536 rounds,   2.00 ms each
    erase subsector   290.548 sec for  4096 rounds,  70.93 ms each
    erase sector       68.652 sec for   256 rounds, 268.17 ms each
    erase bulk         45.907 sec
    verify erase      131.072 sec for 65536 rounds,   2.00 ms each
    write page       1048.576 sec for 65536 rounds,  16.00 ms each
    verify write      131.072 sec for 65536 rounds,   2.00 ms each

After removing delays (with speed-up factor):

    read page          15.681 sec for 65536 rounds,   0.23 ms each    8.4
    erase subsector   238.017 sec for  4096 rounds,  58.10 ms each    1.2
    erase sector       66.339 sec for   256 rounds, 259.13 ms each    1.0
    erase bulk         45.907 sec                                     1.0
    verify erase       16.073 sec for 65536 rounds,   0.24 ms each    8.2
    write page         24.327 sec for 65536 rounds,   0.37 ms each   43.1
    verify write       16.073 sec for 65536 rounds,   0.24 ms each    8.2

How did writing 16MB go from 17.5 min to 24 sec? It turns out that almost all the time was spent in delays.
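As a rough sanity check on that claim, the fixed delays in a single page write can simply be added up. The sketch below is not taken from the stm32 code: the constant and step names are made up for illustration, and it assumes that the WIP loop waits exactly once per page and that the final status read sees WIP clear, as in the sequence spelled out next.

```python
# Back-of-the-envelope delay budget for one page write, as originally coded.
# All names here are hypothetical; only the delay values come from the text.
SPI_OP_DELAY_MS = 1     # delay after every SPI transmit or receive
WIP_POLL_DELAY_MS = 10  # extra delay per pass through the _wait_while_wip() loop
PAGES = 65536           # number of page writes in the test run

# (SPI operation, number of WIP-loop waits that follow it)
# Assumption: WIP is still set on the first status poll and clear on the second.
steps = [
    ("WRITE_ENABLE", 0),
    ("READ_STATUS (write enable)", 0),
    ("PROGRAM_PAGE", 0),
    ("data", 0),
    ("READ_STATUS (write in progress, WIP set)", 1),
    ("READ_STATUS (write in progress, WIP clear)", 0),
]

# Each SPI operation costs one fixed delay, plus any WIP-loop waits after it.
per_page_ms = sum(SPI_OP_DELAY_MS + waits * WIP_POLL_DELAY_MS
                  for _, waits in steps)
total_sec = per_page_ms * PAGES / 1000

print(f"{per_page_ms} ms per page, {total_sec:.3f} sec for {PAGES} pages")
# prints "16 ms per page, 1048.576 sec for 65536 pages"
```

Under those assumptions the delays alone account for the full 16.00 ms per page and 1048.576 sec total in the "before" table, which is consistent with the actual flash work being only a small fraction of a millisecond per page.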
The sequence of events is like this:

    send WRITE_ENABLE, delay 1ms
    send READ_STATUS (write enable), delay 1ms
    send PROGRAM_PAGE, delay 1ms
    send data, delay 1ms
    send READ_STATUS (write in progress), delay 1ms
    WIP flag is asserted, so delay 10ms
    send READ_STATUS (write in progress), delay 1ms

Erase shows more modest gains, because each operation takes a lot longer, so the delays are a smaller percentage. There is also some jitter in the erase timing, so there's only a 9ms difference in the sector erase in these examples.

paul
_______________________________________________
Tech mailing list
Tech@cryptech.is
https://lists.cryptech.is/listinfo/tech