Re: [OpenOCD-devel] STM32F2/4/7 Flash programming
> "Matthias" == Matthias Welwarsky writes:

Matthias> You will always have the best performance if you carefully
Matthias> tweak your write speed so that you reach optimum performance
Matthias> without ever producing a WAIT response. HLA adapters or
Matthias> CMSIS-DAP with SWD have a clear advantage here, because they
Matthias> can do a proper synchronous WAIT at the JTAG or SWD link layer
Matthias> and not up in the ADI protocol.

Hello,

tweaking a non-HLA adapter sounds very heuristic. Provide wait times between flash writes that are long enough to satisfy the maximum flash write delay from the datasheet. This is the deterministic way. It is not as fast as tweaking, but OpenOCD already has far too many parameters to tweak...

Bye
--
Uwe Bonnes b...@elektron.ikp.physik.tu-darmstadt.de
Institut fuer Kernphysik Schlossgartenstrasse 9 64289 Darmstadt
Tel. 06151 1623569 --- Fax. 06151 1623305

___
OpenOCD-devel mailing list
OpenOCD-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openocd-devel
Re: [OpenOCD-devel] STM32F2/4/7 Flash programming
Yes, I used the default setting of 8000. However, I think I have now figured out where the WAITs came from, and it’s not because the clock frequency was inherently too high. Rather, it is because the reset-init handler for the chip does the adapter_khz change. So on a two-chip chain, after the first chip runs its reset-init handler, the adapter goes up to 8000; then the second chip, which is not running on PLL yet, runs its reset-init handler with the adapter already at 8000 and issues WAITs in there.

On March 13, 2018 2:02:37 PM PDT, Tomas Vanek via OpenOCD-devel wrote:
>Did you use 'adapter_khz 8000' for the last test?
>I'm afraid that 8 MHz is too much (WAITs during reset-init).

--
Christopher Head
Re: [OpenOCD-devel] STM32F2/4/7 Flash programming
On 13.03.2018 21:14, Christopher Head wrote:
> On March 13, 2018 12:21:47 PM PDT, Tomas Vanek via OpenOCD-devel wrote:
>> Obviously faster DAP WAIT handling on USB HS.
>> The question remains: why are you getting DAP WAITs with algo, is the reason different adapter FT2232H vs FT2232C (should not be different except faster turnaround but...) or is it a difference between STM32F722 and F745 ?
>
> This is indeed an interesting question. I don’t have an F722 to test with. I also don’t have a 2232C.

Damn, I neglected that I had to decrease adapter_khz.

> I tried some experiments. First I tried mucking about with CM7_AHBSCR to increase the priority of AHBS accesses to the DTCM over CPU accesses; no change. Then I tried eliminating the AHBS altogether by putting the work area at 0x2001 (system SRAM); also no change. Finally I tried changing dap memaccess; this did help. When I changed from the default of 8 up to 44 (43 was not enough), I got no more WAITs and 150 kiB/s.

Did you use 'adapter_khz 8000' for the last test? I'm afraid that 8 MHz is too much (WAITs during reset-init). I copied the value from the STM32F4 config to be consistent. ST-Link limits the clock to 4 MHz, so who knows if 8 MHz was really tested. Anyway, F4 does not suffer from the WAIT problem.

Try to comment out "adapter_khz 8000" in the -event reset-init definition. 2000 kHz should work with the default memaccess 8 (if not, please find the memaccess limit).

> So, what now? Is that setting something that belongs in the F7 target file?

Especially if the value depends on internal flash timing in such a broad range as 16 usec typ, 100 usec max ...
Re: [OpenOCD-devel] STM32F2/4/7 Flash programming
On March 13, 2018 12:21:47 PM PDT, Tomas Vanek via OpenOCD-devel wrote:
>Obviously faster DAP WAIT handling on USB HS.
>The question remains: why are you getting DAP WAITs with algo, is the
>reason different adapter FT2232H vs FT2232C (should not be different
>except faster turnaround but...) or is it a difference between
>STM32F722 and F745 ?

This is indeed an interesting question. I don’t have an F722 to test with. I also don’t have a 2232C.

I tried some experiments. First I tried mucking about with CM7_AHBSCR to increase the priority of AHBS accesses to the DTCM over CPU accesses; no change. Then I tried eliminating the AHBS altogether by putting the work area at 0x2001 (system SRAM); also no change. Finally I tried changing dap memaccess; this did help. When I changed from the default of 8 up to 44 (43 was not enough), I got no more WAITs and 150 kiB/s.

So, what now? Is that setting something that belongs in the F7 target file?

--
Christopher Head
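For reference, the extra idle time that a given dap memaccess value adds per access can be estimated from the adapter clock. This is only a minimal sketch: it assumes the memaccess value is a count of TCK cycles (as the console message "memory bus access delay set to N tck" suggests) and that the adapter really runs at the configured adapter_khz. The helper name is made up for illustration and is not part of OpenOCD.

```python
def memaccess_delay_us(memaccess_tck: int, adapter_khz: float) -> float:
    """Duration of the idle cycles added after each memory access, in microseconds.

    Assumption: memaccess_tck is a number of TCK cycles and the adapter
    clock is exactly adapter_khz.
    """
    # adapter_khz is in kHz, so one TCK period lasts 1000 / adapter_khz microseconds.
    return memaccess_tck * 1000.0 / adapter_khz


if __name__ == "__main__":
    for tck in (8, 43, 44):
        delay = memaccess_delay_us(tck, 8000)
        print(f"memaccess {tck:2d} at 8000 kHz: {delay:.3f} us")
```

Under those assumptions, going from 8 to 44 cycles at 8000 kHz raises the idle time per access from 1 us to 5.5 us.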
Re: [OpenOCD-devel] STM32F2/4/7 Flash programming
On 13.03.2018 19:04, Christopher Head wrote:
> OK, here are my test results. All are taken on a JTAG chain with two STM32F745 chips, using an Olimex ARM-USB-TINY-H. In all cases the data is written to the second chip in the chain and is 473 kilobytes using the command “flash write_bank 1 filename.bin”. In all cases the default clock is used, so 2000 on tests without your patch and 8000 with it. In all cases where the write was successful, I also did a verify and it passed. If I didn’t mention how many DAP WAITs I saw in a particular case, it means there were none. In no case did I muck with the DAP memaccess setting.
>
> === Commit b8c7232b ===
> Reset init: no DAP WAITs.
> With algorithm: 2 DAP WAITs followed by debug regions are unpowered.
> Without algorithm: 2.900 kiB/s.
>
> === With only your patch from 4464 ===
> Reset init: 8 DAP WAITs but it seems happy.
> With algorithm: 120 DAP WAITs, 53.283 kiB/s.
> Without algorithm: 4.135 kiB/s.
>
> === With only my patch from 4463 ===
> Reset init: no DAP WAITs.
> With algorithm: 2 DAP WAITs followed by debug regions are unpowered.
> Without algorithm: 1 DAP WAIT, 35.239 kiB/s.
>
> === With both patches ===
> Reset init: 8 DAP WAITs but it seems happy.
> With algorithm: 122 DAP WAITs, 52.752 kiB/s.
> Without algorithm: 1 DAP WAIT, 55.177 kiB/s.

Obviously faster DAP WAIT handling on USB HS.

The question remains: why are you getting DAP WAITs with algo? Is the reason a different adapter, FT2232H vs FT2232C (should not be different except faster turnaround but...), or is it a difference between STM32F722 and F745?

Tom
Re: [OpenOCD-devel] STM32F2/4/7 Flash programming
OK, here are my test results. All are taken on a JTAG chain with two STM32F745 chips, using an Olimex ARM-USB-TINY-H. In all cases the data is written to the second chip in the chain and is 473 kilobytes using the command “flash write_bank 1 filename.bin”. In all cases the default clock is used, so 2000 on tests without your patch and 8000 with it. In all cases where the write was successful, I also did a verify and it passed. If I didn’t mention how many DAP WAITs I saw in a particular case, it means there were none. In no case did I muck with the DAP memaccess setting.

=== Commit b8c7232b ===
Reset init: no DAP WAITs.
With algorithm: 2 DAP WAITs followed by debug regions are unpowered.
Without algorithm: 2.900 kiB/s.

=== With only your patch from 4464 ===
Reset init: 8 DAP WAITs but it seems happy.
With algorithm: 120 DAP WAITs, 53.283 kiB/s.
Without algorithm: 4.135 kiB/s.

=== With only my patch from 4463 ===
Reset init: no DAP WAITs.
With algorithm: 2 DAP WAITs followed by debug regions are unpowered.
Without algorithm: 1 DAP WAIT, 35.239 kiB/s.

=== With both patches ===
Reset init: 8 DAP WAITs but it seems happy.
With algorithm: 122 DAP WAITs, 52.752 kiB/s.
Without algorithm: 1 DAP WAIT, 55.177 kiB/s.

--
Christopher Head
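To make the figures above easier to compare, here is a small sketch that turns them into speedups and approximate write times. It only re-uses the throughput numbers reported in this message; the assumption that the 473 "kilobytes" can be treated as 473 kiB, and all names in the code, are mine.

```python
# Throughput figures copied from the test results above, in kiB/s.
# Runs that ended in "debug regions are unpowered" have no figure and are omitted.
RESULTS_KIB_S = {
    ("b8c7232b", "without algorithm"): 2.900,
    ("4464 only", "with algorithm"): 53.283,
    ("4464 only", "without algorithm"): 4.135,
    ("4463 only", "without algorithm"): 35.239,
    ("both patches", "with algorithm"): 52.752,
    ("both patches", "without algorithm"): 55.177,
}

BASELINE = RESULTS_KIB_S[("b8c7232b", "without algorithm")]


def speedup(case):
    """Throughput of a case relative to the unpatched non-algorithm run."""
    return RESULTS_KIB_S[case] / BASELINE


def write_time_s(case, size_kib=473.0):
    """Approximate time to write the image (assumed to be 473 kiB)."""
    return size_kib / RESULTS_KIB_S[case]


if __name__ == "__main__":
    for case in RESULTS_KIB_S:
        print(f"{case[0]:>12}, {case[1]:<17}: "
              f"{speedup(case):5.1f}x, {write_time_s(case):6.1f} s")
```

With these numbers, the non-algorithm path with both patches is roughly 19 times faster than the unpatched non-algorithm path.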
Re: [OpenOCD-devel] STM32F2/4/7 Flash programming
On 13.03.2018 10:40, Uwe Bonnes wrote:
> Just for reference: Can anybody test what number gdb reports for the "load" command in that circumstances?
> Thanks

Good point, Uwe. Here you go:

FT2232, JTAG, STM32F722-nucleo:
> reset init
...
> adapter_khz
adapter speed: 3000 kHz
> dap memaccess
memory bus access delay set to 8 tck
> load_image 64kib.bin 0x2000
65536 bytes written at address 0x2000
downloaded 65536 bytes in 0.637129s (100.451 KiB/s)

FT2232, SWD, STM32F722-nucleo, now with 'dap memaccess':
> reset init
...
> adapter_khz
adapter speed: 3000 kHz
> dap memaccess
memory bus access delay set to 8 tck
> load_image 64kib.bin 0x2000
65536 bytes written at address 0x2000
downloaded 65536 bytes in 0.552768s (115.781 KiB/s)
> flash write_image 64kib.bin 0x0803
not enough working area available(requested 76)
...
error writing to flash at address 0x0800 at offset 0x0003
> dap memaccess 9
memory bus access delay set to 9 tck
> flash write_image 64kib.bin 0x0807
not enough working area available(requested 76)
no working area available, can't do block memory writes
couldn't use block writes, falling back to single memory accesses
wrote 65536 bytes from file 64kib.bin in 0.629446s (101.677 KiB/s)
Re: [OpenOCD-devel] STM32F2/4/7 Flash programming
On Tuesday, 13 March 2018 01:32:24 CET Tomas Vanek via OpenOCD-devel wrote:
> On 12.03.2018 21:53, Christopher Head wrote:
> > On March 10, 2018 11:25:15 PM PST, Tomas Vanek via OpenOCD-devel wrote:
> I got much worse results with DAP WAIT on a slow old Intel Atom single
> core industrial PC:
> speed as slow as 3.618 KiB/s (without wait 48.723 KiB/s).

Just a general remark on WAIT with cheap USB shift register drivers: Even with a proper implementation of WAIT handling, so that it is not treated as an error any more, the performance will be abysmal. Especially for large block writes, once you hit a WAIT condition, all following writes will be discarded. Since we only check the result at the end of a block to save time on the fast path, a lot of discarded writes have to be replayed slowly and synchronously, and that will take a lot of time.

You will always have the best performance if you carefully tweak your write speed so that you reach optimum performance without ever producing a WAIT response. HLA adapters or CMSIS-DAP with SWD have a clear advantage here, because they can do a proper synchronous WAIT at the JTAG or SWD link layer and not up in the ADI protocol.

BR, Matthias

--
Best regards,
Matthias Welwarsky
Project Engineer
SYSGO AG Office Mainz
Am Pfaffenstein 14 / D-55270 Klein-Winternheim / Germany
Phone: +49-6136-9948-0 / Fax: +49-6136-9948-10
VoIP: SIP:m...@sysgo.com
E-mail: matthias.welwar...@sysgo.com / Web: http://www.sysgo.com
Commercial Registry: HRB Mainz 90 HRB 8066
Executive Board: Etienne Butery (CEO), Kai Sablotny (COO)
Supervisory Board Chairman: Marc Darmon
VAT-Id-No.: DE 149062328
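The cost Matthias describes can be illustrated with a deliberately crude toy model. It is not OpenOCD's actual queue handling: it assumes that a block of writes is queued and checked with a single round trip, that the write which hits WAIT and every write after it in the block are discarded, and that each discarded write is then replayed with its own round trip. The 1 ms round-trip figure is a placeholder assumption, not a measurement.

```python
from typing import Optional


def round_trips_for_block(n_writes: int, first_wait: Optional[int]) -> int:
    """Host/adapter round trips needed for one block in the toy model.

    first_wait is the index of the first write answered with WAIT,
    or None if the whole block went through on the fast path.
    """
    if first_wait is None:
        return 1  # one queued block, one status check at the end
    if not 0 <= first_wait < n_writes:
        raise ValueError("first_wait must be an index inside the block")
    replayed = n_writes - first_wait  # the stalled write and everything after it
    return 1 + replayed  # failed fast-path attempt plus one trip per replayed write


def block_time_ms(n_writes: int, first_wait: Optional[int],
                  round_trip_ms: float = 1.0) -> float:
    """Block time if round trips dominate (round_trip_ms is an assumed value)."""
    return round_trips_for_block(n_writes, first_wait) * round_trip_ms


if __name__ == "__main__":
    print(block_time_ms(1024, None))  # fast path
    print(block_time_ms(1024, 10))    # WAIT early in the block
```

In this model a single early WAIT turns one round trip into more than a thousand, which is why avoiding WAIT altogether pays off on adapters without link-layer WAIT handling.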
Re: [OpenOCD-devel] STM32F2/4/7 Flash programming
> "Tomas" == Tomas Vanek via OpenOCD-devel writes:

...
>> reset init
Tomas> ...
>> adapter_khz
Tomas> adapter speed: 3000 kHz
>> dap memaccess
Tomas> memory bus access delay set to 8 tck
>> targets f7.cpu
>> flash write_image 64kib.bin 0x0802
Tomas> wrote 65536 bytes from file 64kib.bin in 0.743568s (86.071 KiB/s)
>> targets f4.cpu
>> flash write_image 64kib.bin 0x0802
Tomas> wrote 65536 bytes from file 64kib.bin in 0.763147s (83.863 KiB/s)

Just for reference: Can anybody test what number gdb reports for the "load" command in those circumstances?

Thanks
--
Uwe Bonnes b...@elektron.ikp.physik.tu-darmstadt.de
Institut fuer Kernphysik Schlossgartenstrasse 9 64289 Darmstadt
Tel. 06151 1623569 --- Fax. 06151 1623305
Re: [OpenOCD-devel] STM32F2/4/7 Flash programming
On 12.03.2018 21:53, Christopher Head wrote:
> On March 10, 2018 11:25:15 PM PST, Tomas Vanek via OpenOCD-devel wrote:
>> I wouldn't call this case as an obscure one. The reason could be insufficient device clock rate, not very high adapter_khz. Anyway all these cases could be solved by configuring the device properly.
>
> I don’t think this is related to device clock speed. You will get a WAIT if there is a bus stall, and Flash programming is self timed. So I think it depends on the 16 (typical) to 100 (maximum) microsecond word programming time. If you can deliver 35 bits (length of a DRSCAN) plus the TMS transitions within the word programming time, then you will get a WAIT reply.

You're right that the self-timed flash write takes most of the required time. Bus transport should take just one or two bus cycles (the second for clock sync). It is quite different when using algo.

>> One more concern: If programming by algo is usable on SWD only, JTAG users should set WORKAREASIZE to zero. But algos are used for verify, blank check and external memories as well. This may impose a big penalty...
>
> Yes, this is unfortunate. The verify algorithm works fine for me, but of course it is a synchronous, rather than asynchronous, algorithm, so any silicon erratum exposed by bus arbitration or other weirdness would not apply there.
>
> In any case, 4463 makes this change. I get one DAP WAIT, but no more, with my FTDI at 2M, and programming works fine and verifies properly

Have you noticed the programming speed?

For testing I connected STM32F722-nucleo and STM32F413-nucleo to one JTAG chain from FT2232. I configured a 128 MHz clock in F7 reset-init (http://openocd.zylin.com/4464) and lowered max adapter_khz to 3000, as my old FT2232C does not work well at 6000 kHz.

With algo:
> reset init
...
> adapter_khz
adapter speed: 3000 kHz
> dap memaccess
memory bus access delay set to 8 tck
> targets f7.cpu
> flash write_image 64kib.bin 0x0802
wrote 65536 bytes from file 64kib.bin in 0.743568s (86.071 KiB/s)
> targets f4.cpu
> flash write_image 64kib.bin 0x0802
wrote 65536 bytes from file 64kib.bin in 0.763147s (83.863 KiB/s)

Now with your patch and WORKAREASIZE 0 (both devices perform very similarly, so I list just F722):
> reset init
...
> adapter_khz
adapter speed: 3000 kHz
> dap memaccess
memory bus access delay set to 8 tck
> flash write_image 64kib.bin 0x0804
device id = 0x10006452
flash size = 512kbytes
not enough working area available(requested 76)
no working area available, can't do block memory writes
couldn't use block writes, falling back to single memory accesses
DAP transaction stalled (WAIT) - slowing down
wrote 65536 bytes from file 64kib.bin in 2.583704s (24.771 KiB/s)

However, if you set a longer memory access delay manually:
> dap memaccess 31
memory bus access delay set to 31 tck
> flash write_image 64kib.bin 0x0807
not enough working area available(requested 76)
no working area available, can't do block memory writes
couldn't use block writes, falling back to single memory accesses
wrote 65536 bytes from file 64kib.bin in 0.962217s (66.513 KiB/s)

I got much worse results with DAP WAIT on a slow old Intel Atom single core industrial PC: speed as slow as 3.618 KiB/s (without wait 48.723 KiB/s).

FT2232 SWD transport (F7 only):
> reset init
...
> adapter_khz
adapter speed: 3000 kHz
> flash write_image 64kib.bin 0x0806
device id = 0x10006452
flash size = 512kbytes
not enough working area available(requested 76)
no working area available, can't do block memory writes
couldn't use block writes, falling back to single memory accesses
SWD DPIDR 0x5ba02477
Failed to write memory at 0x0806006c
error writing to flash at address 0x0800 at offset 0x0006

Too fast, DAP WAIT is an error on FTDI/SWD.

> adapter_khz 1500
adapter speed: 1500 kHz
> flash write_image 64kib.bin 0x0807
not enough working area available(requested 76)
no working area available, can't do block memory writes
couldn't use block writes, falling back to single memory accesses
wrote 65536 bytes from file 64kib.bin in 0.734222s (87.167 KiB/s)

Works. 46 SWCLK cycles / 1.5 MHz = 30.6 usec > t_flash

And finally ST-Link with original adapter_khz values:
> reset init
...
adapter speed: 4000 kHz
> flash write_image 64kib.bin 0x0803
not enough working area available(requested 76)
no working area available, can't do block memory writes
couldn't use block writes, falling back to single memory accesses
wrote 65536 bytes from file 64kib.bin in 9.900800s (6.464 KiB/s)

Really slow in comparison to 110.128 KiB/s with algo.

Your change really speeds up non-algo flashing. Unfortunately, WAIT handling on dumb adapters is far from effective, and manual setting of extra memaccess cycles heavily depends on the flash timing, which may vary with flash wear-out/temperature/whatever.

> (at least, it does once I work around the fact that my nasty multi target hacks have gone from
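The "46 SWCLK cycles / 1.5 MHz" estimate can be redone for other adapter speeds. A minimal sketch, assuming (as the message does) that one SWD write costs 46 SWCLK cycles and using the 16 us typical / 100 us maximum word programming times quoted earlier in the thread; the function name is made up for illustration.

```python
T_FLASH_TYP_US = 16.0   # typical word programming time quoted in the thread
T_FLASH_MAX_US = 100.0  # maximum word programming time quoted in the thread


def transaction_time_us(cycles: int, adapter_khz: float) -> float:
    """Time one transaction occupies the wire, in microseconds."""
    return cycles * 1000.0 / adapter_khz


if __name__ == "__main__":
    for khz in (3000, 1500):
        t = transaction_time_us(46, khz)
        verdict = "longer" if t > T_FLASH_TYP_US else "shorter"
        print(f"{khz} kHz: {t:.2f} us per write, {verdict} than the typical 16 us")
```

At 3000 kHz a write takes about 15.3 us, just under the typical programming time, which fits the failure seen there; at 1500 kHz it takes about 30.7 us. Both values are still far below the 100 us maximum, so this margin holds only while the flash programs near its typical time.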
Re: [OpenOCD-devel] STM32F2/4/7 Flash programming
On March 10, 2018 11:25:15 PM PST, Tomas Vanek via OpenOCD-devel wrote:
>I wouldn't call this case as an obscure one. The reason could be
>insufficient device clock rate, not very high adapter_khz. Anyway all
>these cases could be solved by configuring the device properly.

I don’t think this is related to device clock speed. You will get a WAIT if there is a bus stall, and Flash programming is self timed. So I think it depends on the 16 (typical) to 100 (maximum) microsecond word programming time. If you can deliver 35 bits (length of a DRSCAN) plus the TMS transitions within the word programming time, then you will get a WAIT reply.

>One more concern: If programming by algo is usable on SWD only, JTAG
>users should set WORKAREASIZE to zero. But algos are used for verify,
>blank check and external memories as well.
>This may impose a big penalty...

Yes, this is unfortunate. The verify algorithm works fine for me, but of course it is a synchronous, rather than asynchronous, algorithm, so any silicon erratum exposed by bus arbitration or other weirdness would not apply there.

In any case, 4463 makes this change. I get one DAP WAIT, but no more, with my FTDI at 2M, and programming works fine and verifies properly (at least, it does once I work around the fact that my nasty multi target hacks have gone from necessary to counterproductive).

--
Christopher Head
Re: [OpenOCD-devel] STM32F2/4/7 Flash programming
On March 1, 2018 2:55:37 AM PST, Tomas Vanek via OpenOCD-devel wrote:
>I meant possible errors induced from sticky state to target handling.
>Maybe nonsense, ok.
>Christopher, did you get algo timeouts and unpowered dbg regions on
>FTDI

Yes, I got those messages when using the original, unmodified algorithm on FTDI. I also got those messages when using ByteBlaster, but only after modifying the algorithm to remove the CR/SR accesses from the loop; with those in the loop, ByteBlaster gave me lots of WAITs but generally no errors.

>OMG, we all forget about the important thing: JTAG clock to bus clock
>ratio!!!
>I run the example app on the F722nucleo (which sets up some faster
>clock), halt and *without* reset init
>test flashing - no WAITs this time!
>And it explains why STM32F4 worked - reset init sets up 64 MHz clock
>(unlike STM32F7 where the out-of-reset HSI 16 MHz clock is used).
>Seems like in this particular case the rule "adapter_khz <= F_CPU/6" is
>not sufficient.
>Not surprisingly if we want fast algo programming we also need
>reasonably fast CPU clock

Interesting! I had used F4 in the past and I think it didn’t print WAIT messages, whereas WAIT messages showed up for F7. I never reported them because I thought WAIT was just normal flow control.

I tried, on the F7, switching to 64 MHz clock, as the F4 does, just now as an experiment, using the FTDI. I got a lot of WAITs, but it did program successfully. I only got 36 kilobytes per second, a poor comparison to the 135 I managed earlier in direct mode, but at least it worked. The other target in the multitarget chain doesn’t seem to be working so well, but I will leave further investigation there until the big event handler multitarget brokenness stuff is fixed (I intend to try out the pending patches but have not had time yet).

By the way, where did the clock/6 come from? I don’t think I saw it in the ADI spec, the Cortex-M7 user guide or reference manual, or the F7 reference manual or datasheet. Just curious.

Side note, I saw that the F4 config file sets the upper four bits of RCC_PLLCFGR to zero, but the reference manual says they should be kept at their reset value and that the reset value of the register is 0x24003010. Maybe it doesn’t matter, but what if ST put something important but not user-pokable there?

--
Christopher Head
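The side note about RCC_PLLCFGR can be made concrete with a little bit arithmetic. The sketch below only models the register value in plain Python. It assumes that just the upper four bits (31:28) need to be preserved, which is what the message mentions, and the "new" PLL value is a made-up example rather than anything taken from a config file.

```python
RCC_PLLCFGR_RESET = 0x24003010  # reset value quoted from the reference manual
RESERVED_MASK = 0xF0000000      # upper four bits, per the side note above


def upper_nibble(value: int) -> int:
    """Bits 31:28 of a 32-bit register value."""
    return (value >> 28) & 0xF


def merge_preserving_reserved(current: int, new_fields: int) -> int:
    """Read-modify-write: keep the reserved bits of `current`, take the rest from `new_fields`."""
    writable_mask = ~RESERVED_MASK & 0xFFFFFFFF
    return (current & RESERVED_MASK) | (new_fields & writable_mask)


if __name__ == "__main__":
    hypothetical_new = 0x08012A04  # made-up PLL setting with a zero upper nibble
    print(hex(upper_nibble(RCC_PLLCFGR_RESET)))
    print(hex(merge_preserving_reserved(RCC_PLLCFGR_RESET, hypothetical_new)))
```

The upper nibble of the reset value is 0x2, so writing a value whose upper nibble is zero does change a bit the manual asks to leave alone; a read-modify-write like the one sketched here would avoid that.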
Re: [OpenOCD-devel] STM32F2/4/7 Flash programming
On 01.03.2018 10:25, Matthias Welwarsky wrote:
>>> WAITs are very strange. It looks like the stalled access to flash blocks also JTAG access to RAM. And SWD access doesn't suffer this silicon bug... who knows... maybe some NOPs in algo busy wait loop would fix it.
>>> BTW The programming algo should avoid bus stalling, shouldn't it?
>>
>> I was wondering about this.
>
> It's not really all that wondrous, since it's the DAP that stalls, and the DAP is single-issue. So of course once you run into a WAIT condition, it will only clear when the DAP has completed the access that caused it, and to complete it, you have to, well, wait. Note that when you receive the WAIT, it is the _previous_ access that is still pending, not the one you received the WAIT for.

But the write algo is designed to avoid the WAIT condition. And why does WAIT not appear when SWD transport is used? To complicate things further: the same test on STM32F413-nucleo runs *without any problems*.

>>> What OpenOCD version do you use? It looks like your version misses Matthias' WAIT handling if you get such errors like algo timeout.
>
> I don't think algo-timeout has to do with the DAP stalling. Isn't algo-timeout just that the algorithm running on the core is not reporting back to the debugger? The debugger is waiting for the target to execute a breakpoint so that it gets back control, right?

I meant possible errors induced from sticky state to target handling. Maybe nonsense, ok.

Christopher, did you get algo timeouts and unpowered dbg regions on FTDI?

OMG, we all forget about the important thing: JTAG clock to bus clock ratio!!!

I ran the example app on the F722nucleo (which sets up some faster clock), halted and, *without* reset init, tested flashing: no WAITs this time! And it explains why STM32F4 worked: reset init sets up a 64 MHz clock (unlike STM32F7, where the out-of-reset HSI 16 MHz clock is used).

Seems like in this particular case the rule "adapter_khz <= F_CPU/6" is not sufficient. Not surprisingly, if we want fast algo programming we also need a reasonably fast CPU clock.

Tom
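For orientation, this is what the quoted rule of thumb "adapter_khz <= F_CPU/6" gives for the two clock rates mentioned above. A minimal sketch with a made-up helper name. As the message itself notes, staying under this limit was not sufficient to avoid WAITs in this case, so the result is at best an upper bound.

```python
def max_adapter_khz(f_cpu_hz: float, divisor: float = 6.0) -> float:
    """Upper bound on adapter_khz from the F_CPU/6 rule of thumb quoted above."""
    return f_cpu_hz / divisor / 1000.0


if __name__ == "__main__":
    for name, f_cpu in (("HSI 16 MHz", 16_000_000), ("PLL 64 MHz", 64_000_000)):
        print(f"{name}: adapter_khz <= {max_adapter_khz(f_cpu):.0f}")
```

With the out-of-reset 16 MHz HSI clock the rule allows roughly 2667 kHz; with 64 MHz it allows roughly 10667 kHz.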
Re: [OpenOCD-devel] STM32F2/4/7 Flash programming
On Thursday, 1 March 2018 07:27:49 CET Christopher Head wrote:
> On Thu, 1 Mar 2018 00:12:12 +0100
> Tomas Vanek wrote:
> > We should also focus to a question why algo flashing is broken on
> > FTDI. Some non STM devices (e.g. Kinetis) work with very similar algo
> > just perfectly on FTDI or any other adapter.
>
> Sure. If someone could fix algorithm-based flashing, I would love to
> use it! I’m not convinced it will make things any faster for my
> specific set of hardware, but as long as it’s not broken and not
> slower, I don’t really care, and I understand it does make things
> faster for some people.
>
> > WAITs are very strange. It looks like the stalled access to flash
> > blocks also JTAG access to RAM.
> > And SWD access doesn't suffer this silicon bug... who knows... maybe
> > some NOPs in algo busy wait loop would fix it.
> > BTW The programming algo should avoid bus stalling, shouldn't it?
>
> I was wondering about this.

It's not really all that wondrous, since it's the DAP that stalls, and the DAP is single-issue. So of course once you run into a WAIT condition, it will only clear when the DAP has completed the access that caused it, and to complete it, you have to, well, wait. Note that when you receive the WAIT, it is the _previous_ access that is still pending, not the one you received the WAIT for.

> The second data point is this: when using the algorithm-based approach,
> I attached an oscilloscope to TDO coming out of the F7. I was very
> surprised to see it *tristate* from time to time (at least, I’m pretty
> sure it tristated: it had a very slow rise time and settled to a voltage
> somewhat below VDD). I didn’t manage to correlate the time of the
> tristate to any particular higher level activity, but it definitely
> happened quite frequently during a programming operation and looked
> very weird. I’m pretty sure it didn’t happen during direct programming,
> only algorithm-driven programming. I found this suspicious, but again,
> didn’t look into it too much as the direct approach was very fast.

TDO just gets tristated when the target is not driving it, i.e. if you're not in SHIFT-IR/DR state. I have the same behaviour on the i.MX8MQ I'm currently working with.

> > What OpenOCD version do you use? It looks like your version misses
> > Matthias' WAIT handling
> > if you get such errors like algo timeout.

I don't think algo-timeout has to do with the DAP stalling. Isn't algo-timeout just that the algorithm running on the core is not reporting back to the debugger? The debugger is waiting for the target to execute a breakpoint so that it gets back control, right?

> It was head of master from somewhere in the last week or two. I can
> look up the exact commit ID tomorrow if you want.

WAIT is included in 0.10.0

BR, Matthias
Re: [OpenOCD-devel] STM32F2/4/7 Flash programming
On Thu, 1 Mar 2018 00:12:12 +0100 Tomas Vanek wrote:
> We should also focus to a question why algo flashing is broken on
> FTDI. Some non STM devices (e.g. Kinetis) work with very similar algo
> just perfectly on FTDI or any other adapter.

Sure. If someone could fix algorithm-based flashing, I would love to use it! I’m not convinced it will make things any faster for my specific set of hardware, but as long as it’s not broken and not slower, I don’t really care, and I understand it does make things faster for some people.

> WAITs are very strange. It looks like the stalled access to flash
> blocks also JTAG access to RAM.
> And SWD access doesn't suffer this silicon bug... who knows... maybe
> some NOPs in algo busy wait loop would fix it.
> BTW The programming algo should avoid bus stalling, shouldn't it?

I was wondering about this. I have two weird data points I can add to this discussion.

The first data point is this: remember early on in the thread where I said I wasn’t able to successfully modify the algorithm to move the CR write and SR read out of the loop? When I tried, it ran for a while and then gave either “debug regions unpowered” or, more commonly, “timeout waiting for algorithm”. Those are the same messages I got using the FTDI adapter with the original, unmodified algorithm; with the modified algorithm, I got them using the ByteBlaster as well, which had formerly been very robust. This seemed suspicious, and I’m reasonably certain I got the modifications to the algorithm correct (I’ve done plenty of Thumb assembly in the past), but I didn’t pay too much attention as the direct approach was so fast.

The second data point is this: when using the algorithm-based approach, I attached an oscilloscope to TDO coming out of the F7. I was very surprised to see it *tristate* from time to time (at least, I’m pretty sure it tristated: it had a very slow rise time and settled to a voltage somewhat below VDD). I didn’t manage to correlate the time of the tristate to any particular higher level activity, but it definitely happened quite frequently during a programming operation and looked very weird. I’m pretty sure it didn’t happen during direct programming, only algorithm-driven programming. I found this suspicious, but again, didn’t look into it too much as the direct approach was very fast.

The reference manual seems a little unclear on whether the algorithm as written should stall the CPU or not. It says, “Any attempt to read the Flash memory while it is being written or erased, causes the bus to stall. Read operations are processed correctly once the program operation has completed. This means that code or data fetches cannot be performed while a write/erase operation is ongoing.”

The obvious way to interpret that sentence is that the STRH places the halfword into the CPU write buffer, the DSB pushes it out as an AXI write cycle to the Flash interface, the Flash interface immediately completes the bus cycle and internally buffers the data while starting the burn, and then the AXI is free to proceed to SR polling, while the *next* AXI cycle, if any, that accesses the Flash interface while it is still busy will be stalled.

However, an alternative way to interpret that is that the AXI write cycle that delivers the halfword is stalled by the Flash interface, but the CPU can continue execution because the data is in the CPU write buffer, and the CPU can proceed before the bus cycle completes. In this case the DSB would stall the CPU. BSY seems rather pointless in that case, but I have learned not to assume anything when reading silicon documentation (and it could be useful to avoid stalls if used without DSB, I suppose).

My guess would be the first interpretation is correct, though, which means the algorithm as written should indeed not stall the CPU ever, since code execution is from DTCM which is unrelated to Flash. It should be possible to test which interpretation is correct by performing a STRH followed by a DSB and then checking whether BSY is set immediately afterwards; if yes, then interpretation 1 is correct, while if no, then interpretation 2 is correct. Assuming interpretation 1 is correct, though, I don’t see anything wrong with the algorithm code.

> What OpenOCD version do you use? It looks like your version misses
> Matthias' WAIT handling
> if you get such errors like algo timeout.

It was head of master from somewhere in the last week or two. I can look up the exact commit ID tomorrow if you want.

-- 
Christopher Head
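The STRH/DSB test proposed in the message above comes down to a single decision on the FLASH_SR value sampled right after the DSB. The following is only a minimal sketch of that decision, assuming BSY is bit 16 of FLASH_SR as documented in the F2/F4/F7 reference manuals; the function name `interpret_bsy` and the sample register values are made up for illustration, and the snippet classifies a value you have already read from the target rather than talking to hardware.

```python
# BSY flag position in FLASH_SR (bit 16 per the STM32F2/F4/F7 reference
# manuals; check the manual for your exact part before relying on it).
FLASH_SR_BSY = 1 << 16


def interpret_bsy(sr_after_dsb: int) -> int:
    """Classify the FLASH_SR value sampled right after STRH + DSB.

    Returns 1 if BSY is still set (the write cycle completed at once and
    the Flash interface is burning in the background), or 2 if BSY is
    already clear (the DSB stalled until the burn had finished).
    """
    return 1 if sr_after_dsb & FLASH_SR_BSY else 2


# Hypothetical sample values, not measurements from real silicon.
print(interpret_bsy(0x00010000))
print(interpret_bsy(0x00000000))
```

On real hardware the sample has to be taken by code running on the target itself, since a read through the debug port would arrive far too late to catch a 16 us programming operation.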
Re: [OpenOCD-devel] STM32F2/4/7 Flash programming
On Wednesday, 28 February 2018 21:51:29 CET Christopher Head wrote:
> I’ll be completely honest here: the reason I tried doing this is because the
> algorithm approach *broke* with the FTDI adapter, not because I wanted to
> improve speed. It kept issuing messages either timeout waiting for
> algorithm or debug regions unpowered. So I tried bypassing the algorithm
> and noticed that it was really slow, *then* tried speeding it up by moving
> the CR and SR accesses out of the loop and noticed that it became really
> really fast.

Not surprising, since the turn-around penalty over USB is quite substantial. This is the main reason why OpenOCD uses all the "STICKY" features, which is OK in principle for throughput but makes error recovery really cumbersome.

And the throughput can be quite substantial. I've timed uploads to DDR memory at 30 MHz JTAG clock that reach close to 1 MB/s, as long as you just push data out and don't need to flush the queue to turn around and read back status registers.

BR,
Matthias
Re: [OpenOCD-devel] STM32F2/4/7 Flash programming
On February 28, 2018 8:10:09 AM PST, Andreas Bolsch wrote:
>To be sure I just did the following tests (openOCD, current head,
>integrated ST-Link v2-1, 4 MHz SWD clock):
>nucleo-f767zi, 2 MByte random data: prog: 140 kBytes/s, read: 150
>kBytes/s
>disco-f412g, 1 MByte random data: prog: 134 kBytes/s, read: 158
>kBytes/s
>
>Then STM32CubeProgrammer (defaults, Linux host, integrated ST-Link
>v2-1):
>disco-f412g, 1 MByte random data: prog. 133 kBytes/s, read: 150
>kBytes/s
>
>And finally openOCD with algorithm disabled, anything else as before:
>disco-f412g, 1 MByte random data: prog. 1 kByte/s (yes, no kidding,
>ONE!)
>
>All tests above with SWD, not JTAG.

They were also done with an HLA. Not all of us have the option to use such an adapter, for various reasons, nor do all of us have the option of using SWD over JTAG.

>That the direct register approach is quite slow isn't surprising.
>That's
>like playing ping-pong over USB for every single bit.

Word, actually. Not bit. At least for the ByteBlaster and FTDI adapters. And with the CR and SR accesses pulled out of the loop, it turns into a single giant call to target_write_memory to write the entire image, which AFAIK just shovels words (or halfwords, at 16× parallelism) into the DRW in TAR autoincrement mode.

Anyway, your test results with algorithm on an HLA seem to give roughly the same performance (135 kilobytes per second if you don’t include erase time; did you?) as I get without algorithm on an FTDI.

> The main benefit
>of the algorithm approach is that data transport and programming
>("real" programming with CPU stall) run simultaneously. Of course, this
>can only work smoothly if the programming adapter does support this
>"streaming" approach, so it won't work reasonably well with a low-level
>adapter.

I’ll be completely honest here: the reason I tried doing this is because the algorithm approach *broke* with the FTDI adapter, not because I wanted to improve speed. It kept issuing messages, either “timeout waiting for algorithm” or “debug regions unpowered”. So I tried bypassing the algorithm and noticed that it was really slow, *then* tried speeding it up by moving the CR and SR accesses out of the loop and noticed that it became really really fast. So while the algorithm approach seems really nice conceptually, in practice, for me, it doesn’t work, so I took the shortest path to something that *would* work, then discovered it could be fast anyway.

>Regarding the parallelism I'd suggest to leave the parallelism by
>default as it currently is, i. e. 16.
>Anything else would be a pitfall for the unaware user. The assumption
>that most users will use 2.4V to 3.3V supply is still valid, I guess.
>If
>it were configurable, 32 wouldn't give substancially higher speed
>(well,
>at least if a "good" programming adapter is used) anyway.

Fair enough. I never wanted to change the default anyway. I just wanted to provide the user with the ability to change it should they wish. Does this seem reasonable to you?

-- 
Christopher Head
Re: [OpenOCD-devel] STM32F2/4/7 Flash programming
These figures are quite surprising. I've made a lot of benchmarks with a pile of discovery boards, mainly F4 and F7, some L4. Since my focus was on external SPI flash, I did not record the results for the internal flash, but as far as I recall, both programming *AND* read (via write_bank, read_bank) with the algorithm approach gave approx. 120-160 kBytes/s.

To be sure I just did the following tests (OpenOCD, current head, integrated ST-Link v2-1, 4 MHz SWD clock):
nucleo-f767zi, 2 MByte random data: prog: 140 kBytes/s, read: 150 kBytes/s
disco-f412g, 1 MByte random data: prog: 134 kBytes/s, read: 158 kBytes/s

Then STM32CubeProgrammer (defaults, Linux host, integrated ST-Link v2-1):
disco-f412g, 1 MByte random data: prog. 133 kBytes/s, read: 150 kBytes/s

And finally OpenOCD with algorithm disabled, anything else as before:
disco-f412g, 1 MByte random data: prog. 1 kByte/s (yes, no kidding, ONE!)

All tests above with SWD, not JTAG.

Some weeks ago I did tests with ST-Link reflashed to JLink and JLink v8, JLink v9, but the results were rather disappointing. Some gave quite the same speed, some were slightly to moderately slower when compared to ST-Link. I also did some tests with ST-Link v2 clones; they gave roughly the same speed as the integrated ST-Link.

The surprising fact is that I got the very same limit (almost precisely 150 kBytes/s) for external SPI flash programming and reading (both with the QSPI interface and bitbanging SPI). This apparently indicates that the "real" programming time has almost no impact on the observed speed. What matters is the transport via USB, the ST-Link adapter and SWD clock.

The datasheet for f767 says typ. 16 us per programming operation, so 8 us per byte for parallelism 16, or 125 kBytes/s. I.e. OpenOCD already operates at the hardware imposed limit, and the programming time is almost completely absorbed by the data transfer. Quite excellent, I'd say.

That the direct register approach is quite slow isn't surprising. That's like playing ping-pong over USB for every single bit. The main benefit of the algorithm approach is that data transport and programming ("real" programming with CPU stall) run simultaneously. Of course, this can only work smoothly if the programming adapter does support this "streaming" approach, so it won't work reasonably well with a low-level adapter.

Regarding the parallelism I'd suggest leaving the default as it currently is, i.e. 16. Anything else would be a pitfall for the unaware user. The assumption that most users will use a 2.4V to 3.3V supply is still valid, I guess. If it were configurable, 32 wouldn't give substantially higher speed (well, at least if a "good" programming adapter is used) anyway.

BTW: "parallelism" apparently means ***maximum*** parallelism, cf. rm0081, 1.5.2: "Parallelism is the maximum number of bits that may be programmed to 0 in one step during a program or erase operation. The maximum program/erase parallelism is limited by the supply voltage and by whether the external V PP supply is used or not. ..." Hence "limited by" actually means "limited ***above*** by", and the table indicates the maximum allowed value, not the exact value to use.

On 2018-02-27 21:50, Christopher Head wrote:
> As for performance, I have two data points so far. First, using a
> ByteBlaster clone, I was able to achieve about 6 kilobytes per second
> using the algorithm and about 10 using optimized direct programming
> (the original direct code got about 3). Second, using an Olimex
> ARM-USB-TINY-H (FTDI-based), I had to reduce the JTAG clock *massively*
> in order to get the algorithm approach to even work at all (otherwise
> it would see a mix of timeout waiting for algorithm and debug regions
> unpowered), but optimized direct programming at the default 2 MHz JTAG
> clock got me 30 kilobytes per second, much more than the algorithm
> approach at the reduced clock speed. Both of the above tests were made
> at 16× parallelism. Repeating the Olimex test with the optimized direct
> code at 32× parallelism yielded 84 kilobytes per second.
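The datasheet estimate in the message above (typ. 16 us per programming operation at parallelism 16, hence 125 kBytes/s) can be reproduced with a few lines of arithmetic. This is only a sketch of that calculation: the function name `max_program_rate` is invented here, and it assumes that every programming operation writes exactly `parallelism_bits` bits and that nothing else (erase, USB transport, verify) takes any time.

```python
def max_program_rate(t_prog_us: float, parallelism_bits: int) -> float:
    """Upper bound on Flash programming throughput in bytes per second.

    t_prog_us        -- time for one programming operation, in microseconds
    parallelism_bits -- bits written per programming operation (8/16/32/64)
    """
    bytes_per_op = parallelism_bits / 8
    # bytes per operation divided by seconds per operation
    return bytes_per_op * 1e6 / t_prog_us


# Figures from the message: typ. 16 us per operation, parallelism 16.
rate = max_program_rate(16.0, 16)
print(f"{rate / 1000:.0f} kBytes/s")
```

The result is a ceiling set by the Flash cell timing alone; the measured 134 to 140 kBytes/s figures sit slightly above it, which is plausible because 16 us is a typical value rather than a guaranteed one.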
Re: [OpenOCD-devel] STM32F2/4/7 Flash programming
> "Freddie" == Freddie Chopin writes:
...
Freddie> Reference Manual is misleading here, but datasheet is even more
Freddie> confusing. If you look at any STM32F7 datasheet, it says that
Freddie> max Programming voltage Vprog for 16x and 8x parallelism is 3.6
Freddie> V, but for 32x parallelism it is 3 V. I suspect that this is a
Freddie> typo and in fact for all parallelism values max programming
Freddie> voltage is the same - 3.6 V.

Freddie> 5.3.13 Memory characteristics Table 48. Flash memory
Freddie> programming (numbers don't have to match your version of
Freddie> datasheet exactly)

I have brought up that problem on the ST forum. Let's see whether ST comments in reaction.

Bye
-- 
Uwe Bonnes
Re: [OpenOCD-devel] STM32F2/4/7 Flash programming
On Tuesday, 27 February 2018 23:23:47 CET Christopher Head wrote:
> On February 27, 2018 2:42:25 AM PST, Matthias Welwarsky wrote:
> >I'm guessing that the BUSY check was done to explicitly to avoid a JTAG
> >WAIT,
> >which was an error condition not long ago. It might still break with
> >SWD.
>
> Ah, I didn’t know that WAIT was ever considered an error. From reading the
> ARM debug infrastructure spec, it looked more like a flow control
> mechanism. I see now that OpenOCD appears to enable ORUNDETECT (at least I
> think it does, based on dap_dp_init in arm_adi_v5.c).

Before 0.10.0, WAIT was indeed an error condition. I wrote the support for WAIT for JTAG transport. It's a bit tricky, since for performance reasons we have to use deep queues and ORUNDETECT, which considerably changes the DAP behaviour due to its 'stickiness' and requires a complex machinery to detect and replay the transactions following and including the one causing the WAIT. With an active probe that has knowledge of the DAP protocol it's dead simple, but for USB-connected shift registers, not so.

> According to the ADI specification, SWD also has a WAIT response, which it
> issues in case a previous transaction is outstanding. It says just the same
> as JTAG: if WAIT is received, normally a debugger just resends the same
> transaction. Although using sticky overrun mode changes the format a bit so
> that WAIT is followed by a data packet, which it would not be with
> ORUNDETECT cleared, and you have to clear the sticky status bit.

I never got the opportunity to extend the WAIT code to also cover SWD. I simply don't have a platform using SWD transport that is not using CMSIS-DAP or another high-level adapter. But if you have the motivation, just go ahead; I'll be happy to assist.

BR,
Matthias

-- 
Mit freundlichen Grüßen/Best regards,
Matthias Welwarsky
Project Engineer
SYSGO AG Office Mainz
Am Pfaffenstein 14 / D-55270 Klein-Winternheim / Germany
Phone: +49-6136-9948-0 / Fax: +49-6136-9948-10
VoIP: SIP:m...@sysgo.com
E-mail: matthias.welwar...@sysgo.com / Web: http://www.sysgo.com
Web: https://www.sysgo.com
Blog: https://www.sysgo.com/blog
Events: https://www.sysgo.com/events
Newsletter: https://www.sysgo.com/newsletter
Handelsregister/Commercial Registry: HRB Mainz 90 HRB 8066
Vorstand/Executive Board: Etienne Butery (CEO), Kai Sablotny (COO)
Aufsichtsratsvorsitzender/Supervisory Board Chairman: Marc Darmon
USt-Id-Nr./VAT-Id-No.: DE 149062328
Re: [OpenOCD-devel] STM32F2/4/7 Flash programming
On February 27, 2018 2:42:25 AM PST, Matthias Welwarsky wrote:
>I'm guessing that the BUSY check was done to explicitly to avoid a JTAG
>WAIT,
>which was an error condition not long ago. It might still break with
>SWD.

Ah, I didn’t know that WAIT was ever considered an error. From reading the ARM debug infrastructure spec, it looked more like a flow control mechanism. I see now that OpenOCD appears to enable ORUNDETECT (at least I think it does, based on dap_dp_init in arm_adi_v5.c).

According to the ADI specification, SWD also has a WAIT response, which it issues in case a previous transaction is outstanding. It says just the same as JTAG: if WAIT is received, normally a debugger just resends the same transaction. Although using sticky overrun mode changes the format a bit so that WAIT is followed by a data packet, which it would not be with ORUNDETECT cleared, and you have to clear the sticky status bit.

-- 
Christopher Head
Re: [OpenOCD-devel] STM32F2/4/7 Flash programming
On February 27, 2018 1:10:19 AM PST, Antonio Borneo wrote:
>Regarding your proposal to get rid of the write algorithm, I'm a
>little sceptical it is not needed.
>I would like to see the optimized direct programming code and get some
>performance measurement before going for it.

Sure. Do you want to see this on Zylin, or somewhere else?

As for performance, I have two data points so far. First, using a ByteBlaster clone, I was able to achieve about 6 kilobytes per second using the algorithm and about 10 using optimized direct programming (the original direct code got about 3). Second, using an Olimex ARM-USB-TINY-H (FTDI-based), I had to reduce the JTAG clock *massively* in order to get the algorithm approach to even work at all (otherwise it would see a mix of “timeout waiting for algorithm” and “debug regions unpowered”), but optimized direct programming at the default 2 MHz JTAG clock got me 30 kilobytes per second, much more than the algorithm approach at the reduced clock speed. Both of the above tests were made at 16× parallelism. Repeating the Olimex test with the optimized direct code at 32× parallelism yielded 84 kilobytes per second.

All of the above numbers came from program commands, which means they are lower than the raw programming speed because they include the time taken for erasing in the total time and throughput numbers. In all cases I used program with the verify option to make sure the data was correct. My sample was about half a meg of data.

-- 
Christopher Head
Re: [OpenOCD-devel] STM32F2/4/7 Flash programming
On February 27, 2018 1:28:01 AM PST, Freddie Chopin wrote:
>5.3.13 Memory characteristics
>Table 48. Flash memory programming
>(numbers don't have to match your version of datasheet exactly)

Oh, that is very interesting. I missed that table the first time. When I looked at the reference manual, the table in there makes it look like it could be unsafe to use too small a parallelism setting (e.g. 16× at 3.3V might damage the Flash), but the datasheet suggests it’s fine. And yes, it seems they contradict each other, in that the RM says 32× at 3.3V is optimal while the DS says it is prohibited.

Regardless, I think we should just let the board file choose. Any objections to using the bus width number for this purpose? I was thinking we could use the chip width parameter in future to support STM32H7, where writes have to be 128 bits wide in order to prevent ECC errors. We could set chip width to 1 for F2/F4/F7 and 16 for H7, and the Flash code could recognize that difference and act accordingly; meanwhile bus width could be 1, 2, 4, or 8 to set the parallelism.

-- 
Christopher Head
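The bus-width proposal above maps a byte count of 1, 2, 4, or 8 onto 8×, 16×, 32×, or 64× parallelism. A minimal sketch of how such a mapping could look follows, assuming the FLASH_CR layout of the F2/F4/F7 reference manuals (PG in bit 0, PSIZE in bits 9:8). The function names are hypothetical and this is not code from stm32f2x.c.

```python
# FLASH_CR fields per the STM32F2/F4/F7 reference manuals
# (verify against the manual for your part).
FLASH_CR_PG = 1 << 0        # programming enable
FLASH_CR_PSIZE_SHIFT = 8    # PSIZE occupies bits 9:8

# Proposed bus width in bytes -> PSIZE encoding (x8, x16, x32, x64).
_PSIZE_BY_BUS_WIDTH = {1: 0b00, 2: 0b01, 4: 0b10, 8: 0b11}


def psize_from_bus_width(bus_width_bytes: int) -> int:
    """Translate a flash bank bus width (bytes) into the PSIZE field value."""
    try:
        return _PSIZE_BY_BUS_WIDTH[bus_width_bytes]
    except KeyError:
        raise ValueError(f"unsupported bus width: {bus_width_bytes}") from None


def flash_cr_program_value(bus_width_bytes: int) -> int:
    """FLASH_CR value that enables programming at the chosen parallelism."""
    psize = psize_from_bus_width(bus_width_bytes)
    return FLASH_CR_PG | (psize << FLASH_CR_PSIZE_SHIFT)


for width in (1, 2, 4, 8):
    print(width, hex(flash_cr_program_value(width)))
```

Rejecting any other width explicitly keeps a typo in a board file from silently falling back to some default parallelism.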
Re: [OpenOCD-devel] STM32F2/4/7 Flash programming
On Tue, Feb 27, 2018 at 10:28 AM, Freddie Chopin wrote:
> On Mon, 2018-02-26 at 16:23 -0800, Christopher Head wrote:
>> Hi all,
>> I was looking into an issue with Flash programming on the STM32F7. I
>> discovered some quite odd results.
>>
>> First, I discovered that OpenOCD always uses 16-bit parallelism.
>> There is a comment at the top of stm32f2x.c stating that this was
>> chosen for compatibility with the widest possible range of VDD
>> values, but I simply can’t see how this is true. The
>> STM32F205/215/207/217 Flash programming manual PM0059 rev 5, the
>> STM32F405/415/407/417/427/437/429/439 reference manual RM0090 rev 15,
>> and the STM32F75/74 reference manual RM0385 rev 6 all contain exactly
>> the same table, which says 64× shall be used with external VPP, 32×
>> shall be used with VDD in [2.7,3.6], 16× shall be used with VDD in
>> [2.1,2.7], and 8× shall be used with VDD in [1.8,2.1]. I imagine an
>> awful lot of STM32s are probably operated at 3.3 volts, and that is
>> *not* in the legal VDD range for 16× parallelism. Am I
>> misunderstanding something here?
>
> Reference Manual is misleading here, but datasheet is even more
> confusing. If you look at any STM32F7 datasheet, it says that max
> Programming voltage Vprog for 16x and 8x parallelism is 3.6 V, but for
> 32x parallelism it is 3 V. I suspect that this is a typo and in fact
> for all parallelism values max programming voltage is the same - 3.6 V.
>
> 5.3.13 Memory characteristics
> Table 48. Flash memory programming
> (numbers don't have to match your version of datasheet exactly)

Agree, quite confusing. One more reason not to use the voltage value in scripts for the selection and to stick to the programming width. By the way, the flash erase algorithm is affected by the voltage<=>width relationship too!

Antonio
Re: [OpenOCD-devel] STM32F2/4/7 Flash programming
On Tuesday, 27 February 2018 01:23:08 CET Christopher Head wrote:
> Second, I discovered that, in both algorithm-driven mode and direct
> programming mode, the loop writes to CR, then writes one halfword of data
> to the target address, then checks BSY and the error flags in SR. However,
> this seems unnecessary. CR doesn’t magically change on its own; PG and
> PSIZE can be set once and then many writes performed in a block, increasing
> efficiency. Also, it is not necessary to check BSY after each write. Step 3
> of the Flash programming sequence is to “perform the data write
> operation(s)”, which can be plural. If you manage to deliver data too fast,
> the Flash hardware stalls the AHB or AXI bus cycles doing the subsequent
> writes, which eventually translates into a WAIT JTAG response (in the
> direct programming case) or a CPU execution stall (in the algorithm-driven
> case), which is a reasonable flow control mechanism. The error bits in SR
> are also cumulative. Taken together, all this means that one can simply
> write CR once, write all the data, and then check SR afterwards, waiting
> for the last write to finish and examining the error flags. Once modifying
> the code to do this, I then discovered that direct-mode programming with
> these changes is actually faster than algorithm-based programming without
> them (I was not able to successfully modify the algorithm to omit these
> extra operations, but I can’t see it making a whole lot of difference to
> the execution time in algorithm mode.

I'm guessing that the BUSY check was done explicitly to avoid a JTAG WAIT, which was an error condition not long ago. It might still break with SWD.

-- 
Best regards,
Matthias Welwarsky
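The difference between the per-halfword loop and the "write CR once, write all the data, check SR afterwards" sequence described in the quoted text can be illustrated by counting debug accesses. The sketch below is a toy model, not OpenOCD code: `MockTarget` is a hypothetical stand-in that only counts reads and writes, the register addresses and bit positions are taken from the F2/F4/F7 reference manuals and should be checked for your part, and the mock never reports BSY, so each poll costs exactly one read.

```python
FLASH_SR = 0x40023C0C          # Flash status register
FLASH_CR = 0x40023C10          # Flash control register
CR_PG = 1 << 0                 # programming enable
CR_PSIZE_X16 = 0b01 << 8       # 16-bit parallelism
SR_BSY = 1 << 16               # busy flag


class MockTarget:
    """Counts debug accesses instead of talking to real hardware."""

    def __init__(self):
        self.mem = {}
        self.accesses = 0

    def write(self, addr, value):
        self.accesses += 1
        self.mem[addr] = value

    def read(self, addr):
        self.accesses += 1
        return self.mem.get(addr, 0)


def program_per_halfword(target, base, halfwords):
    """Original scheme: CR write, data write, SR poll for every halfword."""
    for i, hw in enumerate(halfwords):
        target.write(FLASH_CR, CR_PG | CR_PSIZE_X16)
        target.write(base + 2 * i, hw)
        while target.read(FLASH_SR) & SR_BSY:
            pass


def program_batched(target, base, halfwords):
    """Proposed scheme: CR once, all data, then a single SR poll at the end."""
    target.write(FLASH_CR, CR_PG | CR_PSIZE_X16)
    for i, hw in enumerate(halfwords):
        target.write(base + 2 * i, hw)
    while target.read(FLASH_SR) & SR_BSY:
        pass
    return target.read(FLASH_SR)   # caller examines the cumulative error flags


data = [0x1234, 0x5678, 0x9ABC, 0xDEF0]
slow, fast = MockTarget(), MockTarget()
program_per_halfword(slow, 0x08000000, data)
program_batched(fast, 0x08000000, data)
print(slow.accesses, fast.accesses)
```

The count understates the real gain: each SR poll in the first scheme is a read that forces the adapter queue to be flushed and turned around over USB, whereas the batched data writes can be queued back to back.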
Re: [OpenOCD-devel] STM32F2/4/7 Flash programming
On Mon, 2018-02-26 at 16:23 -0800, Christopher Head wrote:
> Hi all,
> I was looking into an issue with Flash programming on the STM32F7. I
> discovered some quite odd results.
>
> First, I discovered that OpenOCD always uses 16-bit parallelism.
> There is a comment at the top of stm32f2x.c stating that this was
> chosen for compatibility with the widest possible range of VDD
> values, but I simply can’t see how this is true. The
> STM32F205/215/207/217 Flash programming manual PM0059 rev 5, the
> STM32F405/415/407/417/427/437/429/439 reference manual RM0090 rev 15,
> and the STM32F75/74 reference manual RM0385 rev 6 all contain exactly
> the same table, which says 64× shall be used with external VPP, 32×
> shall be used with VDD in [2.7,3.6], 16× shall be used with VDD in
> [2.1,2.7], and 8× shall be used with VDD in [1.8,2.1]. I imagine an
> awful lot of STM32s are probably operated at 3.3 volts, and that is
> *not* in the legal VDD range for 16× parallelism. Am I
> misunderstanding something here?

Reference Manual is misleading here, but datasheet is even more confusing. If you look at any STM32F7 datasheet, it says that max Programming voltage Vprog for 16x and 8x parallelism is 3.6 V, but for 32x parallelism it is 3 V. I suspect that this is a typo and in fact for all parallelism values max programming voltage is the same - 3.6 V.

5.3.13 Memory characteristics
Table 48. Flash memory programming
(numbers don't have to match your version of datasheet exactly)

Regards,
FCh
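The reference-manual table quoted above can be written down as a small lookup, which makes the voltage ranges easier to compare with the datasheet. This is only a sketch of the table as it is described in this thread: the function name `rm_parallelism` is invented, the handling of the shared endpoints (2.1 V and 2.7 V are assigned to the wider setting) is an assumption, and elsewhere in the thread the values are read as maximums, so anything narrower than the returned width should also be acceptable.

```python
def rm_parallelism(vdd: float, external_vpp: bool = False) -> int:
    """Widest program/erase parallelism (in bits) for a supply voltage,
    following the reference-manual table as quoted in this thread."""
    if external_vpp:
        return 64                 # x64 requires the external VPP supply
    if 2.7 <= vdd <= 3.6:
        return 32
    if 2.1 <= vdd < 2.7:
        return 16
    if 1.8 <= vdd < 2.1:
        return 8
    raise ValueError(f"VDD {vdd} V is outside the 1.8-3.6 V range in the table")


for vdd in (1.8, 2.5, 3.3):
    print(vdd, rm_parallelism(vdd))
```

Note that this encodes the reference manual only; the datasheet table cited in the reply lists a lower maximum voltage for 32×, so the two sources would disagree for a 3.3 V supply.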
Re: [OpenOCD-devel] STM32F2/4/7 Flash programming
On Tue, Feb 27, 2018 at 7:03 AM, Christopher Head wrote:
> On Mon, 26 Feb 2018 16:23:08 -0800
> Christopher Head wrote:
>
>> 2. Allow the user to set the parallelism level with a new stm32f2x
>> subcommand, since only the board config knows what VDD is being
>> supplied.

Hi Christopher,
I had something similar in my TODO list, but never got time to code it.

I agree that 16 bit parallelism is NOT the safest case. It is supposed to fail writing to flash if the device is powered at 1.8V. The right setup should use 8 bit as default, which is compatible with all the possible voltage ranges. But this would increase the programming time in 3.3V systems.

The first step in fixing this is to have the write width selectable as 8, 16, 32 or 64 bits. I prefer to let the user select the width (not the voltage), because the width<=>voltage relationship could be different in future devices.

Actually, some JTAG dongles are able to read the voltage of the target, but this feature is not always present and, depending on the setup, the wire that senses the voltage could be unconnected. I would not rely on the JTAG adapter to know the target voltage.

> Having thought it over a little more, perhaps we could use the bus
> width parameter to the flash bank command for this purpose instead of
> a new stm32f2x subcommand? Then add a settable variable which gets
> passed through the shipped target config files?

The user should set the width in the board file (depending on board voltage) using a variable that is then passed to the target file. In the target file the variable should be checked and given a reasonable default if it is not set. Should we keep 16 bits for backward compatibility or 8 bits for safety reasons?

Then, how to pass the value: sub-command, "flash bank" parameter, or even "target" parameter? I personally prefer adding it to "flash bank". We could easily handle the case of banks that require different values (I do not see this case today). But I would not reject other proposals, if well motivated.

Regarding your proposal to get rid of the write algorithm, I'm a little sceptical it is not needed. I would like to see the optimized direct programming code and get some performance measurement before going for it.

Best Regards,
Antonio

> --
> Christopher Head
Re: [OpenOCD-devel] STM32F2/4/7 Flash programming
On Mon, 26 Feb 2018 16:23:08 -0800 Christopher Head wrote:
> 2. Allow the user to set the parallelism level with a new stm32f2x
> subcommand, since only the board config knows what VDD is being
> supplied.

Having thought it over a little more, perhaps we could use the bus width parameter to the flash bank command for this purpose instead of a new stm32f2x subcommand? Then add a settable variable which gets passed through the shipped target config files?

-- 
Christopher Head