On Tue, Jan 24, 2023 at 9:46 PM Lanchon <[email protected]> wrote: > > > > On 1/24/23 17:35, Lanchon wrote: > > > > > > On 1/24/23 13:25, Koen Vandeputte wrote: > >> On Tue, Jan 24, 2023 at 4:26 PM Koen Vandeputte > >> <[email protected]> wrote: > >>> On Tue, Jan 24, 2023 at 7:59 AM Lanchon <[email protected]> wrote: > >>>> hi Koen, thanks again. > >>>> > >>>> i copied your log here for ease of reference: > >>>> https://gist.github.com/Lanchon/f24ea9c16eda5ffaa5085a7b240238db > >>>> > >>>> > >>>> first let me say: > >>>> > >>>> - ubinized sysupgrade is not used by any of my devices. > >>>> > >>>> - ubinized sysupgrade happens when when an ubi partition image is > >>>> fed as > >>>> an upgrade file. the image contains the complete set of ubi volumes > >>>> that > >>>> are normally stored within the ubi partition on your device: typically > >>>> kernel (raw image), R/O rootfs (sqfs), and R/W overlay (ubifs). during > >>>> said sysupgrade, the current configuration is first copied to RAM, > >>>> then > >>>> the ubi partition image is written, and finally -if current config is > >>>> kept- the RAM contents are written back to the new overlay volume. > >>>> > >>>> - ubinized sysupgrades were known to be broken by other people at the > >>>> time of my commits, and they wanted to remove the feature altogether > >>>> because it was unused (look it up in the relevant pull request for my > >>>> commits on github). as i remember it, it was broken because some ubi > >>>> volumes within the ubi partition were sometimes mounted or R/O block > >>>> devices were created on top of them (/dev/ubiblock*), locking the ubi > >>>> partition and preventing the upgrade. > >>>> > >>>> - although my devices would normally not use such upgrades, i could > >>>> still take a whole ubi partition backup and then test ubinized > >>>> sysupgrade with it on my devices. in fact, if you restore the ubi > >>>> partition image without conserving the current configuration, then > >>>> this > >>>> procedure is the best way to do a backup/restore of the complete state > >>>> of the router: kernel, rootfs, and overlay are completely saved and > >>>> restored. btw, i think this should be documented. (and this is mostly > >>>> the reason why i added gzip support on sysupgrade: ubinized images of > >>>> backed-up ubi partitions compress like crazy.) > >>>> > >>>> - my tests of ubinized sysupgrade worked after these changes but not > >>>> before. specifically the fix is in: af34733593 base-files: fix > >>>> ubinized > >>>> nand sysupgrade > >>>> > >>>> > >>>> regarding your log: > >>>> > >>>> - nand_do_platform_check succeeds: > >>>> https://github.com/openwrt/openwrt/blob/ac21dff5b67698c09f54a4b98d6f9f12af17edd4/package/base-files/files/lib/upgrade/nand.sh#L438-L469 > >>>> > >>>> https://gist.github.com/Lanchon/f24ea9c16eda5ffaa5085a7b240238db#file-imx6dl-gw52xx-ubinized-sysupgrade-log-txt-L192 > >>>> > >>>> > >>>> - next comes the actual nand_do_flash_file: > >>>> https://github.com/openwrt/openwrt/blob/ac21dff5b67698c09f54a4b98d6f9f12af17edd4/package/base-files/files/lib/upgrade/nand.sh#L379-L405 > >>>> > >>>> https://gist.github.com/Lanchon/f24ea9c16eda5ffaa5085a7b240238db#file-imx6dl-gw52xx-ubinized-sysupgrade-log-txt-L2061 > >>>> > >>>> > >>>> - it is determined that passed file is a ubi partition image, so > >>>> nand_upgrade_ubinized is invoked: > >>>> https://github.com/openwrt/openwrt/blob/ac21dff5b67698c09f54a4b98d6f9f12af17edd4/package/base-files/files/lib/upgrade/nand.sh#L260-L269 > >>>> > >>>> https://gist.github.com/Lanchon/f24ea9c16eda5ffaa5085a7b240238db#file-imx6dl-gw52xx-ubinized-sysupgrade-log-txt-L2088 > >>>> > >>>> > >>>> nand_upgrade_ubinized is basically a one-liner: > >>>> ${gz}cat "$ubi_file" | ubiformat "/dev/mtd$mtdnum" -y -f - && > >>>> ubiattach > >>>> -m "$mtdnum" > >>>> > >>>> cat/zcat the image, feeding that to ubiformat -f - which writes it. > >>>> > >>>> > >>>> and the write does take place, but take a look: > >>>> > >>>> ------------------- > >>>> > >>>> + cat /tmp/nandnew.ubi > >>>> + ubiformat /dev/mtd2 -y -f - > >>>> ubiformat: mtd2 (nand), size 250609664 bytes (239.0 MiB), 1912 > >>>> eraseblocks of 131072 bytes (128.0 KiB), min. I/O size 2048 bytes > >>>> > >>>> libscan: scanning eraseblock 0 -- 0 % complete > >>>> libscan: scanning eraseblock 1 -- 0 % complete > >>>> libscan: scanning eraseblock 2 -- 0 % complete > >>>> ... > >>>> libscan: scanning eraseblock 1868 -- 97 % complete > >>>> > >>>> libscan: scanning eraseblock > >>>> [ 207.876200] ci_hdrc ci_hdrc.1: remove, state 1 > >>>> 1869 -- 97 % complete > >>>> > >>>> libscan: > >>>> [ 207.883339] usb usb2: USB disconnect, device number 1 > >>>> scanning eraseblock 1870 -- 97 % > >>>> [ 207.891238] usb 2-1: USB disconnect, device number 2 > >>>> complete > >>>> > >>>> libscan: scanning eras > >>>> [ 207.901522] ci_hdrc ci_hdrc.1: USB bus 2 deregistered > >>>> eblock 1871 -- 97 % complete > >>>> > >>>> libscan: scanning eraseblock 1872 > >>>> [ 207.910396] ci_hdrc ci_hdrc.0: remove, state 4 > >>>> -- 97 % complete > >>>> > >>>> libscan: scan > >>>> [ 207.917055] usb usb1: USB disconnect, device number 1 > >>>> ning eraseblock 1873 -- 98 % comp > >>>> [ 207.925651] ci_hdrc ci_hdrc.0: USB bus 1 deregistered > >>>> lete > >>>> > >>>> libscan: scanning eraseblo > >>>> [ 207.934010] imx2-wdt 20bc000.watchdog: Device shutdown: Expect > >>>> reboot! > >>>> ck 1874 -- 98 % complete > >>>> > >>>> libsca > >>>> [ 207.942382] reboot: Restarting system > >>>> > >>>> ----------------------- > >>>> > >>>> > >>>> while sysupgrade is flashing UBI the partition, the system is > >>>> rebooted; > >>>> i don't know why. > >>>> > >>>> here are the cleaned-up kernel messages: > >>>> > >>>> [ 207.876200] ci_hdrc ci_hdrc.1: remove, state 1 > >>>> [ 207.883339] usb usb2: USB disconnect, device number 1 > >>>> [ 207.891238] usb 2-1: USB disconnect, device number 2 > >>>> [ 207.901522] ci_hdrc ci_hdrc.1: USB bus 2 deregistered > >>>> [ 207.910396] ci_hdrc ci_hdrc.0: remove, state 4 > >>>> [ 207.917055] usb usb1: USB disconnect, device number 1 > >>>> [ 207.925651] ci_hdrc ci_hdrc.0: USB bus 1 deregistered > >>>> [ 207.934010] imx2-wdt 20bc000.watchdog: Device shutdown: Expect > >>>> reboot! > >>>> [ 207.942382] reboot: Restarting system > >>>> > >>>> > >>>> next the reboot takes place. u-boot mounts what it knows as mtd3 as an > >>>> ubi partition: > >>>> https://gist.github.com/Lanchon/f24ea9c16eda5ffaa5085a7b240238db#file-imx6dl-gw52xx-ubinized-sysupgrade-log-txt-L2183 > >>>> > >>>> > >>>> but openwrt used mtd2 to write to ubi, not mtd3. i don't know if in > >>>> your > >>>> device what openwrt calls mtd2 is called mtd3 by your current > >>>> (default) > >>>> u-boot config. > >>>> > >>>> > >>>> so some takeaways... > >>>> > >>>> - something is rebooting the system while i write the sysupgrade > >>>> image. > >>>> maybe another thread? maybe an unattended watchdog? > >>>> > >>>> - the reboot happens while 98% of the image is written. maybe this > >>>> issue > >>>> is time dependent and only shows up with my code because it is a > >>>> little > >>>> slower than the previous code. maybe the previous code reached 100% > >>>> before being rebooted and thus the upgrade went through. > >>>> > >>>> - the image *IS* written, albeit partially, which means that the > >>>> previous image that was there is definitely erased. what is booted > >>>> then? > >>>> IDK, it depends on the details of your device. maybe it is booting a > >>>> recovery image? or maybe the ubi partition format is not finished, but > >>>> the ubi volumes within are fully written before the reboot, so the > >>>> system doesn't brick by chance. (but then the newer image would be > >>>> booted, but you say that the sysupgrade has no effect and the prior > >>>> image is booted instead, so that can't be it.) maybe two ubi > >>>> partitions > >>>> are used on your device to implement an A/B dual system boot. so > >>>> maybe a > >>>> flag needs to be toggled to switch between A and B after the image is > >>>> written, but since that code is never executed, the previous system is > >>>> booted instead. > >>>> > >>>> - several issues cropped up with a set of sysupgrade changes i did > >>>> (among them, these you mention here). there are many device types and > >>>> several sysupgrade mechanisms with their own files, and then some > >>>> common > >>>> files. i assumed other types of upgrades would invoke common routines > >>>> but not -for instance- nand flash routines. i was wrong: the > >>>> codebase is > >>>> spaghetti calling any file form any sysupgrade method, and this > >>>> caused a > >>>> couple of issues with my nand sysupgrade changes. i don't think > >>>> this is > >>>> one of those instances though. i think that the sysupgrade code is > >>>> doing > >>>> the right thing and the fault is elsewhere, but i may be wrong. > >>>> without > >>>> knowledge of your device and without a device to test, it is hard > >>>> to tell. > >>>> > >>>> > >>>> maybe we should could call the attention of the device maintainer to > >>>> this thread now? > >>>> > >>>> > >>>> thanks! > >>>> > >>>> > >>> Hi, > >>> > >>> Many thanks for the very detailed reply. > >>> It aided hugely in debugging this further. > >>> > >>> I'm happy to share that I found the culprit and it works nicely > >>> again now. > >>> > >>> within nand.sh: > >>> > >>> > >>> nand_upgrade_ubinized() { > >>> local ubi_file="$1" > >>> local gz="$2" > >>> > >>> nand_detach_ubi "$CI_UBIPART" || return 1 > >>> > >>> local mtdnum="$( find_mtd_index "$CI_UBIPART" )" > >>> ${gz}cat "$ubi_file" | ubiformat "/dev/mtd$mtdnum" -y -f - && > >>> ubiattach -m "$mtdnum" > >>> } > >>> > >>> I changed the last line to: > >>> > >>> ubiformat "/dev/mtd$mtdnum" -y -f "$ubi_file" && ubiattach > >>> -m "$mtdnum" > >>> > >>> > >>> Although ubiformat indeed states "-f -" to use stdinput, it does not > >>> seem to work. > >>> Writing the image seemed to be simply skipped using that method. > >>> > >>> actually presenting the absolute filepath to "-f" fixed it. > >>> > >>> Any thoughts? > >>> > >>> Thanks again for your prompt reply, > >>> > >>> Koen > >>> > >>> > > > > hi, > > > > my answers inline... > > > >> Summarized after more testing: > >> > >> --> ${gz}cat "$ubi_file" | ubiformat "/dev/mtd$mtdnum" -y -f - && > >> ubiattach -m "$mtdnum" > >> Does not work. > > it works fine on other openwrt platforms. also, the commands cat and > > ubiformat are mature and stable and they work fine on your platform > > too. i am repeating myself, but according to your posted log, > > something is rebooting the system before the command ends. > > > > from your posted log: > > > > ----- > > libscan: scan[ 207.917055] usb usb1: USB disconnect, device number 1 > > ning eraseblock 1873 -- 98 % comp[ 207.925651] ci_hdrc ci_hdrc.0: USB > > bus 1 deregistered > > lete > > > > libscan: scanning eraseblo[ 207.934010] imx2-wdt 20bc000.watchdog: > > Device shutdown: Expect reboot! > > ck 1874 -- 98 % complete > > > > // check this out > > // | > > // v > > libsca[ 207.942382] reboot: Restarting system > > \FE > > U-Boot SPL 2020.04-dirty (May 20 2020 - 12:35:32 +0200) > > Booting from NAND > > ----- > > > > as you can see, the process is interrupted in the middle of printing a > > status line beginning with "libsca". there is no way that ubiformat > > will fail without a diagnostic message, much less fail in the middle > > of printing a diagnostics message. > > > > this is a platform issue and you will not fix it by changing this code. > > > >> Looks like ubiformat gets empty/invalid input from stdin, writing the > >> image seems to be skipped completely. > >> The board simply boots again in it's original state without any changes > > possibly an A/B dual boot system, possibly with dual ubi partitions. > > IDK, but the platform maintainer knows. > > > >> I confirmed that this command expands to "cat" and not "zcat" > > yes, this is visible in your log. > > > >> > >> > >> --> ubiformat "/dev/mtd$mtdnum" -y -f "$ubi_file" && ubiattach -m > >> "$mtdnum" > >> Works fine > >> image is written as expected and boots properly using the newly > >> written image > > possibly because it is a few percent faster than piping. > > > > IMHO, this is an extremely brittle workaround, not a fix. as soon as > > your flash becomes slower to erase or write or the openwrt image grows > > a bit, the problem will reappear. > > > >> > >> > >> --> ${gz}cat "$ubi_file" | ubiformat "/dev/mtd$mtdnum" -y && ubiattach > >> -m "$mtdnum" > >> Omitting "-f -" here > >> 'cat' throws a "Broken Pipe" error during the flash process. > > this is expected. without "-f -" ubiformat will not open stdin and > > then cat has nowhere to pipe to. > > > >> The data is written, but the image is corrupted and does not boot > >> afterwards. > > no. without -f the ubi partition is formatted but nothing is written > > onto it. > >> Is this error/corruption not present on other targets ?? > >> > >> > >> So unless anyone has other ideas .. it looks for 'ubinized' the file > >> needs to be provided as-is, and not using stdin. > >> This drops support for gzip'd files .. but it does not seem to be > >> gzipped anyway, at least on my target. > > the other idea is that the platform maintainer needs to be involved to > > track down what is rebooting the device in the middle of an update. > > > > i suppose this problem is platform-specific. in that case it might be > > a watchdog on your platform. my suspicion: the normal userland is > > brought down before sysupgrade (as rootfs is not available) and thus > > userland tickle of a watchdog ceases to happen. the watchdog expires > > and reboots the system mid upgrade. the correct fix is disabling the > > watchdog before sysupgrade. alternatively, the userland tickle could > > be kept working during sysupgrade. > > > > in particular, note this log line just 8 milliseconds before the reboot: > > [ 207.934010] imx2-wdt 20bc000.watchdog: Device shutdown: Expect reboot! > > [ 207.942382] reboot: Restarting system > > > > i don't know nor have the platform, so please contact the maintainer. > > > >> > >> Regards, > >> > >> Koen > > > > > > thanks! > > > could you collect more logs during sysupgrade? let's see if the reboot > happens always approximately at the same time. > > also, you could you collect a full bootlog? > > thanks!
Hi, I think our previous mails overlapped a bit as I didn't notice your previous response :) I'll send the data tomorrow. I'm also wondering if adding a sleep before ubiformat in the old way would influence/break it's behaviour? Piotr, Would you have any idea here? Thanks again for your efforts, Koen _______________________________________________ openwrt-devel mailing list [email protected] https://lists.openwrt.org/mailman/listinfo/openwrt-devel
