Re: SquashFS mixed errors (decompression failed and others)
> The 4-"byte"-address mode is used on 32 MiB flash chips.
> We had similar issues with other 32 MiB devices in the past
> which were fixed at some point by Felix Fietkau.

My device is 32 MiB. I'll check with Felix to see if he can give me any clues.

@Everyone else reading this: do you know how one can increase "the reset
duration during booting" for the flash chip? (I'm not even sure I fully
understand what this means.)

On Sun, May 23, 2021 at 10:28 AM Vincent Wiemann wrote:
> I guess the reset line of the flash chip is not held long enough so
> that it is in an unclean state. I think the reset duration during
> booting needs to be increased. But I don't know the code and can't point
> you there. It's just a guess...
Re: SquashFS mixed errors (decompression failed and others)
On 5/23/21 10:21 AM, Ibrahim Tachijian wrote:
>> Is your firmware (sysupgrade) bigger than 16MB?
>
> No, the sysupgrade file is currently 13MB.
>
>> So maybe it has to do with switching to 4-address-mode...
> What is this exactly?

The 4-"byte"-address mode is used on 32 MiB flash chips.
We had similar issues with other 32 MiB devices in the past
which were fixed at some point by Felix Fietkau.

>> My guess is that the error already happens when reading the flash.
> At least we know that the flash is not being written to incorrectly
> since after a reboot the flash is intact and does not produce any
> errors. It's simply random if the system boots into this "faulty
> state" or not (happens approx 1-2% of the time).
>
> Does anyone maybe know how I can re-read the squashfs partition and
> verify the integrity while the system is booted to see if I encounter
> the squashfs errors?
> I'm really at a loss here - no idea where to even look into diagnosing
> the issue.

I guess the reset line of the flash chip is not held long enough so
that it is in an unclean state. I think the reset duration during
booting needs to be increased. But I don't know the code and can't point
you there. It's just a guess...
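For readers unfamiliar with the term, the arithmetic behind the 4-byte address mode can be sketched as follows. Three address bytes cover 2^24 bytes (16 MiB), so a 32 MiB chip needs a fourth address byte to reach its upper half. The function names are hypothetical and the example offset is made up; the aliasing helper only shows what a truncated address would look like, and nothing in this thread confirms that this happens on the board in question.

```python
MIB = 1024 * 1024
THREE_BYTE_LIMIT = 1 << 24  # 3 address bytes -> 2**24 bytes = 16 MiB


def needs_four_byte_address(offset: int) -> bool:
    """Return True if a flash offset does not fit into 3 address bytes."""
    return offset >= THREE_BYTE_LIMIT


def three_byte_alias(offset: int) -> int:
    """Return the offset that is addressed when only the low 24 bits are sent."""
    return offset & (THREE_BYTE_LIMIT - 1)


if __name__ == "__main__":
    flash_size = 32 * MIB  # chip size mentioned in the thread
    # The last byte of a 32 MiB chip lies above the 16 MiB boundary.
    print(needs_four_byte_address(flash_size - 1))
    # Hypothetical offset just above 16 MiB, truncated to 3 bytes.
    print(hex(three_byte_alias(0x1000010)))
```

Note that the 13MB image size does not settle the question on its own: what matters is the absolute flash offset of the data being read, which depends on where the partition sits on the chip.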
Re: SquashFS mixed errors (decompression failed and others)
> Is your firmware (sysupgrade) bigger than 16MB?

No, the sysupgrade file is currently 13MB.

> So maybe it has to do with switching to 4-address-mode...

What is this exactly?

> My guess is that the error already happens when reading the flash.

At least we know that the flash is not being written to incorrectly
since after a reboot the flash is intact and does not produce any
errors. It's simply random if the system boots into this "faulty
state" or not (happens approx 1-2% of the time).

Does anyone maybe know how I can re-read the squashfs partition and
verify the integrity while the system is booted to see if I encounter
the squashfs errors?
I'm really at a loss here - no idea where to even look into diagnosing
the issue.

On Fri, May 21, 2021 at 6:16 PM Vincent Wiemann wrote:
> My guess is that the error already happens when reading the flash.
> Is your firmware (sysupgrade) bigger than 16MB?
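One possible way to approach the "re-read and verify" question is to read the partition several times and compare digests. The following is a minimal sketch, not a tested procedure for this board. It assumes the partition is readable as a file-like device; the device path in the usage note is hypothetical and would have to be looked up in /proc/mtd. It also assumes that reads of the raw device actually reach the flash instead of being served from a cache.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 64 * 1024) -> str:
    """Read the whole file or device at `path` and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    # buffering=0 avoids Python-level buffering; kernel caching is a separate matter.
    with open(path, "rb", buffering=0) as dev:
        while True:
            chunk = dev.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


def reads_are_stable(path: str, passes: int = 3) -> bool:
    """Read `path` several times; True if every pass yields the same digest."""
    digests = {sha256_of(path) for _ in range(passes)}
    return len(digests) == 1
```

A call such as `reads_are_stable("/dev/mtd9")` (hypothetical device name) could be run once in a healthy boot and once in the faulty state. Comparing the digest against one recorded during a known-good boot would additionally show whether the data differs, not just whether it is unstable. If the partition also contains a writable area, digests may legitimately change between passes.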
Re: SquashFS mixed errors (decompression failed and others)
On 5/21/21 3:58 PM, Koen Vandeputte wrote:
> In the next kernel bump, following patch is also present:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v5.10.38=2ed1d90162a0c0683ecbe0c4802187fa22d641c3
>
> I think it's worth a shot to retry the tests once it's bumped.
>
> Koen

My guess is that the error already happens when reading the flash.
Is your firmware (sysupgrade) bigger than 16MB?
So maybe it has to do with switching to 4-address-mode...

Best,

Vincent

___
openwrt-devel mailing list
openwrt-devel@lists.openwrt.org
https://lists.openwrt.org/mailman/listinfo/openwrt-devel
Re: SquashFS mixed errors (decompression failed and others)
On 21.05.21 13:19, Ibrahim Tachijian wrote:
> I am clueless on how to further investigate this issue. For now my
> work around is restarting the router via a bash script should it
> notice there are bus-errors or i/o errors.

In the next kernel bump, the following patch is also present:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v5.10.38=2ed1d90162a0c0683ecbe0c4802187fa22d641c3

I think it's worth a shot to retry the tests once it's bumped.

Koen
Re: SquashFS mixed errors (decompression failed and others)
Hello Vincent,

The board is the GL.iNet GL-B1300. You can read about it here:
https://openwrt.org/toh/gl.inet/gl.inet_gl-b1300

On Fri, May 21, 2021 at 2:18 PM Vincent Wiemann wrote:
> Hi Ibrahim,
>
> please post the affected model name. It could e.g. be a bad board design
> which requires the flash to be clocked slower or something.
> Without knowing the affected board, it is guessing...
>
> Best,
>
> Vincent

--
Ibrahim Tachijian
Re: SquashFS mixed errors (decompression failed and others)
Hi Ibrahim,

please post the affected model name. It could, for example, be a bad board
design which requires the flash to be clocked slower or something.
Without knowing the affected board, it is guesswork...

Best,

Vincent

On 5/21/21 1:19 PM, Ibrahim Tachijian wrote:
> We use approximately 10k IPQ40XX devices and we have noticed that
> every time we run "sysupgrade -n" we lose approximately 1% of the
> routers in the process.
SquashFS mixed errors (decompression failed and others)
Hello,

We use approximately 10k IPQ40XX devices and we have noticed that
every time we run "sysupgrade -n" we lose approximately 1% of the
routers in the process.
After further investigation I'm almost confident that the sysupgrade
process is not the culprit, so I put one test router into a reboot loop.

This is what I do:

Boot the router in a fresh state after a newly installed image.
The image contains a reboot loop that consists of a shell script that
runs every minute.

The shell script tries to run a php-script which simply echoes "Hello
World". If the php-script exits normally then we reboot the router.

However, if the php-script exits abnormally then the router stops and
does nothing other than informing me that there was a bus-error making
php unable to process the hello world script.

When this process runs, the router reboots approximately 50 times
before it boots into a faulty state where I see bus-errors
when I try to run php scripts, for example.

Looking into dmesg you can see some errors such as,

[10985.209438] SQUASHFS error: squashfs_read_data failed to read block 0x3a803e
[11045.218685] SQUASHFS error: xz decompression failed, data probably corrupt
[11045.218731] SQUASHFS error: squashfs_read_data failed to read block 0x3a803e
[11105.228157] SQUASHFS error: xz decompression failed, data probably corrupt
[11105.228203] SQUASHFS error: squashfs_read_data failed to read block 0x3a803e

or

[26218.687905] SQUASHFS error: Unable to read page, block 1b99a, size 10234
[26221.057472] SQUASHFS error: Unable to read data cache entry [1b99a]
[26221.057551] SQUASHFS error: Unable to read page, block 1b99a, size 10234
[26221.062926] SQUASHFS error: Unable to read data cache entry [1b99a]
[26221.069742] SQUASHFS error: Unable to read page, block 1b99a, size 10234
[26224.460239] SQUASHFS error: Unable to read data cache entry [1b99a]
[26224.460320] SQUASHFS error: Unable to read page, block 1b99a, size 10234

or

[62745.801178] SQUASHFS error: squashfs_read_data failed to read block 0x732ae2
[62773.347234] SQUASHFS error: xz decompression failed, data probably corrupt
[62773.347281] SQUASHFS error: squashfs_read_data failed to read block 0x732ae2
[62790.132661] SQUASHFS error: xz decompression failed, data probably corrupt
[62790.132706] SQUASHFS error: squashfs_read_data failed to read block 0x732ae2
[62790.216746] SQUASHFS error: xz decompression failed, data probably corrupt
[62790.216792] SQUASHFS error: squashfs_read_data failed to read block 0x732ae2
[62800.810525] SQUASHFS error: xz decompression failed, data probably corrupt
[62800.810570] SQUASHFS error: squashfs_read_data failed to read block 0x732ae2
[62828.336267] SQUASHFS error: xz decompression failed, data probably corrupt

Now, you would assume that the squashfs partition is broken - but if
that were the case then a reboot should not help. It does.
Rebooting the router after it boots into this faulty state fixes the issue.

So approximately 1-2% of my reboots make the router go into this
faulty state.

I am clueless on how to further investigate this issue. For now my
workaround is restarting the router via a bash script should it
notice there are bus-errors or I/O errors.

Thanks

--
Ibrahim Tachijian
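The detection half of the workaround described above could look roughly like the following. This is a minimal sketch and not the script actually used. The function names are hypothetical, the regular expressions are written only against the log lines quoted in this message, and the block numbers are assumed to be hexadecimal.

```python
import re
from collections import Counter

# Matches kernel log lines such as:
# [10985.209438] SQUASHFS error: squashfs_read_data failed to read block 0x3a803e
SQUASHFS_ERROR = re.compile(r"^\[\s*(\d+\.\d+)\]\s+SQUASHFS error: (.*)$")
# Block numbers appear as "block 0x3a803e" or "block 1b99a" in the quoted logs.
BLOCK = re.compile(r"block (?:0x)?([0-9a-fA-F]+)")


def failing_blocks(dmesg_text: str) -> Counter:
    """Count how often each block number appears in SQUASHFS error lines."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = SQUASHFS_ERROR.match(line.strip())
        if not match:
            continue
        block = BLOCK.search(match.group(2))
        if block:
            counts[int(block.group(1), 16)] += 1
    return counts


def looks_faulty(dmesg_text: str) -> bool:
    """Return True if the log contains at least one SQUASHFS error line."""
    return any(SQUASHFS_ERROR.match(line.strip())
               for line in dmesg_text.splitlines())
```

The log text could come from `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`. In the three excerpts above, each faulty boot keeps reporting a single block number, so logging the result of `failing_blocks` before rebooting may help to see whether the same offsets recur across boots.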