Re: SquashFS mixed errors (decompression failed and others)

2021-05-24 Thread Ibrahim Tachijian
> The 4-"byte"-address mode is used on 32 MiB flash chips.
> We had similar issues with other 32 MiB devices in the past
> which were fixed at some point by Felix Fietkau.

My device is 32MiB. I'll check with Felix if he can give me any clues.
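
For anyone else wondering about the term: a 3-byte SPI NOR address can reach 2^24 bytes = 16 MiB, so a 32 MiB chip needs a fourth address byte to reach its upper half. The arithmetic as a small Python sketch. The function names are made up for illustration, and plain linear addressing (no bank or extended-address register) is assumed:

```python
# Sketch only: how many address bytes a flash chip of a given size needs.
# Assumption: plain linear addressing, no bank/extended-address register.

MIB = 1024 * 1024


def address_bytes_needed(flash_size_bytes: int) -> int:
    """Smallest number of whole address bytes that can reach every byte of the chip."""
    if flash_size_bytes <= 0:
        raise ValueError("flash size must be positive")
    highest_offset = flash_size_bytes - 1
    # Round the bit count up to whole bytes; always use at least one byte.
    return max(1, (highest_offset.bit_length() + 7) // 8)


def reachable_with_3_bytes(offset: int) -> bool:
    """True if the offset fits into a 24-bit (3-byte) address."""
    return 0 <= offset < (1 << 24)
```

Note that the 13MB image size alone does not say where a given block ends up on the chip. That depends on the partition layout, which is not given in this thread.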

@Everyone else reading this, do you know how one can increase "the
reset duration during booting" for the flash chip? (Not even sure I
fully understand what this means)



Re: SquashFS mixed errors (decompression failed and others)

2021-05-23 Thread Vincent Wiemann

On 5/23/21 10:21 AM, Ibrahim Tachijian wrote:
>> Is your firmware (sysupgrade) bigger than 16MB?
>
> No, the sysupgrade file is currently 13MB.
>
>> So maybe it has to do with switching to 4-address-mode...
> What is this exactly?

The 4-"byte"-address mode is used on 32 MiB flash chips.
We had similar issues with other 32 MiB devices in the past
which were fixed at some point by Felix Fietkau.

>> My guess is that the error already happens when reading the flash.
> At least we know that the flash is not being written to incorrectly
> since after a reboot the flash is intact and does not produce any
> errors. It's simply random if the system boots into this "faulty
> state" or not (happens approx 1-2% of the time).
>
> Does anyone maybe know how I can re-read the squashfs partition and
> verify the integrity while the system is booted to see if I encounter
> the squashfs errors.
> I'm really at a loss here - no idea where to even look into diagnosing
> the issue.

I guess the reset line of the flash chip is not held long enough, so
that it is in an unclean state. I think the reset duration during
booting needs to be increased. But I don't know the code and can't point
you there. It's just a guess...






Re: SquashFS mixed errors (decompression failed and others)

2021-05-23 Thread Ibrahim Tachijian
> Is your firmware (sysupgrade) bigger than 16MB?

No, the sysupgrade file is currently 13MB.

> So maybe it has to do with switching to 4-address-mode...
What is this exactly?

> My guess is that the error already happens when reading the flash.
At least we know that the flash is not being written to incorrectly
since after a reboot the flash is intact and does not produce any
errors. It's simply random if the system boots into this "faulty
state" or not (happens approx 1-2% of the time).

Does anyone maybe know how I can re-read the squashfs partition and
verify the integrity while the system is booted to see if I encounter
the squashfs errors?
I'm really at a loss here - no idea where to even look into diagnosing
the issue.
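
One possible way to do the re-read check asked about above, as a rough sketch and not a tested tool: read the partition several times, hash each pass, and compare the digests. It is written in Python for readability and assumes an interpreter is available on the device, which a stock image may not provide. The function names are made up for illustration. Reads through a block device can be served from the page cache, so the cache has to be bypassed or dropped between passes for the comparison to mean anything.

```python
import hashlib


def digest_of(path, length=None, chunk_size=64 * 1024):
    """Read `path` (file or device node) and return the SHA-256 hex digest.

    If `length` is given, only the first `length` bytes are hashed, e.g. to
    cover just the read-only squashfs part of a partition.
    """
    sha = hashlib.sha256()
    remaining = length
    # buffering=0 avoids Python-level buffering; it does NOT bypass the
    # kernel page cache.
    with open(path, "rb", buffering=0) as dev:
        while remaining is None or remaining > 0:
            want = chunk_size if remaining is None else min(chunk_size, remaining)
            chunk = dev.read(want)
            if not chunk:
                break  # end of file/device
            sha.update(chunk)
            if remaining is not None:
                remaining -= len(chunk)
    return sha.hexdigest()


def reads_are_stable(path, passes=3, length=None):
    """Return True only if every read pass over `path` yields the same digest."""
    digests = {digest_of(path, length=length) for _ in range(passes)}
    return len(digests) == 1
```

Usage would be something like `reads_are_stable("/dev/mtdX", length=squashfs_size)`, where both the device path and the size are placeholders that depend on the board's partition table. A digest taken during a healthy boot could serve as the reference value to compare against in a faulty boot.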





Re: SquashFS mixed errors (decompression failed and others)

2021-05-21 Thread Vincent Wiemann




On 5/21/21 3:58 PM, Koen Vandeputte wrote:



> In the next kernel bump, following patch is also present:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v5.10.38&id=2ed1d90162a0c0683ecbe0c4802187fa22d641c3
>
> I think it's worth a shot to retry the tests once it's bumped.
>
> Koen



My guess is that the error already happens when reading the flash.
Is your firmware (sysupgrade) bigger than 16MB?
So maybe it has to do with switching to 4-address-mode...

Best,

Vincent

___
openwrt-devel mailing list
openwrt-devel@lists.openwrt.org
https://lists.openwrt.org/mailman/listinfo/openwrt-devel


Re: SquashFS mixed errors (decompression failed and others)

2021-05-21 Thread Koen Vandeputte




In the next kernel bump, the following patch is also present:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v5.10.38&id=2ed1d90162a0c0683ecbe0c4802187fa22d641c3

I think it's worth a shot to retry the tests once it's bumped.

Koen




Re: SquashFS mixed errors (decompression failed and others)

2021-05-21 Thread Ibrahim Tachijian
Hello Vincent,

The board is the GL.iNet GL-B1300.
You can read about it here: https://openwrt.org/toh/gl.inet/gl.inet_gl-b1300





--
Ibrahim Tachijian



Re: SquashFS mixed errors (decompression failed and others)

2021-05-21 Thread Vincent Wiemann

Hi Ibrahim,

Please post the affected model name. It could be, for example, a bad
board design that requires the flash to be clocked slower.
Without knowing the affected board, it is just guesswork...

Best,

Vincent








SquashFS mixed errors (decompression failed and others)

2021-05-21 Thread Ibrahim Tachijian
Hello,

We use approximately 10k IPQ40XX devices and we have noticed that
every time we run "sysupgrade -n" we lose approximately 1% of the
routers in the process.
After further investigation I'm almost confident that it is not the
sysupgrade process that is the culprit - so what I did was that I put
one test router into a reboot loop.

This is what I do:

Boot the router in a fresh state after a newly installed image.
The image contains a reboot loop that consists of a shell script that
runs every minute.

The shell script tries to run a php-script which simply echoes "Hello
World". If the php-script exits normally then we reboot the router.

However, if the php-script exits abnormally then the router stops and
does nothing other than informing me that there was a bus-error making
php not able to process the hello world script.

When this process runs the router reboots approximately 50 times
before it boots into a state which is faulty where I see bus-errors
when I try to run php scripts for example.


Looking into dmesg you can see errors such as:

[10985.209438] SQUASHFS error: squashfs_read_data failed to read block 0x3a803e
[11045.218685] SQUASHFS error: xz decompression failed, data probably corrupt
[11045.218731] SQUASHFS error: squashfs_read_data failed to read block 0x3a803e
[11105.228157] SQUASHFS error: xz decompression failed, data probably corrupt
[11105.228203] SQUASHFS error: squashfs_read_data failed to read block 0x3a803e

or

[26218.687905] SQUASHFS error: Unable to read page, block 1b99a, size 10234
[26221.057472] SQUASHFS error: Unable to read data cache entry [1b99a]
[26221.057551] SQUASHFS error: Unable to read page, block 1b99a, size 10234
[26221.062926] SQUASHFS error: Unable to read data cache entry [1b99a]
[26221.069742] SQUASHFS error: Unable to read page, block 1b99a, size 10234
[26224.460239] SQUASHFS error: Unable to read data cache entry [1b99a]
[26224.460320] SQUASHFS error: Unable to read page, block 1b99a, size 10234

or

[62745.801178] SQUASHFS error: squashfs_read_data failed to read block 0x732ae2
[62773.347234] SQUASHFS error: xz decompression failed, data probably corrupt
[62773.347281] SQUASHFS error: squashfs_read_data failed to read block 0x732ae2
[62790.132661] SQUASHFS error: xz decompression failed, data probably corrupt
[62790.132706] SQUASHFS error: squashfs_read_data failed to read block 0x732ae2
[62790.216746] SQUASHFS error: xz decompression failed, data probably corrupt
[62790.216792] SQUASHFS error: squashfs_read_data failed to read block 0x732ae2
[62800.810525] SQUASHFS error: xz decompression failed, data probably corrupt
[62800.810570] SQUASHFS error: squashfs_read_data failed to read block 0x732ae2
[62828.336267] SQUASHFS error: xz decompression failed, data probably corrupt
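
To see at a glance which blocks fail and how often, the messages above can be tallied with a short script. This is a sketch in Python under two assumptions: the block numbers are hexadecimal (the letters in 1b99a suggest so), and only the three message shapes shown in this mail need to be matched. The function name is made up for illustration.

```python
import re
from collections import Counter

# Matches the three SQUASHFS message shapes quoted in this mail and
# captures the block number from each of them.
_BLOCK_RE = re.compile(
    r"SQUASHFS error: (?:"
    r"squashfs_read_data failed to read block (?P<read>0x[0-9a-fA-F]+)"
    r"|Unable to read page, block (?P<page>[0-9a-fA-F]+)"
    r"|Unable to read data cache entry \[(?P<cache>[0-9a-fA-F]+)\]"
    r")"
)


def failing_blocks(dmesg_text):
    """Count how often each block number appears in SQUASHFS error lines.

    Block numbers are parsed as hexadecimal (an assumption, see above).
    Lines without a block number, such as the "xz decompression failed"
    ones, are ignored.
    """
    counts = Counter()
    for match in _BLOCK_RE.finditer(dmesg_text):
        raw = match.group("read") or match.group("page") or match.group("cache")
        counts[int(raw, 16)] += 1  # int() accepts an optional 0x prefix for base 16
    return counts
```

Fed with the output of `dmesg`, the result shows whether a faulty boot keeps failing on one and the same block, as in the three excerpts above, or on many different ones.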



Now, you would assume that the squashfs-partition is broken - but if
this was the case then a reboot should not help. It does.
Rebooting the router after it boots in this faulty state fixes the issue.

So approximately 1-2% of my reboots make the router go into this faulty state.

I am clueless on how to further investigate this issue. For now my
workaround is restarting the router via a bash script should it
notice there are bus-errors or I/O errors.

Thanks


-- 
Ibrahim Tachijian
