Not a solution per se, but I can give you some info on how we address the
reliability issue in our product, which uses RAUC.

* We store the env at a raw offset in the eMMC (this should work for SD as
well) rather than as a file on a FAT partition. You will need to set up your
partition table to leave room for it and modify the U-Boot config to match.
* We use redundant U-Boot environments placed in different sectors of the
eMMC. This is a built-in feature of U-Boot that can be enabled in the
config. If one copy gets corrupted, U-Boot gracefully falls back to the
previous one.
* We have custom code in both U-Boot and Linux that checks for corrupt or
inconsistent RAUC U-Boot environment vars. If they are totally out of whack,
we boot into our fail-safe recovery mode, where the env vars are reset to
sane defaults and an update can be performed (no RMA needed).
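
As a rough illustration of the last two points, here is a minimal Python
sketch (not the code from our product). It assumes the redundant-env layout
as I understand it for MMC: a 4-byte CRC32 over the data, a 1-byte write
counter, then NUL-separated "key=value" entries, stored little-endian. The
slot names "A" and "B", the limit of 3 attempts, and all function names are
hypothetical; BOOT_ORDER and BOOT_<slot>_LEFT are the variables used in the
RAUC U-Boot integration example.

```python
import struct
import zlib

HEADER_SIZE = 5          # 4-byte CRC32 + 1-byte counter (redundant layout)
KNOWN_SLOTS = {"A", "B"}  # assumption: two slots named A and B
MAX_TRIES = 3             # assumption: upper bound for BOOT_<slot>_LEFT


def parse_copy(raw):
    """Return (counter, vars) if the stored CRC matches, else None."""
    if len(raw) <= HEADER_SIZE:
        return None
    stored_crc, counter = struct.unpack_from("<IB", raw)
    data = raw[HEADER_SIZE:]
    if zlib.crc32(data) != stored_crc:
        return None
    env = {}
    for entry in data.split(b"\0"):
        if not entry:
            break  # an empty entry terminates the variable list
        key, _, value = entry.decode("ascii", "replace").partition("=")
        env[key] = value
    return counter, env


def pick_copy(first, second):
    """Pick the newest valid copy; return its vars, or None if both are bad."""
    a, b = parse_copy(first), parse_copy(second)
    if a is None or b is None:
        chosen = a or b
        return chosen[1] if chosen else None
    # Both valid: the higher counter wins, allowing for wrap from 255 to 0.
    if a[0] == 255 and b[0] == 0:
        return b[1]
    if b[0] == 255 and a[0] == 0:
        return a[1]
    return b[1] if b[0] > a[0] else a[1]


def rauc_vars_sane(env):
    """Check that the RAUC boot-selection variables look consistent."""
    order = env.get("BOOT_ORDER", "").split()
    if not order or len(set(order)) != len(order):
        return False
    if not set(order) <= KNOWN_SLOTS:
        return False
    for slot in KNOWN_SLOTS:
        left = env.get(f"BOOT_{slot}_LEFT", "")
        if not left.isdigit() or int(left) > MAX_TRIES:
            return False
    return True
```

The two byte strings passed to pick_copy() would be read from the block
device at whatever offsets and size your U-Boot config uses. If pick_copy()
returns None or rauc_vars_sane() returns False, that is the point where a
fallback to recovery mode would kick in.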

We've had this setup for the past year, and I haven't once seen or heard of
anyone actually hitting a corrupt U-Boot env on any of our development units.
Unfortunately, we don't have analytics around this event in the field.

I know this isn't exactly an answer to your question, but hopefully some of
this helps you arrive at a robust solution for your setup.

Best,
~Matt

On Sun, Mar 28, 2021 at 6:11 AM Einar Vading <einar.vad...@rhimagnesita.com>
wrote:

> > Hi,
> >
> > On Fri, 2021-03-26 at 05:48 +0000, Einar Vading wrote:
> > > > > Hi,
> > > > >
> > > > > On Thu, 2021-03-25 at 15:22 +0000, Einar Vading wrote:
> > > > > > > We have a Raspberry Pi 4 system set up using RAUC for updates and
> > > > > > > u-boot for booting. For some systems in the field we have the u-boot
> > > > > > > environment on the FAT boot partition and we mount that in fstab so
> > > > > > > that RAUC can access it with the fw_print/setenv commands.
> > > > > >
> > > > > > > One issue we have seen is that the env-file gets corrupted every now
> > > > > > > and then. After corruption we can't RAUC update. The only solution we
> > > > > > > have to this problem now is to delete the corrupted env-file and
> > > > > > > reboot, then we can perform the upgrade.
> > > > > >
> > > > > > > I have no idea how to track down whatever corrupts the file and I was
> > > > > > > wondering if anyone has any input.
> > > > >
> > > > > > You could try placing the environment on a separate partition to avoid
> > > > > > any potential issues in the FAT implementation. Also, I think U-Boot
> > > > > > has a way to support redundant environments.
> > >
> > > > I have just done this for our newer systems. I moved the GPT partitions
> > > > back 4MB and placed two redundant environments between the GPT and the
> > > > first GPT partition.
> > >
> > > > It is my understanding though that redundant environments are not
> > > > supported when storing the env on FAT?
> >
> > That's probably a question for the U-Boot mailing list. :)
> >
> > > > > Exactly. This should also be documented in the U-Boot integration
> > > > > guideline for eMMC:
> > > > >
> > > > > https://rauc.readthedocs.io/en/latest/integration.html#example-setting-up-u-boot-environment-on-emmc-sd-card
> > > >
> > > > > When writing to the FAT very short before hard rebooting, I could imagine
> > > > > this can lead to failures. Do you see the corruption only after updates,
> > > > > or also suddenly after n boots?
> > >
> > > Yes, this is something we have been able to test. If we cut the power
> > > precisely when the env is written to FAT we can corrupt the entire boot
> > > partition.
> > > Super scary but this is not the problem we're seeing in the field. That
> > > problem is more subtle.
> >
> > > It should be possible to mount fat with the 'sync' option, but I'm not sure
> > > if that would help in this case. I'd recommend avoiding mounting FAT
> > > filesystems R/W if possible.
>
> Maybe it could help with the problem I'm investigating. Don't think it would
> help with the total corruption on powerloss when writing u-boot env, since
> that is in u-boot and the fs is not "mounted" yet.
>
> > > > How does the system report the corruption?
> > >
> > > > fw_printenv and fw_setenv stops working and says that the env is corrupted.
> > > > That also means that RAUC update fails, that is usually when we notice it.
> > >
> > > Is there a way to watch a file and record any process that modifies it?
> >
> > > There is blktrace, but you don't see the contents that way. It still may be
> > > enough detail to understand what's happening here.
>
> Great, I'll check that out.
>
> > Regards,
> > Jan
>
> Thanks for all the help.
>
> Regards,
> Einar
>
> _______________________________________________
> RAUC mailing list
>


-- 
Matthew Campbell
Principal Engineer
mcampb...@izotope.com

iZotope, Inc.
www.izotope.com