On Thu, Feb 20, 2020, 9:22 AM Paul Barker <[email protected]> wrote:

> On Thu, 20 Feb 2020 at 12:04, Richard Purdie
> <[email protected]> wrote:
> >
> > On Thu, 2020-02-20 at 11:59 +0000, Paul Barker wrote:
> > > I'm now looking into this...
> > >
> > > In sstate_checkhashes() we mark sstate as available if
> > > fetcher.checkstatus() succeeds. Then at a later point
> > > sstate_setscene() calls sstate_installpkg(), which calls
> > > pstaging_fetch(), which calls fetcher.download() to actually get the
> > > sstate artifact. If the artifact is removed from the mirror between
> > > these two accesses (due to an sstate mirror clean up running in
> > > parallel to a build), or if there is an intermittent download
> > > failure, we could see checkstatus() succeed and then download() fail.
> > >
> > > I don't think we should ignore all setscene errors but in the
> > > specific case where it's the download step that fails I think that
> > > should be a warning. Or it could be an error by default with a
> > > variable we can set to turn it into a warning. Does that sound
> > > reasonable? If so I'll work up a patch.
> >
> > Thinking about the code, I'm not sure how you're generically going to
> > tell the difference between a setscene task that fails as the file
> > disappeared compared to a setscene failure with another real error? :/
> >
> > We could make all failed setscene tasks warnings but I think that
> > buries actual real errors.
> >
> > This is probably why I've not changed the code before now.
> >
> > Special exit code values? :/
> >
> > I'm open to proposals.
> >
> > I know we could put in some configuration option but in general I hate
> > these as it just means more test matrix combinations and more ways for
> > people to see different behaviours. They have a time/place but I'm not
> > sure it's here.
>
> I agree - I really don't want to have to add additional complexity
> here. But I do think we need to fix this in some way; others are
> affected by this as can be seen from previous discussions. And in the
> case of a public sstate mirror we can't control when users decide to
> run builds, there will always be the chance of a user running a build
> on an old commit while old sstate artifacts are cleaned or starting a
> build just as the mirror is taken offline for some maintenance.
>
> I think we might be able to make this work if we can avoid adding any
> new conditional logic to the fetcher itself. I can see that almost
> every call to logger.error() is followed by raising an error - perhaps
> we could rework the code to include all the relevant info in the
> raised error object and allow higher level code to catch the exception
> and decide what to do with it. Because once logger.error() is called,
> knotty counts an error and bitbake will exit non-zero even if the
> error is safely handled. Once the fetcher simply raises exceptions in
> the case of failed downloads we could handle this neatly in
> sstate.bbclass. Would that be a viable way forward? Or would that
> break the other fetcher use cases?
>

FWIW we also have this problem because our CI nodes all update the sstate
cache via rsync after they finish, which causes races. This hasn't affected
our developers, but I suspect that is only because they aren't doing builds
at 1 AM.

The way we worked around it was to split up the build into two invocations
of bitbake:

 bitbake --setscene-only <target> || true
 bitbake --skip-setscene <target>

Although this will likely not work very well with hash equivalence.
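
For anyone who wants to script it, here is a minimal sketch of the same
split as a shell function. The function name two_phase_build and the
BITBAKE override variable are hypothetical and only there for
illustration; the only bitbake options used are the two shown above.
Unlike a bare "|| true", it leaves a warning in the log when the
setscene phase fails, so the failure is tolerated but not silent.

```shell
# Sketch only: two_phase_build and BITBAKE are made-up names, not part of bitbake.
# BITBAKE lets you point at a different bitbake binary (defaults to "bitbake").
two_phase_build() {
    _target="$1"

    # Phase 1: restore whatever sstate is available. A failure here (for
    # example an artifact vanishing from the mirror) is reported but tolerated.
    if ! "${BITBAKE:-bitbake}" --setscene-only "$_target"; then
        echo "warning: setscene phase failed for $_target; continuing" >&2
    fi

    # Phase 2: run the real tasks without setscene. The exit status of this
    # command is the exit status of the function, so real errors still fail.
    "${BITBAKE:-bitbake}" --skip-setscene "$_target"
}
```

It would be called as "two_phase_build <target>" from the CI job, and
the job then fails only if the second invocation fails.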


> Thanks,
> Paul
> 
>