On Sun, 2023-08-06 at 13:15 -0600, Joshua Watt wrote:
> On Sat, Aug 5, 2023 at 7:54 PM Maksim Chichikalov
> <[email protected]> wrote:
> > My name is Max; nice to e-meet you. I need your help with
> > Equivalence Server :)
> > I watched your presentation on Youtube - it 100% helped me to
> > understand the logic better and debug the issue we are facing.
> > Thank you for all your input into OE development.
> > 
> > I found in OE repo that you introduced "def OEOuthashBasic()", so I
> > decided to write to you first before opening the topic on an email
> > list.
> > 
> > The problem:
> > 
> > Input data are propagated to the output hash generation function,
> > which generates a different out-hash.
> > 
> > Description:
> > 
> > All "_git.bb" recipes have to append SRCPV to PV. As a result, PV
> > is different on each commit.
> > 
> > The OEOuthashBasic function includes SSTATE_PKGSPEC to generate
> > hash, which contains PV (PV contains git hash). As a result, there
> > is no way to generate the same out-hash even if changes introduced
> > within a commit were trivial.
> > 
> 
> Right, this is sort of on purpose, because the hash equivalence is
> basically trying to say that an sstate object can be used in place of
> another one, even when the task hashes aren't the same (but the
> output hashes are). However, the sstate code itself will only look
> for sstate objects with a certain name (which includes PV); hash
> equivalency does have _some_ control over the file name sstate looks
> for, since it will replace the taskhash portion of the name with the
> unified hash, but it doesn't have complete control.
>  
> > 
> > In our codebase, our components have an API part, which is managed
> > by an independent recipe per component. The problem described above
> > causes the recompilation of all components dependent on the API,
> > even in cases when the API was not changed. CI for pull requests
> > recompiles almost the entire code base, so I need to do something
> > about it.
> > (sorry, quite hard for me to explain it in a nutshell, let me know
> > if you would like to know slightly more details)
> > 
> > 
> 
> Ya, sounds like a typical mono-repo design?
>  
> > 
> > I see a couple of options for us:
> >  * Add a custom implementation of out-hash generated function and
> > overwrite SSTATE_HASHEQUIV_METHOD.
> >  * Better understand why it's mandatory to append SRCPV to PV, and
> > maybe it's flexible in our cases to do it.
> > 
> 
> This might be the best option, at least for your recipes, but I've
> CC'd the list for additional feedback
>  
> >  * Propose a patch to fix OEOuthashBasic().
> > 
> > In my humble opinion, the commit's hash shouldn't be included in
> > out-hash generation, it doesn't make sense. Unless I'm missing
> > something important - What are your thoughts?
> > 
> 
> Yes and no. It's not intentional, but a side effect of hash
> equivalency trying to make sure that the things it's marking as
> equivalent can actually be found in sstate (basically, because sstate
> includes the commit hash, hash equivalency kinda has to include it).

This all sounds a bit unfortunate.

sstate only works as long as the filenames are predictable. Some
elements of the sstate filenames are essential to operation, e.g. the
package architecture, since otherwise one input would result in
multiple output files with the same hash in the filename. The recipe
name and version are there mainly for debugging, to allow someone to
more easily know where an sstate object came from and what it
represents. This is summarised by the comment in generate_sstatefn()
in sstate.bbclass: "Fields 0,5,6 are mandatory, 1 is most useful,
2,3,4 are just for information".
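
To make the field numbering concrete, here is a minimal Python sketch
that splits a colon-separated sstate spec and labels each field. The
field names assume the usual SSTATE_PKGSPEC layout and are not copied
from sstate.bbclass; the example spec string and the helper
describe_sstate_spec() are hypothetical.

```python
# Illustrative only: field names assume the common SSTATE_PKGSPEC layout
# "sstate:PN:ARCH-VENDOR-OS:PV:PR:SSTATE_PKGARCH:SSTATE_VERSION:".
FIELD_NAMES = ["prefix", "PN", "arch-vendor-os", "PV", "PR",
               "SSTATE_PKGARCH", "SSTATE_VERSION"]
MANDATORY = {0, 5, 6}      # per the comment quoted from generate_sstatefn()
INFORMATIONAL = {2, 3, 4}  # "just for information"


def describe_sstate_spec(spec):
    """Return (index, name, value, role) tuples for each field of a spec."""
    fields = spec.rstrip(":").split(":")
    rows = []
    for i, value in enumerate(fields[:len(FIELD_NAMES)]):
        if i in MANDATORY:
            role = "mandatory"
        elif i in INFORMATIONAL:
            role = "informational"
        else:
            role = "most useful"
        rows.append((i, FIELD_NAMES[i], value, role))
    return rows


# Hypothetical spec for a git recipe; the PV field (index 3) carries the
# revision, so it changes on every commit.
example = ("sstate:myapi:core2-64-poky-linux:"
           "1.0+gitAUTOINC+0123456789:r0:core2-64:11:")
for row in describe_sstate_spec(example):
    print(row)
```

Under these assumptions PV sits in an informational field, which is why
changing it should not be essential to how sstate operates.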

When we added hash equivalence, we added the ability to equate the
hashes but we'd not considered that the version string mismatch may
stop significant artefact reuse. I suspect at the time we reasoned that
if the version changes, the output probably does too.

Sadly, fixing this isn't simple. Changing the hash algorithm isn't
enough; we need to stop the SRCREV part of PV being used in the sstate
filename. If we stop that happening, the output hash algorithm may well
"just work" at that point. I'm not sure if it directly encodes PV or
just the sstate filename; hopefully the latter.

The hard part is how to do this generically without adding a lot of
complex and potentially fragile code. The datastore context in that
function is the core configuration, not the recipe's datastore. The
options I can think of offhand:

a) Live with the issue

b) Put a hack into generate_sstatefn() which changes the PV element if
it matches a pattern, but we'd have to do this for each SCM as needed.
Ugly but lowest overhead.

c) Indirect PV to SSTATEPV (or similar) and then alter SSTATEPV to
replace SRCPV with a fixed string. Nicer code in some ways but horrible
parsing/performance overhead, since every recipe, even non-SCM ones,
would be affected.

d) A partial version of c) where recipes can set SSTATEPV to a function
if they need it. Solves your specific case, with the overhead confined
to those recipes rather than affecting everyone else. Would not solve
the issue generally without manual user intervention.
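
As a rough illustration of option b), the following Python sketch
rewrites the PV field of a spec string when it matches a git-style
pattern. It is not code from sstate.bbclass: the regular expression,
the assumed position of PV (field 3) and the helper
normalise_pv_field() are all hypothetical, and other SCMs would need
their own patterns.

```python
import re

# Hypothetical pattern: "+git", then an optional "AUTOINC+" or numeric
# counter, then an abbreviated or full hex revision.
GIT_REV_IN_PV = re.compile(r"\+git(?:AUTOINC\+|\d+\+)?[0-9a-f]{7,40}")


def normalise_pv_field(spec, pv_index=3):
    """Replace the git revision in the PV field with a fixed '+git' marker."""
    fields = spec.split(":")
    if len(fields) > pv_index:
        fields[pv_index] = GIT_REV_IN_PV.sub("+git", fields[pv_index])
    return ":".join(fields)


# Two hypothetical specs that differ only in the git revision inside PV.
old = ("sstate:myapi:core2-64-poky-linux:"
       "1.0+gitAUTOINC+0123456789:r0:core2-64:11:")
new = ("sstate:myapi:core2-64-poky-linux:"
       "1.0+gitAUTOINC+abcdef0123:r0:core2-64:11:")

# After normalisation both commits map to the same spec string.
print(normalise_pv_field(old))
print(normalise_pv_field(old) == normalise_pv_field(new))
```

A recipe without an SCM revision in PV passes through unchanged, which
is what keeps the overhead of this approach low.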

I'm not sure which of these will end up making the most sense. These
assume the output hash code uses the sstate filename and not PV. If it
uses PV there would be more work needed.
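
The distinction matters because any metadata mixed into the output hash
affects the result even when the files are identical. The toy Python
sketch below shows the effect; toy_outhash() is a hypothetical stand-in
and does not reproduce what OEOuthashBasic actually hashes.

```python
import hashlib


def toy_outhash(files, extra=""):
    """Hash sorted (path, content) pairs, mixed with extra metadata."""
    h = hashlib.sha256()
    h.update(extra.encode("utf-8"))
    for path in sorted(files):
        h.update(path.encode("utf-8"))
        h.update(b"\0")
        h.update(files[path])
        h.update(b"\0")
    return h.hexdigest()


# Hypothetical task output that is byte-for-byte identical across commits.
output = {"usr/include/api.h": b"int api_version(void);\n"}

# The metadata differs only in the git revision: the hashes diverge.
a = toy_outhash(output, extra="1.0+gitAUTOINC+0123456789")
b = toy_outhash(output, extra="1.0+gitAUTOINC+abcdef0123")
print(a == b)  # False

# With the revision replaced by a fixed string, the hashes agree.
c = toy_outhash(output, extra="1.0+git")
d = toy_outhash(output, extra="1.0+git")
print(c == d)  # True
```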

Cheers,

Richard
