Re: [Openembedded-architecture] Concerns about multilib bugs

Quentin Schulz Tue, 28 Apr 2020 10:48:23 -0700

Hi all,

I still had a couple of ¢ to give so here they are.


On Wed, Apr 22, 2020 at 09:30:23AM -0500, Joshua Watt wrote:
> 
> On 4/22/20 5:11 AM, Richard Purdie wrote:
> > On Tue, 2020-04-21 at 21:51 -0500, Joshua Watt wrote:
> > > On Sun, Apr 19, 2020 at 12:04 PM Richard Purdie
> > > <[email protected]> wrote:
> > > Continuing the discussion from the OE TSC meeting, now that I've had
> > > a little time to digest the idea: I don't think the filter functions
> > > are necessarily a bad idea. I think the real problem is how to
> > > prevent them from becoming API maintenance and debugging problems,
> > > and as you stated, performance issues. I think there might be a few
> > > ways to address this, and most of it comes down to strictly defining
> > > the API.
> > > Firstly, I don't think we want to allow (at least at this point)
> > > completely arbitrary python functions to be called as a filter.
> > > Hopefully, we can figure out the few filter manipulations that are
> > > required and design the API such that those can be used without
> > > having to open it all the way up.
> > I think that is a sensible idea.
> > 
> > The good news is we know roughly what the current needs look like, the
> > code is in lib/oe/classextend.py:
> > 
> >      def extend_name(self, name):
> >          if name.startswith("kernel-") or name == "virtual/kernel":
> >              return name
> >          if name.startswith("rtld"):
> >              return name
> >          if name.endswith("-crosssdk"):
> >              return name
> >          if name.endswith("-" + self.extname):
> >              name = name.replace("-" + self.extname, "")
> >          if name.startswith("virtual/"):
> >              subs = name.split("/", 1)[1]
> >              if not subs.startswith(self.extname):
> >                  return "virtual/" + self.extname + "-" + subs
> >              return name
> >          if name.startswith("/") or (name.startswith("${") and 
> > name.endswith("}")):
> >              return name
> >          if not name.startswith(self.extname):
> >              return self.extname + "-" + name
> >          return name
> > 
> >      def map_depends(self, dep):
> >          if dep.endswith(("-native", "-native-runtime")) or ('nativesdk-' 
> > in dep) or ('cross-canadian' in dep) or ('-crosssdk-' in dep):
> >              return dep
> >          else:
> >              # Do not extend for that already have multilib prefix
> >              var = self.d.getVar("MULTILIB_VARIANTS")
> >              if var:
> >                  var = var.split()
> >                  for v in var:
> >                      if dep.startswith(v):
> >                          return dep
> >              return self.extend_name(dep)
> > 
> > which apart from one getVar call, is all static string manipulation.
> > Could be better, could be worse. Its surprisingly hard to manipulate a
> > depends string (we also have to use the explode_deps function as the
> > strings can have versions in).
> > 
> > >   Secondly, I think we should assume that any
> > > filter function is "pure" [1], meaning it only takes in the input
> > > variable value string, possibly some other manipulation arguments,
> > > and outputs the new variable value string. Notable, it should *not*
> > > take in the datastore object, since this is inherently global data
> > > that can change at any time. Hopefully, this will allow optimizations
> > > and prevent performance issues because bitbake can keep a cache of
> > > filter function inputs mapped to outputs. Since the function outputs
> > > *only* depend on the inputs, the filter can avoid being called
> > > multiple times if the input set has been seen before.
> > That definitely looks like a good move and alleviates some of the
> > concerns I have about it.
> > 
> > I do wonder how we handle MULTILIB_VARIANTS in the above example.
> > Perhaps some kind of cut down key/value pairs or allow parameters?
> > 
> > > I can dream up a few ways this might look. In this example, I'm
> > > defining a filter function that simply adds a defined prefix to every
> > > value in a space separated variable.
> > > 
> > > First, the filter function itself:
> > > 
> > >   def filter_prefix(in, prefix):
> > >       return ' '.join(prefix + v for v in in.split())
> > > 
> > > As for how to express this in a recipe, I can think of a few ways.
> > > This first one is a variable flag. The base flag is "filter" and the
> > > suffix indicates which filter is applied (in this case, the "prefix"
> > > function). The value are the extra arguments passed to the function.
> > > 
> > >   DEPENDS[filter:prefix] = "${MLPREFIX}"
> > > 
> > > Another option would be to encode the function and arguments in the
> > > flag value, which might also allow filters to be chained (although, I
> > > would hope that's a rare requirement!):
> > > 
> > >   DEPENDS[filter] = "prefix ${MLPREFIX}"
> > > 
> > > Chaining might look something like:
> > > 
> > >   DEPENDS[filter] = "prefix ${MLPREFIX}; suffix ${MY_SUFFIX}"
> > > 
> > > The great part about pure functions here is that the property is
> > > transitive, so the entire result of chain with both operations (e.g.
> > > suffix(prefix(${DEPENDS}))) can be cached if desired.
> > Those are good suggestions. Thinking out loud, I'm kind of drawn to
> > something which looks a bit more function like:
> > 
> > DEPENDS[filter] = "prefix_filter('${MLPREFIX}') 
> > suffix_filter('${MY_SUFFIX}')"
> 
> I was trying to avoid anything too complex to parse.... but I wonder if we 
> can enlist the help of the python interpreter. If we have an appropriate 
> delimiter, something like this might work:
> 
>  def apply_filter(value, filter_expression):
>      class Val:
>          def __init__(self, value):
>              self.value = value
>          def prefix(self, prefix):
>              self.value = ' '.join(prefix + v for v in self.__value.split())
>          def suffix(self, suffix):
>              self.value = ' '.join(v + suffix for v in self.__value.split())
> 
>      v = Val(value)
>      for e in filter_expression.split(';'):
>          e = e.strip()
>          if not e:
>             continue
>          eval(e, globals={}, locals={'v': v})
>      return v.value
> 
> Obviously, you might want something a little more dynamic for registering the 
> actual filter functions into the class, but the important part is that the 
> eval() is pretty restrictive in what filter functions are allowed to access 
> or do.
> 
> This would allow the user to write a fairly natural expression like:
> 
>  DEPENDS[filter] = "v.prefix('${MLPREFIX}'); v.suffix('${MY_SUFFIX}')
> 
> but, things like this also still work (since the code ignores empty 
> expressions):
> 
>  DEPEND[filter] += "; v.prefix('foo')"
> 
> The hardest part is finding a delimiter that isn't likely to show up in the 
> actual variable values, since the string has to be split and duplicates 
> ignored before it is parsed by the python code.
> 

I summoned the Google god and I found this stackoverflow on how to
execute multiple lines instead of the one eval is expecting:
https://stackoverflow.com/a/39381428
https://docs.python.org/3/library/ast.html

If I didn't completely misunderstood what was suggested, that should
eliminate the need for splitting. Though... I don't know how all of this
works wrt basic security against code injection (e.g.:
DEPENDS[filter] += "import shutil; shutil.rmtree('/')")? I barely know
Python :)

Just to try to grasp what's suggested, are all filter flags going to be
resolved/executed at one specific point in time or will the filter of
each variable be called explicitely when needed?

If the former, my simple question is: aren't we just adding one more
ordering issue somehow? I'm pretty sure someone will want one day to
have this filter flag expanded/resolved at a different time that we
would set it to.

For the latter, seems error prone but at least we have the ability to
fix a missing or misplaced filter rather easily.

Hopefully I haven't completely misunderstood the suggestion.

I'm afraid that's as far as I can participate to the issue from my eagle
view on this as I'm lacking knowledge in Python and bitbake :/

Let me know if there's anything I can do,
Thanks,
Quentin

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#1066): 
https://lists.openembedded.org/g/openembedded-architecture/message/1066
Mute This Topic: https://lists.openembedded.org/mt/73131657/21656
Group Owner: [email protected]
Unsubscribe: https://lists.openembedded.org/g/openembedded-architecture/unsub  
[[email protected]]
-=-=-=-=-=-=-=-=-=-=-=-

Re: [Openembedded-architecture] Concerns about multilib bugs

Reply via email to