On 4/22/20 5:11 AM, Richard Purdie wrote:
On Tue, 2020-04-21 at 21:51 -0500, Joshua Watt wrote:
On Sun, Apr 19, 2020 at 12:04 PM Richard Purdie
<[email protected]> wrote:
Continuing the discussion from the OE TSC meeting, now that I've had
a little time to digest the idea: I don't think the filter functions
are necessarily a bad idea. I think the real problem is how to
prevent them from becoming API maintenance and debugging problems,
and as you stated, performance issues. I think there might be a few
ways to address this, and most of it comes down to strictly defining
the API.
Firstly, I don't think we want to allow (at least at this point)
completely arbitrary python functions to be called as a filter.
Hopefully, we can figure out the few filter manipulations that are
required and design the API such that those can be used without
having to open it all the way up.
I think that is a sensible idea.

The good news is we know roughly what the current needs look like, the
code is in lib/oe/classextend.py:

     def extend_name(self, name):
         if name.startswith("kernel-") or name == "virtual/kernel":
             return name
         if name.startswith("rtld"):
             return name
         if name.endswith("-crosssdk"):
             return name
         if name.endswith("-" + self.extname):
             name = name.replace("-" + self.extname, "")
         if name.startswith("virtual/"):
             subs = name.split("/", 1)[1]
             if not subs.startswith(self.extname):
                 return "virtual/" + self.extname + "-" + subs
             return name
         if name.startswith("/") or (name.startswith("${") and 
name.endswith("}")):
             return name
         if not name.startswith(self.extname):
             return self.extname + "-" + name
         return name

     def map_depends(self, dep):
         if dep.endswith(("-native", "-native-runtime")) or ('nativesdk-' in 
dep) or ('cross-canadian' in dep) or ('-crosssdk-' in dep):
             return dep
         else:
             # Do not extend for that already have multilib prefix
             var = self.d.getVar("MULTILIB_VARIANTS")
             if var:
                 var = var.split()
                 for v in var:
                     if dep.startswith(v):
                         return dep
             return self.extend_name(dep)

which apart from one getVar call, is all static string manipulation.
Could be better, could be worse. Its surprisingly hard to manipulate a
depends string (we also have to use the explode_deps function as the
strings can have versions in).

  Secondly, I think we should assume that any
filter function is "pure" [1], meaning it only takes in the input
variable value string, possibly some other manipulation arguments,
and outputs the new variable value string. Notable, it should *not*
take in the datastore object, since this is inherently global data
that can change at any time. Hopefully, this will allow optimizations
and prevent performance issues because bitbake can keep a cache of
filter function inputs mapped to outputs. Since the function outputs
*only* depend on the inputs, the filter can avoid being called
multiple times if the input set has been seen before.
That definitely looks like a good move and alleviates some of the
concerns I have about it.

I do wonder how we handle MULTILIB_VARIANTS in the above example.
Perhaps some kind of cut down key/value pairs or allow parameters?

I can dream up a few ways this might look. In this example, I'm
defining a filter function that simply adds a defined prefix to every
value in a space separated variable.

First, the filter function itself:

  def filter_prefix(in, prefix):
      return ' '.join(prefix + v for v in in.split())

As for how to express this in a recipe, I can think of a few ways.
This first one is a variable flag. The base flag is "filter" and the
suffix indicates which filter is applied (in this case, the "prefix"
function). The value are the extra arguments passed to the function.

  DEPENDS[filter:prefix] = "${MLPREFIX}"

Another option would be to encode the function and arguments in the
flag value, which might also allow filters to be chained (although, I
would hope that's a rare requirement!):

  DEPENDS[filter] = "prefix ${MLPREFIX}"

Chaining might look something like:

  DEPENDS[filter] = "prefix ${MLPREFIX}; suffix ${MY_SUFFIX}"

The great part about pure functions here is that the property is
transitive, so the entire result of chain with both operations (e.g.
suffix(prefix(${DEPENDS}))) can be cached if desired.
Those are good suggestions. Thinking out loud, I'm kind of drawn to
something which looks a bit more function like:

DEPENDS[filter] = "prefix_filter('${MLPREFIX}') suffix_filter('${MY_SUFFIX}')"

I was trying to avoid anything too complex to parse.... but I wonder if we can 
enlist the help of the python interpreter. If we have an appropriate delimiter, 
something like this might work:

 def apply_filter(value, filter_expression):
     class Val:
         def __init__(self, value):
             self.value = value
def prefix(self, prefix):
             self.value = ' '.join(prefix + v for v in self.__value.split())
def suffix(self, suffix):
             self.value = ' '.join(v + suffix for v in self.__value.split())

     v = Val(value)
     for e in filter_expression.split(';'):
         e = e.strip()
         if not e:
            continue
         eval(e, globals={}, locals={'v': v})
     return v.value

Obviously, you might want something a little more dynamic for registering the 
actual filter functions into the class, but the important part is that the 
eval() is pretty restrictive in what filter functions are allowed to access or 
do.

This would allow the user to write a fairly natural expression like:

 DEPENDS[filter] = "v.prefix('${MLPREFIX}'); v.suffix('${MY_SUFFIX}')

but, things like this also still work (since the code ignores empty 
expressions):

 DEPEND[filter] += "; v.prefix('foo')"

The hardest part is finding a delimiter that isn't likely to show up in the 
actual variable values, since the string has to be split and duplicates ignored 
before it is parsed by the python code.



as it might be more obvious and searchable through grep? I don't like
";" as a delimiter as it doesn't work well with append/prepend of
values. Its a tradeoff between the need for some parameters to be
passed in and complexity.

I do like the fact we can cache this approach, that will help a lot
with performance.

There's probably a few other ways to express this, but the idea is
that it's more tightly controlled about what can be applied.

[1]: https://en.wikipedia.org/wiki/Pure_function
Its definitely the way to go...

Cheers,

Richard



-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#1061): 
https://lists.openembedded.org/g/openembedded-architecture/message/1061
Mute This Topic: https://lists.openembedded.org/mt/73131657/21656
Group Owner: [email protected]
Unsubscribe: https://lists.openembedded.org/g/openembedded-architecture/unsub  
[[email protected]]
-=-=-=-=-=-=-=-=-=-=-=-

Reply via email to