It's become clear that our approach to evaluating
packaging dependencies and constraints needs to become
a lot more sophisticated; we've been trying to make
packaging metadata more accurately reflect the way
we build and test packages in experiments and
the current solver is not very powerful.
A significant part of the difficulty is dealing with
externally produced packages; if a variety of versions
are available we may need to iteratively test multiple
versions, evaluating their dependencies to find one that
is compatible w/ the constraints that may be active on
the current image. This may involve extensive back-tracking
and rejection of previously made decisions. Such changes
would significantly complicate the current code, and produce
even longer run times during package plan creation...
One method of doing this sort of automated decision making
is to cast the problem as a series of boolean expressions,
and then apply a SAT solver to find a solution. These notes
describe the results of my experiments w/ the minisat solver
Stephen posted last month. The folks working on libzypp
for OpenSuse are also using a SAT solver to compute their
package solutions, although their mechanisms are quite
different.
Notes:
--------
1) The presence of a particular package version is a
single boolean variable; True if it's present,
False if not.
The problem set handed to the SAT solver is a series
of clauses; each clause are boolean variables (or
their negation) or'd together. All clauses must be
true for the solution to exist.
The clauses need to encode the fact that only one version
of a package may be installed at a time, and also encode
all differnt package dependencies and constraints.
2) Each package has some number of versions, inherently ordered.
Only one version of a package may be installed at a time
pkg a -> a.1, a.2, a.3, a.4
pkg b -> b.1, b.2, b.3, b.4
Thus for "a":
!a.1 | !a.2
!a.1 | !a.3
!a.1 | !a.4
!a.2 | !a.3
!a.2 | !a.4
!a.3 | !a.4
where !a represents the negation of a.
This means that for N versions, we have N*(N-1)/2 clauses;
pruning older non-accessible versions will be required to
bound memory consumption. In the case of package installation
rather than upgrade, we may use the installed incorporations
to radically prune the fmri set prior to automated solution.
3) Each version of a package may have dependencies on other
packages, either w/ or w/o a version. The version specification
will likely not be fully specified (eg multiple versions
may satisfy this requirement).
4) dependencies may be of the following types:
required: fmri specifies minimum acceptable version
if a.1 requires b.2, b.3 or b.4:
!a.1 | b.2 | b.3 | b.4
optional: if present, fmri must be at this level or greater
if a.1 optionally requires b.3:
!a.1 | !b.1
!a.1 | !b.2
incorporate: if present, pkg must match fmri
if a.1 incorporates b.3:
!a.1 | !b.1
!a.1 | !b.2
!a.1 | !b.4
All of these are linear in the number of package versions
either meeting or failing to meet the dependency.
5) To express state, the presence of a package is encoded as a
clause. We compute the matching fmris and then construct
a clause that matches one of those fmris. Specifying a single
version requires that version to be present in the solution;
we can also specify current version or newer, or any version of
a package. In this way we can use the same solver for "pkg install",
pkg "image-update" and other flavors such as ways of upgrading
to a specified version.
6) The SAT solver will find a particular solution to our packaging
problem, but there is no way of "preferring" newer packages, and
preventing the introduction of extraneous unneeded packages since
all clauses and variables are strictly Boolean.
As a result, external optimization in the form of repeated
solution attempts w/ additional constraints is necessary. A
simplistic approach is to compute an optimality criterion and
evaluate all solutions and pick the most optimal; this fails in
the face of massive numbers of possible solutions. A somewhat
more clever approach is needed.
The following algorithm has been implemented:
The packaging problem to be solved is expressed as a series of
boolean constraints, and a solution obtained. Then, for each
fmri appearing in the solution vector, all older versions are
excluded; in other words, if a.3 is part of the solution, then
subsequent solutions will not contain a.1 or a.2. Then a single
vector is added that is the negation of the just found vector,
and another solution is found. For example:
if solution is a.2, b.3, z.10, we add
# first negations of older versions
!a.1
!b.1
!b.2
!z.1
!z.2
...
!z.9
# now negate just found solution
!a.2 | !b.3 | !z.10
The latter vector requires that the new solution not contain
a.2 and b.3 and z.10; since we've excluded older versions we
will either get a vector that eliminates one of the packages
as unneeded (if dependencies allow) or one that has newer
versions of one of the needed packags.
We repeat the above process until a solution cannot be found;
the last found solution must therefore be the most optimal one.
The above technique may fail to find the overall optimal
solution if newer packages have incorporation dependencies
on earlier versions of their dependencies. This is expected
to be rare. Pruning the solution space to eliminate older
packages is necessary due to rapid solution space growth if
there are multiple versions that satisfy dependencies; we
may adopting a more partial pruning scheme if such "holes"
become an issue in the future.
7) Next on the list are finding ways of reducing the time needed
to construct the Boolean clauses. Solution time is very
short - typically a fraction of a second - but loading the
problem w/ the current naive implementation is taking
20 to 40 seconds to load the 25K+ pkg versions currently
in the dev repo. Pruning the solution space is part of the
answer, as is improving the efficiency of finding matching
packages. I hope to have some sample code out soon to enable
others to experiment with this.
8) Constructing user-friendly error messages may be difficult;
mapping the inability to find a solution to a problem w/
25K variables and 1M clauses into a concise error message
is a bit daunting at first.
- Bart
--
Bart Smaalders Solaris Kernel Performance
[email protected] http://blogs.sun.com/barts
"You will contribute more with mercurial than with thunderbird."
_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss