#18055: Improve lookahead in IntegerListsLex
---------------------------------+------------------------
Reporter: nthiery | Owner:
Type: enhancement | Status: new
Priority: major | Milestone: sage-6.6
Component: combinatorics | Resolution:
Keywords: | Merged in:
Authors: | Reviewers:
Report Upstream: N/A | Work issues:
Branch: | Commit:
Dependencies: #18109 | Stopgaps:
---------------------------------+------------------------
Description changed by nthiery:
Old description:
> `IntegerListsLex` uses a depth first search in a prefix tree, with
> lookahead to prune branches.
> The purpose of this ticket is to improve the lookahead algorithm in two
> directions:
>
> - Detect more branches that can be cut.
> - Better time complexity; this probably will involve identifying
> important invariants and caching them.
>
> Example:: even just for n=1000 this is taking a long time:
> {{{
> sage: n = 100
> sage: IntegerListsLex(binomial(n,2)-1, length=n, min_slope=1).list()
> }}}
> The following also take a long time, which should be optimized:
> {{{
> sage: IntegerListsLex(10^100, max_length=1).list()
> sage: IntegerListsLex(499499, length=1000, min_slope=1).list()
> }}}
>
> One could hope that one could reach an
> average complexity roughly linear in `O(log(max_n), max_length)` per
> list generated.
>
> Concrete steps:
>
> - Extend `Envelope` with a function:
> {{{
> def area(k):
> """Return the area under the envelope between `0` and `k`"""
> }}}
>
> This functions should handle the "reverse smoothing" from position
> `k` (so that the envelope satisfies both `min_slope` and `max_slope`
> conditions between `1` and `k`), and have appropriate algorithmic
> and caching so that the overall complexity of computing all
> `area(k)` for `k<=l` is linear in `l`.
>
> - Replace:
> {{{
> def _possible_m(self, m, ...):
> }}}
> by:
> {{{
> def lookahead(self):
> }}}
>
> which shall detect whether `_current_list` has a chance to be a
> prefix of some valid list. Using the above `area` method of the
> envelopes, it should be possible to detect most dead branches.
> Using dichotomy on the area / parts, it should be possible to
> achieve good complexity in most cases.
>
> This method should support `_current_list=[]` as well.
>
> This method should store as much information as relevant to avoid
> later recomputation for lower nodes in the tree. A minima, this
> could possibly return/store a range for the next value, removing the
> need for a separate `_m_interval` method.
>
> - Update `next` accordingly (the logic should be slightly simpler
> thanks to the handling of the empty prefix).
>
> - Remove `integer_list_old` at the final stage.
New description:
`IntegerListsLex` uses a depth first search in a prefix tree, with
lookahead to prune branches.
The purpose of this ticket is to improve the lookahead algorithm in two
directions:
- Detect more (all?) branches that can be cut.
- Better time complexity; this probably will involve identifying important
invariants and caching them.
Example:: even just for n=1000 this is taking a long time:
{{{
sage: n = 100
sage: IntegerListsLex(binomial(n,2)-1, length=n, min_slope=1).list()
}}}
The following also take a long time, which should be optimized:
{{{
sage: IntegerListsLex(10^100, max_length=1).list()
sage: IntegerListsLex(499499, length=1000, min_slope=1).list()
}}}
One could hope that one could reach an average complexity roughly
linear in `O(log(max_n), max_length)` per list generated.
Concrete steps:
- If we want the floor function to satisfy the `max_slope` constraint,
we need to do "reverse smoothing", and the floor function then
depends on the interval `[0,...,k)` over which we want to consider
it. The same holds for the ceiling function if we want it to satisfy
the `min_slope` constraint. Let thus `f_k` and `c_k` be respectively
the floor and ceiling functions for the interval `[0,...k)`
Let's ignore the constraints on the length and on `n` for now.
Then, `f_k` and `c_k` are valid lists. Prove the following remark:
For any `n` between `sum(f_k)` and `sum(c_k)` there exists a valid
list of length `k` and sum `n`.
Putting back the length and sum constraints, there exists a valid
list if and only if the interval `I_k = [sum(f_k), sum(c_k)]`
intersects the interval `[min_n, max_n]` for some `k` between
`min_length` and `max_length`.
- Extend `Envelope` with a method `.sum(k)` which computes `sum(f_k)`.
It should have appropriate algorithmic and caching so that the
overall time complexity of computing `sum(f_k)` for all `k<=l` is
linear in `l`.
Trick: store, for each `j`, the smallest `i<=j` such that between
`i` and `j` the slope is exactly `max_slope`. The point is that
computing the sum of the parts of `f_k` between `i` and `j` is
constant time. Use that incrementally to derive quickly `sum(f_k)`
from `sum(f_j)` for some `j<=k`.
Suggestion for a faster development cycle: experiment with this by
extracting the current code for `Envelope` in a separate file.
- Use the above characterization and the `.sum(k)` methods to
implement a method `IntegerListsLex.is_empty` which terminates with
guaranteed result on non degenerate cases. This is essentially what
the current `lookahead` method does, but starting from `0` and with
guaranteed results thanks to the reverse smoothing.
Determine precisely the degenerate cases where `is_empty` may take
an arbitrarily long time (something like: the ceiling can be `0` for
an arbitrary long time, without a known `limit=0` and `limit_start`).
Decide on appropriate metrics to bound tightly the complexity in the
other cases (something around `max_length`, `max_n`, `limit_start`).
As a side product, we probably want to store for later usage the
information of which `n` admit a valid list, as a collection of non
overlapping intervals. Then, deciding whether there exists a valid
list of sum `n` for a given `n` can be achieved efficiently by
dichotomy.
- Assuming that `l` is a prefix of a valid list, find out how to
determine for which `i` `l+[i]` is still a prefix of a valid list.
Potential building block: the choice of a prefix imposes more
constraints on the floor and ceiling later on (our former `adapt`
method), and we want to keep track of that. Given a prefix `l`,
write `f_k(l)` be the list obtained by extending `l` to a list of
length `k` using the floor `f_k`, with appropriate forward
smoothing. Same for `c_k(l)`. Write `I_k(l) = [sum(f_k(l)),
sum(c_k(l))]`.
Now, `l` is a prefix of a valid list if one of the `I_k(l)`
intersect `[min_sum, max_sum]`. So, while increasing the length of
`l` we want to do something like incrementally adapting the sequence
of intervals `I_k(l)` to always make sure we stay within one of
them.
- Use this to reimplement `lookahead(self)` implementing the above,
with a guaranteed result and good complexity. And if at all possible
a constant time complexity when the constraints are simple
(e.g. `floor=0`, ...).
This method should support `_current_list=[]` as well.
- Update `next` accordingly (the logic should be slightly simpler
thanks to the handling of the empty prefix).
- If not implemented elsewhere: let the constructor accept a prefix as
input, to start the iteration from there. Derive a `next` method as
a side effect.
- Remove `integer_list_old` at the final stage.
--
--
Ticket URL: <http://trac.sagemath.org/ticket/18055#comment:10>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica,
and MATLAB
--
You received this message because you are subscribed to the Google Groups
"sage-trac" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sage-trac.
For more options, visit https://groups.google.com/d/optout.