#18055: Improve lookahead in IntegerListsLex
---------------------------------+------------------------
       Reporter:  nthiery        |        Owner:
           Type:  enhancement    |       Status:  new
       Priority:  major          |    Milestone:  sage-6.6
      Component:  combinatorics  |   Resolution:
       Keywords:                 |    Merged in:
        Authors:                 |    Reviewers:
Report Upstream:  N/A            |  Work issues:
         Branch:                 |       Commit:
   Dependencies:  #18109         |     Stopgaps:
---------------------------------+------------------------
Description changed by nthiery:

Old description:

> `IntegerListsLex` uses a depth first search in a prefix tree, with
> lookahead to prune branches.
> The purpose of this ticket is to improve the lookahead algorithm in two
> directions:
>
> - Detect more branches that can be cut.
> - Better time complexity; this probably will involve identifying
> important invariants and caching them.
>
> Example:: even just for n=1000 this is taking a long time:
> {{{
>     sage: n = 100
>     sage: IntegerListsLex(binomial(n,2)-1, length=n, min_slope=1).list()
> }}}
> The following also take a long time, which should be optimized:
> {{{
>     sage: IntegerListsLex(10^100, max_length=1).list()
>     sage: IntegerListsLex(499499, length=1000, min_slope=1).list()
> }}}
>
> One could hope that one could reach an
> average complexity roughly linear in `O(log(max_n), max_length)` per
> list generated.
>
> Concrete steps:
>
> - Extend `Envelope` with a function:
>   {{{
>       def area(k):
>           """Return the area under the envelope between `0` and `k`"""
>   }}}
>
>   This functions should handle the "reverse smoothing" from position
>   `k` (so that the envelope satisfies both `min_slope` and `max_slope`
>   conditions between `1` and `k`), and have appropriate algorithmic
>   and caching so that the overall complexity of computing all
>   `area(k)` for `k<=l` is linear in `l`.
>
> - Replace:
>   {{{
>       def _possible_m(self, m, ...):
>   }}}
>   by:
>   {{{
>       def lookahead(self):
>   }}}
>
>   which shall detect whether `_current_list` has a chance to be a
>   prefix of some valid list. Using the above `area` method of the
>   envelopes, it should be possible to detect most dead branches.
>   Using dichotomy on the area / parts, it should be possible to
>   achieve good complexity in most cases.
>
>   This method should support `_current_list=[]` as well.
>
>   This method should store as much information as relevant to avoid
>   later recomputation for lower nodes in the tree. A minima, this
>   could possibly return/store a range for the next value, removing the
>   need for a separate `_m_interval` method.
>
> - Update `next` accordingly (the logic should be slightly simpler
>   thanks to the handling of the empty prefix).
>
> - Remove `integer_list_old` at the final stage.

New description:

 `IntegerListsLex` uses a depth first search in a prefix tree, with
 lookahead to prune branches.
 The purpose of this ticket is to improve the lookahead algorithm in two
 directions:

 - Detect more (all?) branches that can be cut.
 - Better time complexity; this probably will involve identifying important
 invariants and caching them.

 Example:: even just for n=1000 this is taking a long time:
 {{{
     sage: n = 100
     sage: IntegerListsLex(binomial(n,2)-1, length=n, min_slope=1).list()
 }}}
 The following also take a long time, which should be optimized:
 {{{
     sage: IntegerListsLex(10^100, max_length=1).list()
     sage: IntegerListsLex(499499, length=1000, min_slope=1).list()
 }}}

 One could hope that one could reach an average complexity roughly
 linear in `O(log(max_n), max_length)` per list generated.

 Concrete steps:

 - If we want the floor function to satisfy the `max_slope` constraint,
   we need to do "reverse smoothing", and the floor function then
   depends on the interval `[0,...,k)` over which we want to consider
   it. The same holds for the ceiling function if we want it to satisfy
   the `min_slope` constraint. Let thus `f_k` and `c_k` be respectively
   the floor and ceiling functions for the interval `[0,...k)`

   Let's ignore the constraints on the length and on `n` for now.
   Then, `f_k` and `c_k` are valid lists. Prove the following remark:

   For any `n` between `sum(f_k)` and `sum(c_k)` there exists a valid
   list of length `k` and sum `n`.

   Putting back the length and sum constraints, there exists a valid
   list if and only if the interval `I_k = [sum(f_k), sum(c_k)]`
   intersects the interval `[min_n, max_n]` for some `k` between
   `min_length` and `max_length`.

 - Extend `Envelope` with a method `.sum(k)` which computes `sum(f_k)`.

   It should have appropriate algorithmic and caching so that the
   overall time complexity of computing `sum(f_k)` for all `k<=l` is
   linear in `l`.

   Trick: store, for each `j`, the smallest `i<=j` such that between
   `i` and `j` the slope is exactly `max_slope`. The point is that
   computing the sum of the parts of `f_k` between `i` and `j` is
   constant time. Use that incrementally to derive quickly `sum(f_k)`
   from `sum(f_j)` for some `j<=k`.

   Suggestion for a faster development cycle: experiment with this by
   extracting the current code for `Envelope` in a separate file.

 - Use the above characterization and the `.sum(k)` methods to
   implement a method `IntegerListsLex.is_empty` which terminates with
   guaranteed result on non degenerate cases. This is essentially what
   the current `lookahead` method does, but starting from `0` and with
   guaranteed results thanks to the reverse smoothing.

   Determine precisely the degenerate cases where `is_empty` may take
   an arbitrarily long time (something like: the ceiling can be `0` for
   an arbitrary long time, without a known `limit=0` and `limit_start`).

   Decide on appropriate metrics to bound tightly the complexity in the
   other cases (something around `max_length`, `max_n`, `limit_start`).

   As a side product, we probably want to store for later usage the
   information of which `n` admit a valid list, as a collection of non
   overlapping intervals. Then, deciding whether there exists a valid
   list of sum `n` for a given `n` can be achieved efficiently by
   dichotomy.

 - Assuming that `l` is a prefix of a valid list, find out how to
   determine for which `i` `l+[i]` is still a prefix of a valid list.

   Potential building block: the choice of a prefix imposes more
   constraints on the floor and ceiling later on (our former `adapt`
   method), and we want to keep track of that. Given a prefix `l`,
   write `f_k(l)` be the list obtained by extending `l` to a list of
   length `k` using the floor `f_k`, with appropriate forward
   smoothing. Same for `c_k(l)`. Write `I_k(l) = [sum(f_k(l)),
   sum(c_k(l))]`.

   Now, `l` is a prefix of a valid list if one of the `I_k(l)`
   intersect `[min_sum, max_sum]`. So, while increasing the length of
   `l` we want to do something like incrementally adapting the sequence
   of intervals `I_k(l)` to always make sure we stay within one of
   them.

 - Use this to reimplement `lookahead(self)` implementing the above,
   with a guaranteed result and good complexity. And if at all possible
   a constant time complexity when the constraints are simple
   (e.g. `floor=0`, ...).

   This method should support `_current_list=[]` as well.

 - Update `next` accordingly (the logic should be slightly simpler
   thanks to the handling of the empty prefix).

 - If not implemented elsewhere: let the constructor accept a prefix as
   input, to start the iteration from there. Derive a `next` method as
   a side effect.

 - Remove `integer_list_old` at the final stage.

--

--
Ticket URL: <http://trac.sagemath.org/ticket/18055#comment:10>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica, 
and MATLAB

-- 
You received this message because you are subscribed to the Google Groups 
"sage-trac" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sage-trac.
For more options, visit https://groups.google.com/d/optout.

Reply via email to