#19494: Make finite word method nb_subword_occurrences_in much faster
-----------------------------------+--------------------------------------------------------
        Reporter:  slabbe          |       Owner:
            Type:  enhancement     |      Status:  needs_info
        Priority:  major           |   Milestone:  sage-6.10
       Component:  combinatorics   |  Resolution:
        Keywords:                  |   Merged in:
         Authors:  Sébastien Labbé |   Reviewers:  Vincent Delecroix
 Report Upstream:  N/A             | Work issues:
          Branch:  u/slabbe/19494  |      Commit:  664fd8e0650d7e9c95fab0d484ae5f800641b2a6
    Dependencies:                  |    Stopgaps:
-----------------------------------+--------------------------------------------------------
Comment (by slabbe):
That binomial was introduced by Jacques Sakarovitch as the theme of a
chapter in the (first?) Lothaire book. It generalizes the usual binomial
coefficient, since `a^n.binomial(a^k)` is equal to `n.binomial(k)`. OK, I
agree that too many methods is too many. I removed it in the last commit.
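For concreteness, here is a small check of that identity using only the
existing `nb_subword_occurrences_in` method and the usual `binomial` (the
values below are just an illustration, nothing from the removed method):
{{{
sage: n, k = 7, 3
sage: Word('a'*k).nb_subword_occurrences_in(Word('a'*n))
35
sage: binomial(n, k)
35
}}}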
I added a new commit to make reviewing and comparing the implementations
easier. I still believe we will need one more commit to clean this up if
we eventually decide that one of the implementations or one of the
arguments is useless. We will see.
About using sparse matrices: a first remark is that the computed matrices
are upper triangular, so using sparse matrices should eventually save at
least half the space.
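For the record, here is the kind of update I have in mind, written as a
plain Python sketch rather than the code of the branch: reading one letter
`x` of the text amounts to multiplying by an upper triangular bidiagonal
matrix `M_x` with ones on the diagonal and a one in position `(i, i+1)`
exactly when the `(i+1)`-st letter of the pattern is `x`.
{{{
def nb_subword_occurrences(w, t):
    # Sketch only (not the branch code): c[j] counts the occurrences of
    # w[:j] as a subword of the prefix of t read so far; reading a letter
    # x of t applies the upper triangular bidiagonal matrix M_x to c.
    c = [0] * (len(w) + 1)
    c[0] = 1
    for x in t:
        # go right to left so each position of t is used at most once
        for j in range(len(w), 0, -1):
            if w[j - 1] == x:
                c[j] += c[j - 1]
    return c[len(w)]
}}}
For example, `nb_subword_occurrences('aaa', 'a'*7)` gives
`binomial(7, 3) == 35`.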
If the alphabet is small, using sparse matrices is roughly 7 to 8 times
slower:
{{{
sage: u = words.ThueMorseWord()[:1000]
sage: L = list(u)
sage:
sage: %time u[:20].nb_subword_occurrences_in(u, sparse=False)
CPU times: user 130 ms, sys: 1.37 ms, total: 131 ms
Wall time: 131 ms
347151814258896164694010210266438495
sage: %time u[:20].nb_subword_occurrences_in(u, sparse=True)
CPU times: user 979 ms, sys: 5.88 ms, total: 984 ms
Wall time: 986 ms
347151814258896164694010210266438495
}}}
If the alphabet is larger (the computation below involves 201x201
matrices), then sparse and dense are about equally fast:
{{{
sage: u = Word(range(200))
sage: %time u.nb_subword_occurrences_in(u*u*u, sparse=True)
CPU times: user 16.1 s, sys: 27.1 ms, total: 16.1 s
Wall time: 16.2 s
20301
sage: %time u.nb_subword_occurrences_in(u*u*u, sparse=False)
CPU times: user 16.6 s, sys: 169 ms, total: 16.7 s
Wall time: 16.7 s
20301
}}}
In comparison, here is the longest word (following the construction of the
previous examples) that the original algorithm can handle in less than
16 s:
{{{
sage: u = Word(range(60))
sage: %time u.nb_subword_occurrences_in(u*u*u, algorithm='original')
CPU times: user 12.1 s, sys: 13 ms, total: 12.1 s
Wall time: 12.1 s
1891
}}}
I never found an example where the original implementation is faster,
which is why I would suggest simply deleting it, unless we prefer keeping
two distinct implementations.
--
Ticket URL: <http://trac.sagemath.org/ticket/19494#comment:5>