#15820: Improve efficiency of the path semigroup
--------------------------------------------------+------------------------
Reporter: SimonKing | Owner:
Type: enhancement | Status: new
Priority: major | Milestone: sage-6.2
Component: algebra | Resolution:
Keywords: quiver path algebra efficiency | Merged in:
Authors: | Reviewers:
Report Upstream: N/A | Work issues:
Branch: | Commit:
Dependencies: #12630 | Stopgaps:
--------------------------------------------------+------------------------
Comment (by SimonKing):
There are various possible more or less obvious data formats:
1. Store a path as `char*` or `unsigned int*`, hence, as a list of
integers, each integer corresponding to one edge of the quiver. `char*`
would bound the total number of edges of the quiver to 256 (which might be
too small), whereas `unsigned int*` might consume to much memory.
2. Store a path as `char*`, each integer corresponding to an outgoing edge
of a vertex. Hence, the total number of edges may be arbitrary, we only
have a local bound of 128 on the number of outgoing edges---this hopefully
is enough (it certainly is enough for me).
3. As in 1., but the representation is compressed: If we have less than
128 edges, then one byte can represent a pair of edges, and a single 32-
or 64-bit int could store even longer sequences of edges.
4. As in 2., but the representation is compressed: If each vertex has less
than 8 outgoing edges, then a byte can represent a pair and a single
32-bit word can represent a 10-tuple of edges (if I did not miscalculate).
In 2. and 4. (and probably in 1. and 3. as well) we should explicitly
store the start and end point of the path. The length should probably be
stored as well.
In my applications, the following operations on paths are needed to be
fast: Concatenation, hash/comparison, testing whether a path p starts with
a path q, and less important whether a path p contains a path q.
Concatenation is particularly easy in 1. and 2. (just concatenation of
lists, which is achieved by memcopy). It is a bit more difficult in 3. and
4., where the bytes of the second factor need to be shifted to get the
compressed data seamless.
Hash and comparison are easy in all four formats, but of course best speed
would be obtained in 4., simply since the amount of data that needs to be
processed is smaller than in 1.-3.
In all four proposed data formats, it is easy to test whether path p
starts with path q. Again, having the data stored in as little memory as
possible will speed things up.
It is easy in 1. to test whether path p ''contains'' path q (as a subpath
that is not necessarily an initial segment), whereas this becomes a bit
more difficult in 2.-4.
What format would you suggest? Perhaps it makes sense to have some initial
implementations of all four, and then compare directly.
--
Ticket URL: <http://trac.sagemath.org/ticket/15820#comment:1>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica,
and MATLAB
--
You received this message because you are subscribed to the Google Groups
"sage-trac" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sage-trac.
For more options, visit https://groups.google.com/groups/opt_out.