#15820: Improve efficiency of the path semigroup
--------------------------------------------------+------------------------
       Reporter:  SimonKing                       |        Owner:
           Type:  enhancement                     |       Status:  new
       Priority:  major                           |    Milestone:  sage-6.2
      Component:  algebra                         |   Resolution:
       Keywords:  quiver path algebra efficiency  |    Merged in:
        Authors:                                  |    Reviewers:
Report Upstream:  N/A                             |  Work issues:
         Branch:                                  |       Commit:
   Dependencies:  #12630                          |     Stopgaps:
--------------------------------------------------+------------------------

Comment (by SimonKing):

 There are various possible more or less obvious data formats:

 1. Store a path as `char*` or `unsigned int*`, hence, as a list of
 integers, each integer corresponding to one edge of the quiver. `char*`
 would bound the total number of edges of the quiver to 256 (which might be
 too small), whereas `unsigned int*` might consume to much memory.
 2. Store a path as `char*`, each integer corresponding to an outgoing edge
 of a vertex. Hence, the total number of edges may be arbitrary, we only
 have a local bound of 128 on the number of outgoing edges---this hopefully
 is enough (it certainly is enough for me).
 3. As in 1., but the representation is compressed: If we have less than
 128 edges, then one byte can represent a pair of edges, and a single 32-
 or 64-bit int could store even longer sequences of edges.
 4. As in 2., but the representation is compressed: If each vertex has less
 than 8 outgoing edges, then a byte can represent a pair and a single
 32-bit word can represent a 10-tuple of edges (if I did not miscalculate).

 In 2. and 4. (and probably in 1. and 3. as well) we should explicitly
 store the start and end point of the path. The length should probably be
 stored as well.

 In my applications, the following operations on paths are needed to be
 fast: Concatenation, hash/comparison, testing whether a path p starts with
 a path q, and less important whether a path p contains a path q.

 Concatenation is particularly easy in 1. and 2. (just concatenation of
 lists, which is achieved by memcopy). It is a bit more difficult in 3. and
 4., where the bytes of the second factor need to be shifted to get the
 compressed data seamless.

 Hash and comparison are easy in all four formats, but of course best speed
 would be obtained in 4., simply since the amount of data that needs to be
 processed is smaller than in 1.-3.

 In all four proposed data formats, it is easy to test whether path p
 starts with path q. Again, having the data stored in as little memory as
 possible will speed things up.

 It is easy in 1. to test whether path p ''contains'' path q (as a subpath
 that is not necessarily an initial segment), whereas this becomes a bit
 more difficult in 2.-4.

 What format would you suggest? Perhaps it makes sense to have some initial
 implementations of all four, and then compare directly.

--
Ticket URL: <http://trac.sagemath.org/ticket/15820#comment:1>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica, 
and MATLAB

-- 
You received this message because you are subscribed to the Google Groups 
"sage-trac" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sage-trac.
For more options, visit https://groups.google.com/groups/opt_out.

Reply via email to