Johan, I certainly do not qualify as a Sage veteran, but I have done some work on the engineering side of coding theory (LDPC codes).
Your framework sounds good to me. In fact, it's somewhat similar to what I have done. In my case, a particular code is relatively uninteresting, so I have been using a parity check matrix or adjacency graph directly (no special classes). Instead, I am working more with ensembles of codes, e.g. the set of all linear codes with n=1024, k=512 (usually I work with much smaller sets). An ensemble can be interesting analytically by itself; I also often generate a random code from the ensemble -- which could be an instance of one of your classes. I will eventually add algorithms to design optimized codes, e.g. with low error floors. Similar to your proposal, I also have a distinct class for decoding -- and I may add more. I have a few general comments. - The class of codes I am working with (LDPC codes) has an efficient decoding algorithm (belief propagation, O(n)), but encoding is O(n^2) in general. And computing the generator matrix for encoding is non- trivial as well. There are also sometimes multiple encoding algorithms. So in this case, at least, the complexity of encoding and decoding are reversed. - The received data are not in the same field for "soft-decision" decoding. This is very important for my applications. - You might want to use more descriptive terms for the block length and dimension of the code instead of just n and k. - I like the duck-typing approach instead of inheritance. That will extend better to corner cases like mine. - Do be careful with the equality operator. Sometimes it is assumed that "==" means the same Python class as well as the same "data". Since a code is *defined* by its mapping, I think I would support "==" meaning code equivalence without respect to Python class, e.g. the example you gave. Someone can always test for class equality if desired. - I usually think of data strings as vectors over the base field, but it's not hard to convert back and forth between lists and vectors. I think a list is a great way to store the data. I am happy to provide more feedback on more detailed proposal and/or prototype. I am also happy to review code (time permitting :-). Good luck! - Ryan On Aug 2, 7:13 am, "Johan S. R. Nielsen" <[email protected]> wrote: > As part of my newly begun Ph.D studies, I am planning to extend Sage's > functionality in error correcting codes. I have already had some > conversations with David Joyner about this, and we agreed that before > any major work should be undertaken, a general framework of the > classes and functionality should be discussed here, as the form of the > current code is not very scalable. Once we agree enough about > someting, this discussion could spawn a number of more specific tracs > for the different areas of improvement as well as the conforming of > the current functionality to the decided framework. I apologise that > already my initial proposal is rather lengthy, but it is difficult to > be more concise without losing precision. > > I am sure that this touches upon many general ideologic issues already > thoroughly discussed and decided upon long ago, so I hope that many > Sage veterans will join this discussion, and not only coding > theorists. > > Let's begin by shortly explaining the field: error-correcting codes is > about encoding a message into a longer string (a codeword), which is > to be sent over some non-perfect communication channel that might > corrupt a few of the ciphers of the stream; the point being that a > receiver can correct the faulty ciphers of the original message by > using the redundancy in the encoded stream, and from this retrieve the > original message. We usually represent the strings of data as vectors > over some finite field F, so the encoding is an injective function > from F^k -> F^n, where n > k. Decoding is a bit more complex: usually, > we wish to decode to the most likely candidate, assuming random > errors. Whenever there are few errors, this leads to one possibility, > but when there are many errors, it might be a list of possibilities > (list decoding). This also reflects the fact that encoding is usually > straightforward and cheap (just one algorithm), while decoding is > complex and expensive (and several algorithms could be applicable). > > Most codes studied are linear codes, meaning that the codewords of the > code is a linear subspace of F^n. Many of their properties are studied > by looking directly at the basis matrix of this subspace, called the > generator matrix of the code. There are numerous identified > subfamilies of linear codes, many of which overlap in various > important cases. > > On top of this, it would be nice if the framework supported generic > various testing of the efficiency, speed, error-correcting capability > etc. of a code and encoding/decoding algorithms. > > There are other concerns to be sure, but my initial suggestion for > structuring this is as follows: > > A family of codes is represented by a class (e.g. LinearCode) with an > appropriate constructor taking needed arguments (e.g. a generator > matrix for linear codes). An instance of such a class defines all that > is needed for defining (i.e. encoding) the specific member of the code > family in question. It has various generic functions that all codes > must have (e.g. k and n in the above notation, base_ring for the > alphabet). It might have functions for calculating various properties > of the code (e.g. minimum_distance or weight enumerator). It has an > encode function taking a list of length k over the base ring, which > will return the corresponding codeword. > Such a class should also have a decode function implementing the > fastest "standard" decoding algorithm (e.g. for Reed-Solomon codes, > this could be the Berlekamp-Massey decoder). If more than one decoding > algorithm has been implemented, it could be specified as an optional > second argument which one to use. > > As mentioned, one particular family of codes might be completely > contained in another family while partly overlapping with several > others (for example, a Reed-Solomon code is always a linear code but > only sometimes cyclic). This could be solved by multiple inheritance > as well as a code class for each possible intersection between any two > families of codes. However, even though they are nice to have handy, > most of the time a sub-family does not need the specificities of its > super-family (e.g. when working with a Reed-Solomon code, it is only > rarely interesting to know its generator matrix), so time should not > be wasted on computing these in all cases -- something which is hard > to avoid and predict when using inheritance. > Instead, I propose using a combination of direct conversion functions > as well as the "Quack like a duck" type-ideology of Python. For > example, the ReedSolomonCode class should not inherit from LinearCode, > but has a function to_linear_code, which constructs and returns the > appropriate LinearCode instance. Furthermore, for convenience, the > ReedSolomonCode itself implements most of the functions usually > queried (e.g. minimum_distance, which, coincidentally, is easily > calculated for this Reed-Solomon codes), either by a sub-family > specific reimplementation or by constructing the LinearCode in the > background (and saving it for future queries) and forwarding the > function call. This last part should be to such an extend that most > functions expecting a LinearCode can take an instance of > ReedSolomonCode instead -- e.g. a generic testing framework. This also > rather elegantly extends to the case of overlapping families: a > ReedSolomonCode has the functions is_cyclic and to_cyclic_code, where > the latter raises an exception in the case that the particular Reed- > Solomon code is not cyclic. > As a last remark, this suggestion could also be extended to support > some form of coercion between the classes, such that e.g. > {{{ > R = ReedSolomonCode(<appropriate constructor information>) > R == R.to_linear_code() > True}}} > > However, this also raises the question of equivalence between codes > and should perhaps be deferred to a later discussion. > > So a code should be a class and a specific code an instance of one > such. However, for simplicity -- and because more generality seems > unnecessary --, I would keep the data strings as Python lists over the > base ring (with no specific encapsulating classes). So some code's > encode-function should take a list of the appropriate length and over > the code's base_ring. This would probably call for a module for > manipulating these lists as codewords (e.g. when decoding Reed-Solomon > codes, one often goes back and forth between a list representation and > a polynomial representation of the codewords) > > Continuing to the decoding concerns. The protocol for a decoding > algorithm should be: return the one result in case of a unique- > decoding algorithm; raise an exception (DecodingFailed) in case of > total failure (possibly containing a best guess (usable e.g. in the > case of systematic encoding) or other decoding-algorithm-specific > information); and return a list of results in the case of successful > list decoding. A decoder that might sometimes return a list should > always do so for type safety, and a decoding algorithm should have a > function is_list_decoder returning whether or not this is the case. > > Many decoders have to be initialised with a lot of precomputation. A > different concern is that the same decoding algorithm might work for > several families of codes (without one family being directly contained > in another), or there might be variants of the algorithm for different > families of codes which are so related that they could share a lot of > code. One way to handle all of this -- keeping the above suggestion > for code family representation in mind -- is for decoding algorithms > to be implemented as classes themselves, which are then instantiated > whenever some code's decode function is called. This also puts a nice > structure on the source code (for example, the Guruswami-Sudan > algorithm for both Reed-Solomon codes and general AG-codes can both > reside in decoders.GuruswamiSudan if that proved favorable). The > decode function of some code class would then simply be to instantiate > the appropriate decoding algorithm -- if it had not already been > instantiated and saved -- and use this object's decode function. > > I hope this conveys my idea on the framework. I look forward to your > thoughts. > Cheers, > Johan -- To post to this group, send an email to [email protected] To unsubscribe from this group, send an email to [email protected] For more options, visit this group at http://groups.google.com/group/sage-devel URL: http://www.sagemath.org
