[sage-devel] Re: A scalable framework for error-correcting code functionality in Sage

Ryan Hinton Mon, 02 Aug 2010 08:33:46 -0700

Johan,

I certainly do not qualify as a Sage veteran, but I have done some
work on the engineering side of coding theory (LDPC codes).


Your framework sounds good to me.  In fact, it's somewhat similar to
what I have done.  In my case, a particular code is relatively
uninteresting, so I have been using a parity check matrix or adjacency
graph directly (no special classes).  Instead, I am working more with
ensembles of codes, e.g. the set of all linear codes with n=1024,
k=512 (usually I work with much smaller sets).  An ensemble can be
interesting analytically by itself; I also often generate a random
code from the ensemble -- which could be an instance of one of your
classes.  I will eventually add algorithms to design optimized codes,
e.g. with low error floors.

Similar to your proposal, I also have a distinct class for decoding --
and I may add more.

I have a few general comments.

-  The class of codes I am working with (LDPC codes) has an efficient
decoding algorithm (belief propagation, O(n)), but encoding is O(n^2)
in general.  And computing the generator matrix for encoding is non-
trivial as well.  There are also sometimes multiple encoding
algorithms.  So in this case, at least, the complexity of encoding and
decoding are reversed.

-  The received data are not in the same field for "soft-decision"
decoding.  This is very important for my applications.

-  You might want to use more descriptive terms for the block length
and dimension of the code instead of just n and k.

-  I like the duck-typing approach instead of inheritance.  That will
extend better to corner cases like mine.

-  Do be careful with the equality operator.  Sometimes it is assumed
that "==" means the same Python class as well as the same "data".
Since a code is *defined* by its mapping, I think I would support "=="
meaning code equivalence without respect to Python class, e.g. the
example you gave.  Someone can always test for class equality if
desired.

-  I usually think of data strings as vectors over the base field, but
it's not hard to convert back and forth between lists and vectors.  I
think a list is a great way to store the data.

I am happy to provide more feedback on more detailed proposal and/or
prototype.  I am also happy to review code (time permitting :-).

Good luck!

- Ryan

On Aug 2, 7:13 am, "Johan S. R. Nielsen" <[email protected]> wrote:
> As part of my newly begun Ph.D studies, I am planning to extend Sage's
> functionality in error correcting codes. I have already had some
> conversations with David Joyner about this, and we agreed that before
> any major work should be undertaken, a general framework of the
> classes and functionality should be discussed here, as the form of the
> current code is not very scalable. Once we agree enough about
> someting, this discussion could spawn a number of more specific tracs
> for the different areas of improvement as well as the conforming of
> the current functionality to the decided framework. I apologise that
> already my initial proposal is rather lengthy, but it is difficult to
> be more concise without losing precision.
>
> I am sure that this touches upon many general ideologic issues already
> thoroughly discussed and decided upon long ago, so I hope that many
> Sage veterans will join this discussion, and not only coding
> theorists.
>
> Let's begin by shortly explaining the field: error-correcting codes is
> about encoding a message into a longer string (a codeword), which is
> to be sent over some non-perfect communication channel that might
> corrupt a few of the ciphers of the stream; the point being that a
> receiver can correct the faulty ciphers of the original message by
> using the redundancy in the encoded stream, and from this retrieve the
> original message. We usually represent the strings of data as vectors
> over some finite field F, so the encoding is an injective function
> from F^k -> F^n, where n > k. Decoding is a bit more complex: usually,
> we wish to decode to the most likely candidate, assuming random
> errors. Whenever there are few errors, this leads to one possibility,
> but when there are many errors, it might be a list of possibilities
> (list decoding). This also reflects the fact that encoding is usually
> straightforward and cheap (just one algorithm), while decoding is
> complex and expensive (and several algorithms could be applicable).
>
> Most codes studied are linear codes, meaning that the codewords of the
> code is a linear subspace of F^n. Many of their properties are studied
> by looking directly at the basis matrix of this subspace, called the
> generator matrix of the code. There are numerous identified
> subfamilies of linear codes, many of which overlap in various
> important cases.
>
> On top of this, it would be nice if the framework supported generic
> various testing of the efficiency, speed, error-correcting capability
> etc. of a code and encoding/decoding algorithms.
>
> There are other concerns to be sure, but my initial suggestion for
> structuring this is as follows:
>
> A family of codes is represented by a class (e.g. LinearCode) with an
> appropriate constructor taking needed arguments (e.g. a generator
> matrix for linear codes). An instance of such a class defines all that
> is needed for defining (i.e. encoding) the specific member of the code
> family in question. It has various generic functions that all codes
> must have (e.g. k and n in the above notation, base_ring for the
> alphabet). It might have functions for calculating various properties
> of the code (e.g. minimum_distance or weight enumerator). It has an
> encode function taking a list of length k over the base ring, which
> will return the corresponding codeword.
> Such a class should also have a decode function implementing the
> fastest "standard" decoding algorithm (e.g. for Reed-Solomon codes,
> this could be the Berlekamp-Massey decoder). If more than one decoding
> algorithm has been implemented, it could be specified as an optional
> second argument which one to use.
>
> As mentioned, one particular family of codes might be completely
> contained in another family while partly overlapping with several
> others (for example, a Reed-Solomon code is always a linear code but
> only sometimes cyclic). This could be solved by multiple inheritance
> as well as a code class for each possible intersection between any two
> families of codes. However, even though they are nice to have handy,
> most of the time a sub-family does not need the specificities of its
> super-family (e.g. when working with a Reed-Solomon code, it is only
> rarely interesting to know its generator matrix), so time should not
> be wasted on computing these in all cases -- something which is hard
> to avoid and predict when using inheritance.
> Instead, I propose using a combination of direct conversion functions
> as well as the "Quack like a duck" type-ideology of Python. For
> example, the ReedSolomonCode class should not inherit from LinearCode,
> but has a function to_linear_code, which constructs and returns the
> appropriate LinearCode instance. Furthermore, for convenience, the
> ReedSolomonCode itself implements most of the functions usually
> queried (e.g. minimum_distance, which, coincidentally, is easily
> calculated for this Reed-Solomon codes), either by a sub-family
> specific reimplementation or by constructing the LinearCode in the
> background (and saving it for future queries) and forwarding the
> function call. This last part should be to such an extend that most
> functions expecting a LinearCode can take an instance of
> ReedSolomonCode instead -- e.g. a generic testing framework. This also
> rather elegantly extends to the case of overlapping families: a
> ReedSolomonCode has the functions is_cyclic and to_cyclic_code, where
> the latter raises an exception in the case that the particular Reed-
> Solomon code is not cyclic.
> As a last remark, this suggestion could also be extended to support
> some form of coercion between the classes, such that e.g.
> {{{
>     R = ReedSolomonCode(<appropriate constructor information>)
>     R == R.to_linear_code()
>     True}}}
>
> However, this also raises the question of equivalence between codes
> and should perhaps be deferred to a later discussion.
>
> So a code should be a class and a specific code an instance of one
> such. However, for simplicity -- and because more generality seems
> unnecessary --, I would keep the data strings as Python lists over the
> base ring (with no specific encapsulating classes). So some code's
> encode-function should take a list of the appropriate length and over
> the code's base_ring. This would probably call for a module for
> manipulating these lists as codewords (e.g. when decoding Reed-Solomon
> codes, one often goes back and forth between a list representation and
> a polynomial representation of the codewords)
>
> Continuing to the decoding concerns. The protocol for a decoding
> algorithm should be: return the one result in case of a unique-
> decoding algorithm; raise an exception (DecodingFailed) in case of
> total failure (possibly containing a best guess (usable e.g. in the
> case of systematic encoding) or other decoding-algorithm-specific
> information); and return a list of results in the case of successful
> list decoding. A decoder that might sometimes return a list should
> always do so for type safety, and a decoding algorithm should have a
> function is_list_decoder returning whether or not this is the case.
>
> Many decoders have to be initialised with a lot of precomputation. A
> different concern is that the same decoding algorithm might work for
> several families of codes (without one family being directly contained
> in another), or there might be variants of the algorithm for different
> families of codes which are so related that they could share a lot of
> code. One way to handle all of this -- keeping the above suggestion
> for code family representation in mind -- is for decoding algorithms
> to be implemented as classes themselves, which are then instantiated
> whenever some code's decode function is called. This also puts a nice
> structure on the source code (for example, the Guruswami-Sudan
> algorithm for both Reed-Solomon codes and general AG-codes can both
> reside in decoders.GuruswamiSudan if that proved favorable). The
> decode function of some code class would then simply be to instantiate
> the appropriate decoding algorithm -- if it had not already been
> instantiated and saved -- and use this object's decode function.
>
> I hope this conveys my idea on the framework. I look forward to your
> thoughts.
> Cheers,
> Johan

-- 
To post to this group, send an email to [email protected]
To unsubscribe from this group, send an email to 
[email protected]
For more options, visit this group at http://groups.google.com/group/sage-devel
URL: http://www.sagemath.org

[sage-devel] Re: A scalable framework for error-correcting code functionality in Sage

Reply via email to