On 10/16/06, Björn Egert [EMAIL PROTECTED] wrote:
On 10/8/06, Egert, Bjoern [EMAIL PROTECTED] wrote:
Hello,
Is there a way in R to construct an (error correcting) binary code
e.g. for an source alphabet containing integers from 1 to say 255
with the property that each pair of distinct codewords of length m
is at Hamming distance exactly m/2 ?
I was suggested to use so called simplex codes, which should be
fairly standard, but I haven't found a direct way via R packages
to do so, that's why I ask whether there might be in indirect way
to solve this problem.
Example:
v1 =c(1,2,3,4)
v2 =c(1,2,5,6)
similarity(v1,v2)=0.5, (because 2 out of 4 elements are equal).
Obviously, a binary representation of would yield a different
similarity of:
binary(v1) =001 010 011 100
binary(v1) =001 010 101 110
similarity(binary(v1),binary(v2))= 9/12
Remark: The focus here is not on error correction, but rather the
binary encoding retaining similarity of the elements of vectors.
Many thanks,
Bjoern
Bjoern,
NB: I'm an R newbie and I only know a bit about error correcting codes.
I haven't seen any responses to your questions and I don't know if you
still
have a need, but it is certainly possible to construct forward error
correction
codes with all the great math capability in R.
It seems you want to generate code words that still have the original
bits
present. These are systematic codes and there are lots of them available
to use. Many codes are specified by the code word length (n), number
of original data
bits in each code word (k), and the minimum Hamming distance of the
code words (d)
as a [n,k,d] code. Simplex Codes have these parameters: [2^k - 1, k,
2^(k - 1)]. These
codes could be generated as a simple matrix multiply in R, but are you
sure that's what
you want? The code words will be quite long.
Regards,
Richard Graham
Hello,
thank you.
yes, basically, that's what I want.
Just a binary encoding of an arbitrary integer value (or vector of
integers)
with the property that each pair of distinct integer values have an
equal Hamming-
distance (m/2), so as to be able to a similarity search
I got the idea from: Gionis: Efficient and Tunable Similar Set
Retrieval (Chap 3.2)
regards
Bjoern
Bjoern,
I read only the section of the paper you mention and I'll trust that
the stated properties of Simplex Codes are true. I haven't researched
or verified it.
[from http://magma.maths.usyd.edu.au/]
Magma is a large, well-supported software package designed to solve
computationally hard problems in algebra, number theory, geometry and
combinatorics. It provides a mathematically rigorous environment for
computing with algebraic, number-theoretic, combinatoric and geometric
objects.
I don't understand a fraction of its capability but I still find it to
be very useful. In fact, they have an online calculator that will
give you the generator matrix you want. The online Magma calculator
is at:
http://magma.maths.usyd.edu.au/calc/
To calculate the generator matrix I think you are asking for, go to
the above URL and cut/paste the following command:
ExtendCode(SimplexCode(8));
Click Evaluate and the output window will contain a [256, 8, 128]
Linear Code over GF(2). You'll need to massage this a bit to use
it as a matrix for R. I'd use Ruby to do this, but anything will do.
If you want to encode more/less than 8 bits, you can modify the above
argument to SimplexCode.
I used ExtendCode so that the codeword length == Dmin * 2
The Gionis claim I'll research or verify sometime is that _every_ pair
of Simplex Code words of length m have Hamming distance == m/2. If
you have a reference to a proof, I'd like to read it (like I said, I
only know a bit about ECC).
Good Luck with your work!
Richard Graham
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.