On 10/16/06, Björn Egert <[EMAIL PROTECTED]> wrote: > On 10/8/06, Egert, Bjoern <[EMAIL PROTECTED]> wrote: > >> Hello, > >> > >> Is there a way in R to construct an (error correcting) binary code > >> e.g. for an source alphabet containing integers from 1 to say 255 > >> with the property that each pair of distinct codewords of length m > >> is at Hamming distance exactly m/2 ? > >> > >> I was suggested to use so called simplex codes, which should be > >> fairly standard, but I haven't found a direct way via R packages > >> to do so, that's why I ask whether there might be in indirect way > >> to solve this problem. > >> > >> Example: > >> v1 =c(1,2,3,4) > >> v2 =c(1,2,5,6) > >> similarity(v1,v2)=0.5, (because 2 out of 4 elements are equal). > >> Obviously, a binary representation of would yield a different > >> similarity of: > >> binary(v1) =001 010 011 100 > >> binary(v1) =001 010 101 110 > >> similarity(binary(v1),binary(v2))= 9/12 > >> > >> Remark: The focus here is not on error correction, but rather the > >> binary encoding retaining similarity of the elements of vectors. > >> > >> Many thanks, > >> Bjoern > > > > Bjoern, > > > > NB: I'm an R newbie and I only know a bit about error correcting codes. > > > > I haven't seen any responses to your questions and I don't know if you > > still > > have a need, but it is certainly possible to construct forward error > > correction > > codes with all the great math capability in R. > > > > It seems you want to generate code words that still have the original > > bits > > present. These are systematic codes and there are lots of them available > > to use. Many codes are specified by the code word length (n), number > > of original data > > bits in each code word (k), and the minimum Hamming distance of the > > code words (d) > > as a [n,k,d] code. Simplex Codes have these parameters: [2^k - 1, k, > > 2^(k - 1)]. These > > codes could be generated as a simple matrix multiply in R, but are you > > sure that's what > > you want? The code words will be quite long. > > > > Regards, > > Richard Graham > > > Hello, > > thank you. > > yes, basically, that's what I want. > Just a binary encoding of an arbitrary integer value (or vector of > integers) > with the property that each pair of distinct integer values have an > equal Hamming- > distance (m/2), so as to be able to a similarity search > > I got the idea from: Gionis: "Efficient and Tunable Similar Set > Retrieval" (Chap 3.2) > > regards > Bjoern > >
Bjoern, I read only the section of the paper you mention and I'll trust that the stated properties of Simplex Codes are true. I haven't researched or verified it. [from http://magma.maths.usyd.edu.au/] "Magma is a large, well-supported software package designed to solve computationally hard problems in algebra, number theory, geometry and combinatorics. It provides a mathematically rigorous environment for computing with algebraic, number-theoretic, combinatoric and geometric objects." I don't understand a fraction of its capability but I still find it to be very useful. In fact, they have an online calculator that will give you the generator matrix you want. The online Magma calculator is at: http://magma.maths.usyd.edu.au/calc/ To calculate the generator matrix I think you are asking for, go to the above URL and cut/paste the following command: ExtendCode(SimplexCode(8)); Click "Evaluate" and the output window will contain a "[256, 8, 128] Linear Code over GF(2)". You'll need to "massage" this a bit to use it as a matrix for R. I'd use Ruby to do this, but anything will do. If you want to encode more/less than 8 bits, you can modify the above argument to SimplexCode. I used "ExtendCode" so that the codeword length == Dmin * 2 The Gionis claim I'll research or verify sometime is that _every_ pair of Simplex Code words of length m have Hamming distance == m/2. If you have a reference to a proof, I'd like to read it (like I said, I only know a bit about ECC). Good Luck with your work! Richard Graham ______________________________________________ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.