GM all,

We have made excellent progress on the quest Steve started toward an open-source Reed-Solomon decoder that is as good as or better than the closed-source Koetter-Vardy algorithm. This message summarizes results to date (for our own future reference), followed by some thoughts on where to go next.
As before, the test data set starts with 1000 .wav files generated by program SimJT. Each file contains white Gaussian noise and a weak, isolated JT65A "transmission" at S/N = -24 dB. We wish to compare results with those produced by (1) unmodified Berlekamp-Massey (BM) errors-and-erasures decoding and (2) Koetter-Vardy (KV) algebraic soft-decision decoding, the two schemes now provided within programs WSJT and WSJT-X. Differences in results produced by WSJT and WSJT-X arise from differing optimizations for assumed operating conditions: WSJT is tuned especially for single decodes under EME and EME-like conditions, while WSJT-X is tuned for multiple decodes in crowded HF bands. For these tests we are most interested in the former conditions.

For convenience in testing the RS decoder, a single auxiliary file s3_1000.bin was created containing the symbol spectrum s3(64,63) for each of the 1000 original files. The s3() arrays were computed by WSJT v10.0. The first index of Fortran array s3(i,j) is the frequency-bin number for the 64 possible data tones of a JT65 signal; the second index is the symbol number for the 63 data symbols. The auxiliary file is read by program rsdtest (source code at SVN address .../trunk/rsdtest/rsdtest), which allows testing and convenient optimization of the final stage of decoding a JT65 signal.

Lines 17 and higher in the table below were produced with program rsdtest using Steve's symbol metrics. For each of the 63 channel symbols in each transmission, we define p1 as the estimated probability that the strongest frequency bin correctly identifies the transmitted symbol, and p2 as the corresponding probability for the second-strongest bin. We generate stochastic erasure vectors for BM errors-and-erasures decoding based on the rank of each of the 63 p1 values and the ratio p2/p1. For each potential codeword produced by the BM algorithm we compute a metric equal to the "soft distance" between the vector of received symbols and the symbols of the tested codeword. The codeword with minimum soft distance is taken to be the correctly decoded message.

Each line in the table gives the number of valid decodes from the 1000 simulated transmissions.

 #  Test                                     Decodes  False  Time
------------------------------------------------------------------
 1. WSJT-X (BM only)                               2
 2. WSJT (BM only)                                 5
 3. WSJT-X + kvasd                               189
 4. WSJT-X + kvasd (thresh0=1, ntest>0)          207
 5. WSJT-X + sfrsd (Linux)                       302
 6. WSJT-X + sfrsd (Win32)                       309
 7. WSJT-X + sfrsd (Linux, thresh0=1)            348
 8. WSJT-X + sfrsd (Win32, thresh0=1)            350
 9. WSJT + kvasd (Linux)                         809
10. WSJT + kvasd (Win32)                         809
11. WSJT + sfrsd (10000)                         464
12. WSJT + sfrsd (SFM no ntest 10000)            519
13. WSJT + sfrsd (SFM no ntest 20000)            543
14. WSJT + sfrsd (SFM no ntest 1000)             342
15. WSJT + sfrsd (SFM no ntest 100000)           706
16. WSJT + sfrsd (SFM no ntest 1000000)          786          11 hours
17. rsdtest (r5939, ntrials=10000)               809      2   4.6 min
18. rsdtest (r5939, ntrials=100000)              876      6   29 min
19. rsdtest (r5939, ntrials=1000000)             902      9   220 min

It's clear that we are as good as the KV decoder at ntrials=10000, and better still with more stochastic trials. Decoding speed is excellent, too -- even before any careful tuning for speed or use of multi-CPU parallel processing. At present the erasure probabilities have been tuned for the maximum number of decodes, without much regard for the small number of false decodes now slipping through. With further effort these two quantities might be better balanced. A sketch of the stochastic procedure in code follows below.
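To make the procedure concrete, here is a minimal sketch in C of one way the stochastic loop can be organized. The names rs_decode_bm(), erasure_prob(), soft_distance(), and stochastic_decode() are hypothetical stand-ins, and the constants in erasure_prob() and the exact form of the soft-distance metric are illustrative placeholders, not the tuned values used in rsdtest r5939.

/*
 * Sketch of the stochastic erasure / soft-distance loop described
 * above.  rs_decode_bm() stands in for the project's BM errors-and-
 * erasures routine; the metric and probabilities are placeholders.
 */
#include <stdlib.h>
#include <string.h>

#define NN     63   /* JT65 channel symbols per transmission          */
#define MAXERA 51   /* the (63,12) code handles at most n-k erasures  */

/* Hypothetical BM errors-and-erasures decoder: returns 0 on success
 * and fills codeword[]; era_pos[] lists the numera erased positions. */
int rs_decode_bm(const int *mrsym, const int *era_pos, int numera,
                 int *codeword);

/* One plausible soft metric: accumulate (1 - probability that the
 * codeword symbol is what was sent), using p1/p2 where the codeword
 * agrees with the best or second-best bin, and a floor otherwise.   */
static float soft_distance(const int *cw, const int *mrsym,
                           const int *mr2sym,
                           const float *p1, const float *p2)
{
    float d = 0.0f;
    for (int j = 0; j < NN; j++) {
        if (cw[j] == mrsym[j])       d += 1.0f - p1[j];
        else if (cw[j] == mr2sym[j]) d += 1.0f - p2[j];
        else                         d += 1.0f;           /* floor */
    }
    return d;
}

/* Erasure probability for one symbol, driven by the rank of its p1
 * among all 63 symbols (0 = strongest) and by the ratio p2/p1: a
 * weak, ambiguous symbol is erased more often.  Placeholder values. */
static float erasure_prob(int rank, float ratio)
{
    float p = (float)rank / NN;          /* weaker rank -> higher p  */
    if (ratio > 0.5f) p += 0.3f;         /* ambiguous-symbol penalty */
    return p > 1.0f ? 1.0f : p;
}

int stochastic_decode(const int *mrsym, const int *mr2sym,
                      const float *p1, const float *p2,
                      const int *rank, int ntrials, int *best_cw)
{
    float dmin = 1e30f;
    int found = 0, cw[NN], era_pos[NN];

    for (int trial = 0; trial < ntrials; trial++) {
        int numera = 0;
        for (int j = 0; j < NN; j++)     /* draw one erasure vector  */
            if ((float)rand() / RAND_MAX <
                erasure_prob(rank[j], p2[j] / p1[j]))
                era_pos[numera++] = j;
        if (numera > MAXERA) continue;   /* too many erasures for BM */

        if (rs_decode_bm(mrsym, era_pos, numera, cw) == 0) {
            float d = soft_distance(cw, mrsym, mr2sym, p1, p2);
            if (d < dmin) {              /* keep the closest codeword */
                dmin = d;
                memcpy(best_cw, cw, sizeof cw);
                found = 1;
            }
        }
    }
    return found;
}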
Now, some thoughts from my working notebook on where we might go next, in no particular order:

1. I can't remember whether I (or was it Steve?) removed the loop that applied the Forney Generalized Minimum Distance (GMD) pattern after the stochastic loop. Was it ever determined that the GMD search was no longer useful?

2. Should Steve's "nhard-n2ndbest" stopping criterion be tested again and compared with the ones used in SVN r5939?

3. We are not yet making full use of mr2sym(j), the "second best" candidate for the value of symbol j. What (if anything) might be gained by doing substitutions rather than erasures before calling the BM routine? We want to take advantage of cases where we can determine that mr2sym(j) has a significantly above-average probability of being the correct value.

4. This may be a wacky idea, but just in case it could be useful: might we gain anything by ensuring that "maximally different" erasure vectors are used for the ntrials decoding attempts, rather than vectors drawn at random? For example: suppose we recorded all the vectors era_pos[0:numera-1] that resulted in successful decodes over a very large number of trials, and then used those vectors first, before resorting to vectors determined from a random-number generator. Clearly this would make the program much faster when run on the original data set. Would it also be faster when run on *different* data?

5. So far our tests have used idealized data: AWGN channel, constant signal strength, etc. We need to test the current algorithm on real data, including EME signals and crowded portions of (say) the JT65 sub-band on 20 meters.

6. We need to optimize for execution speed, including parallel processing of the stochastic loop.

7. Then, if all is successful, we should make sfrsd the one-and-only RS decoder in WSJT and WSJT-X, doing away with any need for kvasd.

Other comments and suggestions will be very welcome!

-- Joe