Hello,

Thank you very much for taking an in-depth look at it. I really appreciate it. If I were to change my patch, I'd probably change the type from unsigned to uint32_t, since after a bit more reading I see that the int size isn't the same across architectures. That being said, your testing has confirmed that it doesn't really seem to make any difference (and possibly even increases the decode time!).
I also found it interesting that the OSD is doing so much work. I'll have to dig into how that works. Thanks for putting these tools out there in the open source world so people can learn how they work. It's really quite enlightening.

-Ryan N2BP

On Sat, May 4, 2024 at 2:23 PM Franke, Steven J <s-fra...@illinois.edu> wrote:

> Ryan,
>
> Thanks for your suggestions. Your observation of a 0.6% change seems to be
> down in the noise, i.e. comparable to changes that you would expect to see
> from run to run even without any code changes. I tried your patch here on
> MacOS and the results showed that the patch doesn’t provide any significant
> change in execution time. On 3 consecutive runs I saw original->patched
> execution time changes of -1.0%, +0.1%, and +2.4%. The average change was
> an increase of 0.5%.
>
> BTW, I noticed that your test script calls the wspr decoder using the
> default value for the timeout (10000 decoder cycles per bit). This default
> value is not recommended as it will produce an unacceptably high false
> decode rate while providing little, if any, improvement in good decodes
> compared to more reasonable values. You will see much better results if
> you call the decoder with the following command-line options:
> “-C 500 -o 4 -d”.
>
> For example, using your test.sh script as-is I saw 63 good decodes out of
> 100 attempts and the total execution time was around 40 seconds on my Mac
> laptop.
> Using the command-line options “-C 500 -o 4 -d” I saw 87 good decodes and
> the execution time was about 9 seconds, less than 1/4 of the time that it
> took using the default settings. It’s also worth noting that with my
> suggested settings, the decoding time is dominated by time spent in the
> Ordered Statistics Decoder (OSD). The time spent in the Fano decoder is
> almost negligible (0.63 seconds).
>
> Steve k9an
>
> On May 3, 2024, at 11:07 AM, Tolboom, Ryan via wsjt-devel
> <wsjt-devel@lists.sourceforge.net> wrote:
>
> Good Afternoon,
>
> Here is a patch that ekes out a tiny performance boost for the Fano
> decoder in lib/wsprd/fano.c. It does two things:
>
> 1. It removes an ENCODE call when initializing the root node: since
>    encstate is set to zero, lsym is known to end up being zero.
> 2. Since the ENCODE macro operates on 32 bits (POLY1 and POLY2 are
>    32 bits and the XOR-based parity calcs are 32 bits), encstate in the
>    node struct and _tmp in the ENCODE routine are changed to 32-bit
>    integers (unsigned int) instead of unsigned long integers. This makes
>    for slightly faster bitwise operations.
> With these changes I'm getting the same number of decodes and a 0.6%
> decrease in the time spent in the Fano decoder:
>
> === Original ===
> 69 decodes
> 0.00 0.00 0.04 0.06 0.90 29.38 0.00 31.47
>
> Code segment          Seconds   Frac
> -----------------------------------
> readwavfile              0.00   0.00
> Coarse DT f0 f1          0.00   0.00
> sync_and_demod(0)        0.04   0.00
> sync_and_demod(1)        0.06   0.00
> sync_and_demod(2)        0.90   0.03
> Stack/Fano decoder      29.38   0.93
> OSD decoder              0.00   0.00
> -----------------------------------
> Total                   31.47   1.00
>
> === New ===
> 69 decodes
> 0.00 0.00 0.04 0.06 0.90 29.17 0.00 31.31
>
> Code segment          Seconds   Frac
> -----------------------------------
> readwavfile              0.00   0.00
> Coarse DT f0 f1          0.00   0.00
> sync_and_demod(0)        0.04   0.00
> sync_and_demod(1)        0.06   0.00
> sync_and_demod(2)        0.90   0.03
> Stack/Fano decoder      29.17   0.93
> OSD decoder              0.00   0.00
> -----------------------------------
> Total                   31.31   1.00
>
> Given that it has to do with what instructions are used for different word
> sizes, you might see more dramatic results on different architectures.
>
> I've also attached the scripts I used to test it out.
>
> On a related note, does anyone still have this dataset?
>
> http://physics.princeton.edu/pulsar/K1JT/wspr_data.tgz
>
> It would be nice to have some non-simulated data.
>
> 73,
>
> Ryan N2BP
> <test.sh><generate.sh><fano.patch>
> _______________________________________________
> wsjt-devel mailing list
> wsjt-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/wsjt-devel