Using UTF64 with 48 bits of Reed-Solomon error correction (RSEC) on a single UTF-16 data codeword would allow you to recover 24 data or EC bits. Remember that the EC bits, being in the same codeword, are just as likely to be damaged as the data.
Otto's comment is more practical. You have 11 unused data bits, which would allow correcting 5 data or EC bits.

Jim's idea is how things are done in practice in bar codes, hard drives and space communications: the data is typically byte-oriented and broken into blocks. Each block carries a fixed number of bytes of RSEC. The blocks are then interleaved, so that the loss of a contiguous portion of the string is spread over many blocks and is less likely to exceed the error correction capacity of any single block.

- Clive P. Hohberger, PhD MBA
Managing Director
*Clive Hohberger, LLC*
+1 847 910 8794
[email protected]

On Mon, Jan 28, 2013 at 10:01 AM, James Cloos <[email protected]> wrote:

> >>>>> "WJGO" == William J G Overington <[email protected]> writes:
>
> WJGO> I was thinking about the problems of the long-term archiving of
> WJGO> electronic text documents and thought of an idea. I wonder if I
> WJGO> may please mention the idea here in the hope of there being a
> WJGO> discussion so that an assessment of whether the idea is worth
> WJGO> developing can be made.
>
> Forward error correction (FEC) is an important tool for long term
> archiving, but I suspect it is better done at a string or document
> level than at the character level.
>
> I'd suggest looking at tools like http://parchive.sourceforge.net/
> and a windows implementation thereof at http://www.quickpar.org.uk/
> for a tool for FEC.
>
> -JimC
> --
> James Cloos <[email protected]> OpenPGP: 1024D/ED7DAEA6
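As a rough illustration of the interleaving point in the reply above, here is a small Python sketch. It is not taken from any of the tools mentioned in this thread, and it does not implement Reed-Solomon itself: it only counts how many bytes a contiguous burst damages in each block, with and without interleaving. The layout (8 blocks of 16 bytes), the 12-byte burst, and the helper names are all assumptions made up for the example. It also assumes the usual rule that a Reed-Solomon block with p parity bytes can correct up to p // 2 damaged bytes at unknown positions, with 4 parity bytes per block here.

```python
# Hypothetical layout: 8 already-encoded blocks of 16 bytes each.
N_BLOCKS, BLOCK_LEN = 8, 16
PARITY_BYTES = 4                 # assumed RS parity bytes per block
CORRECTABLE = PARITY_BYTES // 2  # byte errors one block can repair


def interleave(blocks):
    """Emit byte 0 of every block, then byte 1 of every block, and so on."""
    return bytes(b for column in zip(*blocks) for b in column)


def deinterleave(stream, n_blocks):
    """Undo interleave(): block i is every n_blocks-th byte starting at i."""
    return [stream[i::n_blocks] for i in range(n_blocks)]


def corrupt(stream, start, length):
    """Simulate a burst: flip every bit of `length` contiguous bytes."""
    damaged = bytearray(stream)
    for k in range(start, start + length):
        damaged[k] ^= 0xFF
    return bytes(damaged)


def errors_per_block(sent, received):
    """Count damaged bytes in each block."""
    return [sum(a != b for a, b in zip(s, r)) for s, r in zip(sent, received)]


# Dummy block contents standing in for data plus parity.
blocks = [bytes(i * BLOCK_LEN + j for j in range(BLOCK_LEN))
          for i in range(N_BLOCKS)]

# Case 1: blocks stored back to back, then hit by a 12-byte burst.
plain = corrupt(b"".join(blocks), start=40, length=12)
plain_blocks = [plain[i * BLOCK_LEN:(i + 1) * BLOCK_LEN]
                for i in range(N_BLOCKS)]
plain_errors = errors_per_block(blocks, plain_blocks)

# Case 2: the same burst hits the interleaved stream instead.
mixed = corrupt(interleave(blocks), start=40, length=12)
interleaved_errors = errors_per_block(blocks, deinterleave(mixed, N_BLOCKS))

print("back to back:", plain_errors, "recoverable:",
      max(plain_errors) <= CORRECTABLE)
print("interleaved: ", interleaved_errors, "recoverable:",
      max(interleaved_errors) <= CORRECTABLE)
```

With these made-up numbers, the burst puts 8 damaged bytes into one block when the blocks are stored back to back, which exceeds the assumed 2-byte capacity. After interleaving, no block sees more than 2 damaged bytes, so every block stays within capacity.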

