Re: Long-term archiving of electronic text documents

Clive Hohberger Mon, 28 Jan 2013 09:04:59 -0800

Using UTF64 with 48 bits of Reed-Solomon error correction (RSEC) on a
single UTF-16 data codeword would allow you to recover 24 data or EC bits.
Remember that the EC bits, being in the same codeword, are just as likely
to be damaged as the data.


Otto's comment is more practical. You have 11 unused data bits, which would
allow correcting 5 data or EC bits.
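
Both figures above (24 from 48, and 5 from 11) are consistent with the usual rule of thumb that r redundant units can correct at most floor(r/2) errors at unknown positions. A minimal sketch of that arithmetic follows; the function name is hypothetical, and it only computes the bound, not an actual Reed-Solomon code.

```python
def correctable_errors(redundant_units: int) -> int:
    """Upper bound on errors correctable at unknown positions: floor(r / 2).

    Illustrative helper (hypothetical name); it does no encoding or decoding.
    """
    return redundant_units // 2


print(correctable_errors(48))  # 48 EC bits
print(correctable_errors(11))  # 11 unused bits
```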

Jim's idea is how things are done in practice in bar codes, hard drives and
space communications: the data is typically byte-oriented and broken into
blocks. Each block carries a set number of bytes of RSEC. Blocks are then
interleaved, so that the loss of a contiguous portion of the string is spread
over many blocks and is less likely to exceed the error correction
capacity of any single block.
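
The interleaving step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the block count (8) and the block size (16 bytes) are made up for the example, all blocks are assumed to have equal length, and the Reed-Solomon bytes themselves are not computed. It shows only how a contiguous burst of damage is spread across blocks.

```python
def interleave(blocks):
    """Emit byte 0 of every block, then byte 1 of every block, and so on.

    Assumes all blocks have the same length (zip would silently truncate).
    """
    return bytes(b for column in zip(*blocks) for b in column)


def deinterleave(stream, num_blocks):
    """Inverse of interleave(): every num_blocks-th byte belongs to one block."""
    return [stream[i::num_blocks] for i in range(num_blocks)]


def damage_per_block(burst_start, burst_len, num_blocks):
    """Count how many bytes of each block fall inside a contiguous burst
    of damaged positions in the interleaved stream."""
    counts = [0] * num_blocks
    for pos in range(burst_start, burst_start + burst_len):
        counts[pos % num_blocks] += 1
    return counts


# Hypothetical layout: 8 blocks of 16 bytes each (data plus EC bytes).
blocks = [bytes([i]) * 16 for i in range(8)]
stream = interleave(blocks)
assert deinterleave(stream, 8) == blocks  # round trip is lossless

# A 20-byte burst starting at offset 40 of the interleaved stream.
print(damage_per_block(burst_start=40, burst_len=20, num_blocks=8))
```

With these numbers, no block loses more than 3 bytes to the 20-byte burst. Without interleaving, the same burst would fall on just two adjacent 16-byte blocks (8 and 12 bytes respectively), which is far more likely to exceed the per-block correction capacity.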
-
Clive P. Hohberger, PhD MBA
Managing Director
*Clive Hohberger, LLC*
+1 847 910 8794
[email protected]


On Mon, Jan 28, 2013 at 10:01 AM, James Cloos <[email protected]> wrote:

> >>>>> "WJGO" == William J G Overington <[email protected]> writes:
>
> WJGO> I was thinking about the problems of the long-term archiving of
> WJGO> electronic text documents and thought of an idea.  I wonder if I
> WJGO> may please mention the idea here in the hope of there being a
> WJGO> discussion so that an assessment of whether the idea is worth
> WJGO> developing can be made.
>
> Forward error correction (FEC) is an important tool for long term
> archiving, but I suspect it is better done at a string or document
> level than at the character level.
>
> I'd suggest looking at tools like http://parchive.sourceforge.net/
> and a windows implementation thereof at http://www.quickpar.org.uk/
> for a tool for FEC.
>
> -JimC
> --
> James Cloos <[email protected]>         OpenPGP: 1024D/ED7DAEA6
>
>

