Re: Long-term archiving of electronic text documents

Asmus Freytag Mon, 28 Jan 2013 08:08:03 -0800

On 1/28/2013 4:30 AM, William_J_G Overington wrote:

The idea is that there would be an additional UTF format, perhaps UTF-64, so 
that each character would be expressed in UTF-64 notation using 64 bits, thus 
providing error checking and correction facilities at a character level.

I think this proposal is a few weeks early, and that it should be resubmitted on the proper date, but as UTF-256 - for greater redundancy.

UTF-256 allows each hex digit of UTF-32 to be expressed as an ASCII hex digit (characters 0-9 and A-F encoded as bytes 0x30-0x39 and 0x41-0x46).
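
A minimal sketch of the hex-digit idea, read literally: each code point is written as the eight uppercase ASCII hex digits of its UTF-32 value, so each character occupies eight bytes. The function names are hypothetical and chosen only for illustration; nothing here is a defined encoding form.

```python
def encode_hex_digits(text: str) -> bytes:
    """Write each code point as eight uppercase ASCII hex digits (hypothetical helper)."""
    return "".join(f"{ord(ch):08X}" for ch in text).encode("ascii")


def decode_hex_digits(data: bytes) -> str:
    """Inverse of encode_hex_digits: read eight ASCII hex digits per code point."""
    if len(data) % 8 != 0:
        raise ValueError("length must be a multiple of 8 bytes")
    chunks = (data[i:i + 8].decode("ascii") for i in range(0, len(data), 8))
    return "".join(chr(int(chunk, 16)) for chunk in chunks)


if __name__ == "__main__":
    encoded = encode_hex_digits("A")
    print(encoded)
    print(decode_hex_digits(encoded))
```

Every output byte falls in 0x30-0x39 or 0x41-0x46, so the result is still readable as plain ASCII text.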

This leaves two bits per hex digit unused, which could be utilized for bit-level error correction, or you could go to UTF-512 and encode each code twice.


The possibilities are endless.

A./
