Long-term archiving of electronic text documents

William_J_G Overington Mon, 28 Jan 2013 04:47:27 -0800

I was thinking about the problems of the long-term archiving of electronic text 
documents and thought of an idea.


I would like to mention the idea here in the hope of a discussion, so that an 
assessment can be made of whether the idea is worth developing.

The idea is that there would be an additional UTF format, perhaps UTF-64, in 
which each character would be expressed using 64 bits. As a Unicode code point 
needs at most 21 bits, the remaining bits could provide error checking and 
correction facilities at a character level.
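
As a rough illustration of what character-level error correction in 64 bits 
could look like, here is a minimal sketch in Python. The layout is purely an 
assumption made for the example and is not part of any standard or proposal: 
each 64-bit word holds three copies of the 21-bit code point, and decoding 
takes a bitwise majority vote, so any single flipped bit in a word is 
corrected. The names encode_char, decode_word, encode_text and decode_text 
are hypothetical.

```python
# Hypothetical "UTF-64" word layout (an assumption for illustration only):
#   bits 0-20, 21-41 and 42-62 each hold one copy of the 21-bit code point;
#   bit 63 is unused.
MASK21 = (1 << 21) - 1


def encode_char(ch: str) -> int:
    """Pack three copies of the character's code point into one 64-bit word."""
    cp = ord(ch)
    return cp | (cp << 21) | (cp << 42)


def decode_word(word: int) -> str:
    """Recover the character by a bitwise majority vote over the three copies."""
    a = word & MASK21
    b = (word >> 21) & MASK21
    c = (word >> 42) & MASK21
    # Each result bit is 1 only if at least two of the three copies agree on 1.
    cp = (a & b) | (a & c) | (b & c)
    return chr(cp)


def encode_text(text: str) -> bytes:
    """Encode a string as a sequence of big-endian 8-byte words."""
    return b"".join(encode_char(ch).to_bytes(8, "big") for ch in text)


def decode_text(data: bytes) -> str:
    """Decode a sequence of 8-byte words, correcting isolated bit errors."""
    return "".join(
        decode_word(int.from_bytes(data[i:i + 8], "big"))
        for i in range(0, len(data), 8)
    )


if __name__ == "__main__":
    stored = bytearray(encode_text("archive"))
    stored[3] ^= 0x10  # simulate one bit of damage in the first word
    print(decode_text(bytes(stored)))
```

Triple redundancy is only one of several possible schemes and was chosen here 
for simplicity. A proper error-correcting code could protect against more 
damage with the same number of spare bits.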

If such a UTF-64 format were established as part of the standard, then maybe in 
the future, for example, Microsoft WordPad could carry an option to save a text 
file as UTF-64.

At present, on the Windows XP system that I am using, when saving a text file 
from within Microsoft WordPad, one of the choices of file type is listed as 
Unicode Text Document, which uses a UTF-16 format.

A document saved as UTF-64 may well take four times as many bytes as such a 
Unicode Text Document (8 bytes per character against 2 bytes for most 
characters in UTF-16), yet there would be the error checking and correction 
facilities at a character level.
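
The size comparison can be checked with a short Python sketch. It assumes a 
fixed 8 bytes per character for the hypothetical UTF-64 format and ignores 
any byte order mark. The function names are made up for the example.

```python
def utf16_size(text: str) -> int:
    """Bytes needed for the text in UTF-16, without a byte order mark."""
    return len(text.encode("utf-16-le"))


def utf64_size(text: str) -> int:
    """Bytes needed in the hypothetical UTF-64 format: 8 per code point."""
    return 8 * len(text)


if __name__ == "__main__":
    for sample in ("archive", "\U0001F600"):
        print(repr(sample), utf16_size(sample), utf64_size(sample))
```

For text made up of Basic Multilingual Plane characters the ratio is four to 
one. For characters that need a surrogate pair in UTF-16 it is two to one.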

Similarly, there could be a type of PDF document in which the text within the 
document is stored in UTF-64 format.

So, I am putting the idea forward to seek opinions on whether establishing 
such a UTF format, whether UTF-64 or some other size, with error checking and 
correction facilities at a character level, would be useful.

William Overington

28 January 2013
