I was thinking about the problems of the long-term archiving of electronic text documents and thought of an idea.
I wonder if I may please mention the idea here, in the hope of a discussion from which an assessment can be made of whether the idea is worth developing.

The idea is that there would be an additional UTF format, perhaps UTF-64, in which each character would be expressed using 64 bits. The bits not needed for the code point itself could then provide error checking and correction facilities at a character level.

If such a UTF-64 format were established as part of the standard, then maybe in the future, for example, Microsoft WordPad could carry an option to save a text file as UTF-64. At present, on the Windows XP system that I am using, when saving a text file from within Microsoft WordPad one of the choices of file type is listed as Unicode Text Document, which uses a UTF-16 format. A document saved as UTF-64 may well take four times as many bytes as such a Unicode Text Document (eight bytes per character rather than two, for characters that need only one 16-bit unit in UTF-16), yet there would be the error checking and correction facilities at a character level.

Similarly, there could be a type of PDF document where the text within the PDF document is stored in UTF-64 format.

So, I write to put forward the idea so as to seek opinions please on whether establishing such a UTF format, whether UTF-64 or some other size, with error checking and correction facilities at a character level, would be useful.

William Overington
28 January 2013
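To make the idea above a little more concrete for discussion, here is a minimal sketch in Python of one way a 64-bit unit could carry a character with correction at a character level. The bit layout is purely an assumption for illustration and is not part of any standard or of the proposal itself: three copies of the 21-bit code point (63 bits) with a bitwise majority vote on reading, and one bit left unused. The names `encode_char`, `decode_char`, `encode_text` and `decode_text` are hypothetical.

```python
# Hypothetical "UTF-64" layout (an assumption for illustration only):
#   bits  0-20 : copy A of the 21-bit code point
#   bits 21-41 : copy B of the same code point
#   bits 42-62 : copy C of the same code point
#   bit  63    : unused (always 0)

MASK21 = (1 << 21) - 1  # Unicode code points fit in 21 bits


def encode_char(ch: str) -> int:
    """Pack one character into a 64-bit integer holding three copies."""
    cp = ord(ch)
    return cp | (cp << 21) | (cp << 42)


def decode_char(word: int) -> str:
    """Recover the character by a bitwise majority vote over the copies."""
    a = word & MASK21
    b = (word >> 21) & MASK21
    c = (word >> 42) & MASK21
    # Each result bit is 1 when at least two of the three copies agree on 1.
    cp = (a & b) | (a & c) | (b & c)
    return chr(cp)


def encode_text(text: str) -> bytes:
    """Encode a string as big-endian 8-byte units, one unit per character."""
    return b"".join(encode_char(ch).to_bytes(8, "big") for ch in text)


def decode_text(data: bytes) -> str:
    """Decode 8-byte units back to a string, correcting damaged bits."""
    return "".join(
        decode_char(int.from_bytes(data[i:i + 8], "big"))
        for i in range(0, len(data), 8)
    )


if __name__ == "__main__":
    original = "Archive"
    stored = bytearray(encode_text(original))
    stored[3] ^= 0x04  # simulate a single flipped bit in the first character
    print(decode_text(bytes(stored)) == original)
    print(len(encode_text("A")), len("A".encode("utf-16-le")))
```

With this layout any single flipped bit within a character's 64 bits is corrected, and several flipped bits are corrected provided no bit position is damaged in two of the three copies. A real design would probably use a more efficient error-correcting code than plain triplication; the sketch is only meant to show the storage cost (eight bytes per character against two in UTF-16) alongside the benefit.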

