On Fri, Jun 6, 2014 at 11:24 PM, Ethan Furman <et...@stoneleaf.us> wrote:
> On 06/05/2014 11:30 AM, Marko Rauhamaa wrote:
>> How text is represented is very different from whether text is a
>> fundamental data type. A fundamental text file is such that ordinary
>> operating system facilities can't see inside the black box (that is,
>> they are *not* encoded as far as the applications go).
> Of course they are. It may be an ASCII-encoding of some flavor or other, or
> something really (to me) strange -- but an encoding is most assuredly in

Allow me to explain what I think Marko's getting at here.

In most file systems, a file exists on the disk as a set of sectors of
data, plus some metadata including the file's actual size. When you
ask the OS to read you that file, it goes to the disk, reads those
sectors, truncates the data to the real size, and gives you those
bytes.

It's possible to mount a file as a directory, in which case the
physical representation is very different, but the file still appears
the same. In that case, the OS reads some part of the file,
maybe decompresses it, and gives it to you. Same difference. These
files still contain bytes.
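
To make the "still bytes" point concrete, here's a rough sketch in Python. It stores the same arbitrary data once plainly and once gzip-compressed, then reads both back. Python's gzip module is only a user-space stand-in for the OS-level decompression described above, not how any particular filesystem actually does it; the payload and temp files are just for illustration.

```python
import gzip
import os
import tempfile

payload = b"\x00\x01hello\xff" * 50  # arbitrary bytes, not necessarily text

# Two temporary files: one plain, one gzip-compressed.
fd, plain_path = tempfile.mkstemp()
os.close(fd)
fd, gz_path = tempfile.mkstemp(suffix=".gz")
os.close(fd)

with open(plain_path, "wb") as f:
    f.write(payload)
with gzip.open(gz_path, "wb") as f:
    f.write(payload)

# Plain read: exactly as many bytes as the file's recorded size.
with open(plain_path, "rb") as f:
    plain = f.read()
plain_size = os.path.getsize(plain_path)

# Compressed read: a decompressing layer sits in between, and the
# physical size differs, but the caller still gets the same bytes.
with gzip.open(gz_path, "rb") as f:
    via_gzip = f.read()
gz_size = os.path.getsize(gz_path)

os.remove(plain_path)
os.remove(gz_path)

print(len(plain) == plain_size, via_gzip == payload, gz_size < plain_size)
# prints: True True True
```

Either way, what comes back out is a bytes object; the physical representation never shows through.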
A "fundamental text file" would be one where, instead of reading and
writing bytes, you read and write Unicode text. Since the hard disk
still works with sectors and bytes, it'll still be stored as such, but
that's an implementation detail. You could format your disk as UTF-8
or UTF-16 or FSR (the Flexible String Representation from PEP 393) or
anything you like, and the only difference you'd see is performance.
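
A toy model of that idea, as a sketch only: `TextStore` below is a hypothetical in-memory class (nothing like it exists in any OS or library I know of) whose public interface accepts and returns str, with the storage encoding picked at "format" time and kept private. FSR isn't a Python codec, so the sketch uses the standard UTF-8/16/32 codecs, and stored size stands in as a crude proxy for performance.

```python
class TextStore:
    """Hypothetical store whose interface is text only.

    The storage encoding is a private detail chosen when the store is
    created ("formatted"); callers only ever see str.
    """

    def __init__(self, storage_encoding: str = "utf-8") -> None:
        self._encoding = storage_encoding
        self._raw = b""  # stands in for the "on-platter" representation

    def write(self, text: str) -> None:
        self._raw = text.encode(self._encoding)

    def read(self) -> str:
        return self._raw.decode(self._encoding)

    def stored_size(self) -> int:
        # Exposed only to show that the representations differ in cost.
        return len(self._raw)


sample = "naïve café, 日本語, 🐍"
sizes = {}
for enc in ("utf-8", "utf-16", "utf-32"):
    store = TextStore(enc)          # "format" the store with this encoding
    store.write(sample)
    assert store.read() == sample   # same text back, whatever the format
    sizes[enc] = store.stored_size()

print(sizes)  # the sizes differ; the text read back does not
```

The caller never names an encoding when reading or writing; swapping the format changes only the cost.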
This could certainly be done, in theory. I don't know how well it'd
fit with any of the popular OSes of today, but it could be done. And
these files would not have an encoding; their on-platter
representations would, but that's purely an implementation detail: the
text that you wrote out and the text that you read in are the same
text, and no encoding is ever visible.
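
For contrast, here's a small sketch of why today's files *do* have a visible encoding, which is the point Ethan is making: the file holds bytes, so Python's text mode has to be told how to decode them, and reading back with a different encoding than the one used for writing gives different text. The temp file and the choice of latin-1 are just for illustration.

```python
import os
import tempfile

text = "café"

fd, path = tempfile.mkstemp()
os.close(fd)

# Today's files hold bytes, so text mode has to pick an encoding.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

with open(path, encoding="utf-8") as f:
    same = f.read()
with open(path, encoding="latin-1") as f:
    garbled = f.read()

os.remove(path)

print(same == text, garbled)  # prints: True cafÃ©
```

With a "fundamental text file" there would be no encoding argument to get wrong in the first place.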