On Thu, 05 Jun 2014 21:30:11 +0300, Marko Rauhamaa wrote:
> Terry Reedy <tjre...@udel.edu>:
>> Different OSes *do* have different assumptions. Both MacOSX and current
>> Windows use (UCS-2 or) UTF-16 for text.
> Linux can use anything for text; UTF-8 has become a de-facto standard.
> How text is represented is very different from whether text is a
> fundamental data type. A fundamental text file is such that ordinary
> operating system facilities can't see inside the black box (that is,
> they are *not* encoded as far as the applications go).
Wait, are they black boxes to the *operating system* or to
*applications*? They aren't the same thing.
In any case, I reject your premise. ALL data types are constructed on top
of bytes, and so long as you allow applications *any way* to coerce data
types to different data types, you allow them to see "inside the black
box". I can extract the four bytes from a C long integer, but that
doesn't mean that C longs aren't fundamental data types in Unix/Linux.
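To make that concrete, here is a rough Python sketch of the same coercion using the struct module. The helper names long_to_bytes and bytes_to_long are made up for illustration, and I'm assuming the "standard" 4-byte long (format "<l") rather than whatever size the platform's native C long happens to be:

```python
import struct

def long_to_bytes(value):
    # "<l" means little-endian, standard-size (4-byte) signed long.
    # This is an assumption: a native C long may be 8 bytes on 64-bit Linux.
    return struct.pack("<l", value)

def bytes_to_long(raw):
    # Reverse the coercion: four raw bytes back to an integer.
    return struct.unpack("<l", raw)[0]

raw = long_to_bytes(1025)
print(list(raw))           # [1, 4, 0, 0]
print(bytes_to_long(raw))  # 1025
```

Being able to do this doesn't make the integer any less of a data type; it just shows what it is built from.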
> I have no idea how opaque text files are in Windows or OS-X.
Exactly as opaque as they are in Unix, which is to say not at all. Just
open the file in binary mode, and voilà you see the underlying bytes.
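Something like the following should demonstrate it on any of the three platforms. It's only a sketch: the function name text_and_bytes is invented for the example, and UTF-8 is my assumption for the encoding, not something the OS dictates:

```python
import os
import tempfile

def text_and_bytes(text, encoding="utf-8"):
    # Write `text` to a temporary file in text mode, then read it back
    # twice: once in binary mode (raw bytes) and once in text mode.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w", encoding=encoding) as f:
            f.write(text)
        with open(path, "rb") as f:   # binary mode: no decoding at all
            raw = f.read()
        with open(path, "r", encoding=encoding) as f:
            decoded = f.read()
    finally:
        os.remove(path)
    return decoded, raw

decoded, raw = text_and_bytes("voil\u00e0")
print(raw)           # b'voil\xc3\xa0'
print(len(decoded))  # 5 characters
print(len(raw))      # 6 bytes
```

Same file, same bytes on disk; the only difference is whether you ask for them decoded or not.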
All you're doing is pointing out that, in modern electronic computers,
the fundamental data structure which underlies all others (the
indivisible protons and neutrons, so to speak, only there are 256 of them
rather than 2) is the byte. We know this, and don't dispute it.
(Like protons and neutrons, we can see inside bytes to the quark-like
bits that make up bytes. Like quarks, bits do not exist in isolation, but
only inside bytes.)
>> For Windows, at least, the interface is much improved in Python 3.
> Yes, I get the feeling that Python is reaching out to Windows and OS-X
> and trying to make linux look like them.
Unicode support in OS-X is (I have been assured) very good, probably
better than Linux. Apple has very high standards when it comes to their
apps, and provides rich Unicode-aware APIs.
But Linux Unicode support is much better than Windows. Unicode support in
Windows is crippled by continued reliance on legacy code pages, and by
the assumption deep inside the Windows APIs that Unicode means "16 bit
characters". See, for example, the amount of space spent on fixing
Windows Unicode handling here: