On Thu, 05 Jun 2014 21:30:11 +0300, Marko Rauhamaa wrote:

> Terry Reedy <tjre...@udel.edu>:
> 
>> Different OSes *do* have different assumptions. Both MacOSX and current
>> Windows use (UCS-2 or) UTF-16 for text.
> 
> Linux can use anything for text; UTF-8 has become a de-facto standard.
> 
> How text is represented is very different from whether text is a
> fundamental data type. A fundamental text file is such that ordinary
> operating system facilities can't see inside the black box (that is,
> they are *not* encoded as far as the applications go).

Wait, are they black boxes to the *operating system* or to 
*applications*? Those aren't the same thing.

In any case, I reject your premise. ALL data types are constructed on top 
of bytes, and so long as you allow applications *any way* to coerce data 
types to different data types, you allow them to see "inside the black 
box". I can extract the four bytes from a C long integer, but that 
doesn't mean that C longs aren't fundamental data types in Unix/Linux.
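
To make that concrete, here is a minimal sketch in Python rather than C. It 
assumes the struct module's standard-size format "<l" (a little-endian, 
4-byte long; a native C long may well be 8 bytes on 64-bit Linux), and the 
value is just an arbitrary example:

```python
import struct

# Assumption: "<l" is struct's *standard* little-endian long, always 4 bytes.
# A native C long can be 8 bytes on 64-bit Linux, so this is illustrative only.
value = 0x12345678

raw = struct.pack("<l", value)   # the integer, seen as its underlying bytes
print(list(raw))                 # the four individual byte values

# Round trip: reinterpret the same four bytes as an integer again.
(restored,) = struct.unpack("<l", raw)
print(restored == value)
```

Being able to do this says nothing about whether the integer is a 
fundamental type; it only shows that a coercion exists.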


> I have no idea how opaque text files are in Windows or OS-X.

Exactly as opaque as they are in Unix, which is to say not at all. Just 
open the file in binary mode, and voilà, you see the underlying bytes.
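
A minimal sketch of what I mean, in Python 3. The file name, the sample 
text and the choice of UTF-8 are all assumptions made for the example:

```python
import os
import tempfile

text = "voilà"

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name, used only for this demonstration.
    path = os.path.join(tmp, "example.txt")

    # Text mode: the file object encodes the string for us (UTF-8 assumed).
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Binary mode: no decoding, just the bytes stored on disk.
    with open(path, "rb") as f:
        raw = f.read()

print(raw)                  # the encoded bytes
decoded = raw.decode("utf-8")
print(decoded == text)      # decoding them recovers the text
```

The same two open() calls behave the same way on Linux, Windows and OS-X; 
only the default encoding differs if you leave it unspecified.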

All you're doing is pointing out that, in modern electronic computers, 
the fundamental data structure which underlies all others (the 
indivisible protons and neutrons, so to speak, only there are 256 of them 
rather than 2) is the byte. We know this, and don't dispute it.

(Like protons and neutrons, we can see inside bytes to the quark-like 
bits that make up bytes. Like quarks, bits do not exist in isolation, but 
only inside bytes.)



>> For Windows, at least, the interface is much improved in Python 3.
> 
> Yes, I get the feeling that Python is reaching out to Windows and OS-X
> and trying to make linux look like them.

Unicode support in OS-X is (I have been assured) very good, probably 
better than in Linux. Apple has very high standards when it comes to their 
apps, and provides rich Unicode-aware APIs.

But Linux Unicode support is much better than Windows. Unicode support in 
Windows is crippled by continued reliance on legacy code pages, and by 
the assumption deep inside the Windows APIs that Unicode means "16 bit 
characters". See, for example, the amount of space spent on fixing 
Windows Unicode handling here:

http://www.utf8everywhere.org/
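
To see why the "16 bit characters" assumption is a problem, here is a small 
sketch in Python 3.3 or later. The particular code point is an arbitrary 
choice; any character outside the Basic Multilingual Plane would do:

```python
# One code point outside the Basic Multilingual Plane (arbitrary example).
ch = "\U0001F600"

utf16 = ch.encode("utf-16-le")
code_units = len(utf16) // 2     # each UTF-16 code unit is 2 bytes

print(len(ch))                   # code points in the str
print(code_units)                # 16-bit code units needed in UTF-16
print(len(ch.encode("utf-8")))   # bytes needed in UTF-8
```

One character, two 16-bit code units (a surrogate pair). Any API that 
equates "character" with "16-bit unit" will miscount or split it.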



-- 
Steven D'Aprano
http://import-that.dreamwidth.org/
-- 
https://mail.python.org/mailman/listinfo/python-list
