Thanks. Now, the effective file length is metadata stored outside of the main stream (just like the filename(s) used in the hierarchical filesystem). I just wonder why Linux/Unix filesystems and APIs still refuse to integrate more complete support for metadata in their filesystems (including for increased and finer security controls like ACLs, which are now evolving to become less host-centric and more network/Internet-oriented with domains). Of course there is various support for metadata on other filesystems, such as the legacy Mac OS filesystem with its resource forks, or VMS filesystems, whose features are integrated (many of them now supported with additional filesystem drivers on Linux/Unix). But why this isn't formalized as a plain part of the OS in all its APIs remains a mystery to me.

The low-level single-stream API is now too low-level and lacks granularity (we should even be able to organize filesystems in a less hierarchical, more relational and more object-oriented way). Things like data signatures could be separated more formally, as well as the actual encoding of end-of-lines, paragraphs, or even word boundaries. The I/O API would then treat files only as "views" offering the wanted services for interacting with files, including plain-text files that are structured and should be augmentable with any number of additional metadata at various levels.

At least the most interoperable filesystem that exists today is the HTTP "filesystem", and thankfully it supports a basic layer of metadata, with additional features like cache control, management of the lifetime of objects, and basic security at the file level. But further developments are still needed to extend it first to a full relational filesystem, then to a full object-oriented filesystem. At that point, we will wonder even more what "plain text" really is.
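The point that the length (like the name) lives outside the byte stream can be seen directly with the POSIX stat interface; a minimal sketch in Python, using a throwaway temporary file for illustration:

```python
import os
import tempfile

# The byte stream itself carries no length marker; the size lives in the
# inode metadata, alongside the name(s) kept in directory entries.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello")
    path = f.name

meta = os.stat(path)          # metadata: size, owner, timestamps, mode...
with open(path, "rb") as f:
    data = f.read()           # the stream: just the bytes, nothing more

print(meta.st_size, len(data))  # prints "5 5": the length is out-of-band
os.unlink(path)
```

The same stat structure is where ownership and permission bits live, which is why richer metadata (ACLs, forks) has to be bolted on separately rather than read from the stream.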
For now, Linux/Unix still has limited support for text files, but this is only realized in user-space libraries, and not enforced by the OS, which exposes too many things that are modifiable independently without any control. Things like the internal encoding of text files, or the encoding of end-of-lines, are then managed by software in a not completely interoperable way, only because the requirements are not checked and enforced. We still live in a world where complete binary streams are interchanged, with difficulties interpreting them, simply because the necessary metadata are not checked or not transported as a requirement.

With an object-oriented design, based on APIs rather than unstructured streams, we could even get more performance (over long-distance WAN networks with high latency), by avoiding transporting lots of unqualified things that everyone interprets and implements as he wants. And even if we are seeing a recent development of "cloud" solutions, this is not really the solution we need for the long term: it is still the old-fashioned unbalanced model based on clients and servers, instead of peer-to-peer systems where peers cooperate transparently to offer the service in a virtual host that can be located anywhere; where devices can connect and add their own computing power to the system, and work transparently on data belonging to separate virtual spaces; and where things like redundancy, backups, resilience to hardware failures, and needs for more power can be obtained transparently from the rest of the network, with automatic optimizations to reduce latency through automatic caching systems and distributed validation of the data. The code to manage this data would migrate from one node to another transparently in the background, based on demand. The way to interact with such a system would be exclusively through objects exposing their APIs and their security requirements, and managing the identities of actors and their access rights.
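That encoding and end-of-line conventions are per-reader choices rather than OS-enforced properties is easy to demonstrate: in Python's I/O layer, for instance, both are mere parameters of each open() call, so two readers of the same stored bytes can legitimately see different texts. A small sketch:

```python
import os
import tempfile

# Write raw bytes: the OS stores them with no record of encoding or
# end-of-line convention; that interpretation is left to each reader.
path = tempfile.mkstemp()[1]
with open(path, "wb") as f:
    f.write("ligne\r\nline\n".encode("utf-8"))

# Reader 1: universal-newline mode translates CRLF to LF on the fly.
with open(path, "r", encoding="utf-8") as f:
    translated = f.read()                   # "ligne\nline\n"

# Reader 2: newline="" disables translation and keeps the raw CRLF.
with open(path, "r", encoding="utf-8", newline="") as f:
    raw = f.read()                          # "ligne\r\nline\n"

print(translated == raw)  # prints "False": same bytes, two interpretations
os.unlink(path)
```

Nothing in the filesystem records which interpretation was intended, which is exactly the missing enforcement described above.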
Things that are old-fashioned in the "stream" approach are, for example, file positions. Ideally texts are just enumerations of objects like paragraphs, themselves structured as enumerations of lower boundaries, down to the lowest level, which is the code point level. Code units (including surrogate artefacts), bytes, encodings, and data compression do not belong to the definition of what plain text is (which should be transparently convertible to match whatever a user needs to handle/show/transform in the way most convenient for him). And there's no reason why we should interact with texts only at the current "file" level, under a single security realm and with a single owner of the stream, when this stream could just be a private view on larger objects managed collectively and not exposing the same thing to everyone (a user, or a group, or a security domain, or an application or service, or another object used to create distinct views for any of them for specific needs). Some day we will even forget what UTF-8 is. And maybe the correct minimum level for handling text will be the grapheme cluster, represented as an object through its own local API. There will be a complete separation between text input, text storage, text interchange between computing nodes, text transforms, and text output. Programmers will no longer write programs working at the stream level (this level being defined only within the black box of the underlying OS connecting users with their shared applications and data, accessible over a worldwide network from all kinds of devices).
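The layering argued for here (grapheme cluster above code point above code unit above byte) can be illustrated with a single accented letter: the counts differ at every level, so none of the lower levels is a faithful "position" in the text a user perceives. A sketch using Python's unicodedata:

```python
import unicodedata

# One user-perceived character ("é") in decomposed form: e + combining acute.
g = "e\u0301"

print(len(g))                           # 2 code points
print(len(g.encode("utf-8")))           # 3 bytes in UTF-8
print(len(g.encode("utf-16-le")) // 2)  # 2 UTF-16 code units
print(len(unicodedata.normalize("NFC", g)))  # 1 code point after NFC

# A character outside the BMP needs a surrogate pair in UTF-16:
emoji = "\U0001F600"
print(len(emoji))                           # 1 code point
print(len(emoji.encode("utf-16-le")) // 2)  # 2 code units (surrogates)
```

A byte offset into the UTF-8 form, a code-unit offset into the UTF-16 form, and a code-point index all disagree for the same text, which is the core objection to stream file positions above.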
This is a dream. We are far from this level: full network-based peer-to-peer OSes still do not exist, we are still working too near the hardware level, and software is still not perceived as a location-independent and hardware-independent service. (As a consequence we now have billions of computing devices connected to the net that spend 95% of their on-time waiting with nothing to do, and no way to harness the extra computing power that is available almost everywhere except where we need it locally; plus gigatons of devices recomputing the same things with lots of energy, and hardware thrown away, polluting our environment.)

2012/7/29 John W Kennedy <[email protected]>:
> On Jul 28, 2012, at 11:52 AM, Doug Ewell <[email protected]> wrote:
>> ^Z as an EOF marker for text files was part of the MS-DOS legacy from
>> CP/M, where all files were written to a multiple of the disk block size
>> (I think 128 for CP/M and 512 for MS-DOS 1.x), and there had to be some
>> way to tell where the real text content ended. New stream-based I/O
>> calls in MS-DOS 2.0 made this mechanism unnecessary. Unix systems had no
>> legacy from CP/M, so they never had this problem.
>
> Worse than that, actually. Actual MS-DOS APIs from 1.0 on were able to handle
> the situation, but the MS-DOS BASIC language and interpreter, with CP/M
> roots, assumed the 128-byte sector, and therefore demanded the ^Z. It was
> fixed as early as 1.1, I think, but the malady lingers on.
>
>>> I.e., this is why we do have this messy text OR binary file I/O
>>> distinction like O_BINARY (for open(2)), "b" (for fopen(3)) or
>>> binmode (perl(1)). Because without those a text file will see
>>> End-Of-File at the ^Z, not at the real end of the file.
>>
>> The reason for the text/binary distinction on DOS and Windows is
>> conversion between Unix-standard LF and Windows (DOS, CP/M)-standard
>> CRLF.
>> It might be true that library calls to read a file in text mode
>> will stop at ^Z, but Notepad and Wordpad don't. I know the library
>> doesn't automatically write ^Z. Almost nobody in the MS world uses the
>> ^Z convention on purpose any more; many don't even know about it.
>>
>>> (Which rises the immediate question why the Microsoft programmers did
>>> not embed the meta information in this section at the end of the file.
>>> But i don't really want to know.)
>>
>> See above. The intent of ^Z was never to distinguish data from metadata,
>> as with the Mac data and resource forks.
>>
>> But of course none of this has anything to do with U+FEFF.
>>
>>> So do the programmers have to face the same conditions? I don't
>>> really think so. They prefer driving plain text readers up the wall.
>>> Successfully.
>>
>> Again, we don't really have this kind of evil intent, though it's often
>> fun and convenient for people to imagine we do.
>>
>> --
>> Doug Ewell | Thornton, Colorado, USA
>> http://www.ewellic.org | @DougEwell
>
> --
> John W Kennedy
> "Give up vows and dogmas, and fixed things, and you may grow like That.
> ...you may come to think a blow bad, because it hurts, and not because it
> humiliates. You may come to think murder wrong, because it is violent, and
> not because it is unjust."
>   -- G. K. Chesterton. "The Ball and the Cross"

