Re: Proposal for a new I/O library design
Ben Rudiak-Gould (Sun, Jul 27, 2003 at 09:35:41PM -0700):
> module System.ProposedNewIOModel (...) where
>
> I assume that all I/O occurs in terms of octets. I think that this holds true of every platform on which Haskell is implemented or is likely to be implemented.
>
> type Octet = Word8

If it should be really general, the base type should be Bool.

> File offsets are 64 bits on all platforms. This model never uses negative offsets, so there's no need for a signed type. (But perhaps it would be better to use one anyway?) BlockLength should be something appropriate to the architecture's address space.
>
> type FilePos = Word64
> type BlockLength = Int

type FilePos = Integer
type BlockLength = Integer

> data File -- abstract

I would prefer:

data ImmutableStore -- abstract
data MutableStore -- abstract

> A note about buffering:

Actually, current UNIX kernels do not support non-blocking descriptors; they support non-blocking open files. Furthermore, many programs will break if they encounter non-blocking mode. This means that you must not [change blocking mode] for a descriptor inherited from another program. See http://cr.yp.to/lib/io.html.

> A value of type InputStream or OutputStream represents an input or output stream: that is, an octet source or sink. Two InputStreams or OutputStreams compare equal iff reading/writing one also reads/writes the other. (Should I call these ports instead of streams? How about OctetSource and OctetSink?)
>
> data InputStream -- abstract
> data OutputStream -- abstract

Use

data OctetSource -- abstract (or BitSource, see above)
data OctetSink -- abstract (or BitSink, see above)

for octets (or bits/Bools), and

data PacketSource -- abstract
data PacketSink -- abstract

to send complete packets of data.

Sincerely,
-- Stefan

___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell
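Stefan's split between octet-level and packet-level endpoints can be made concrete in a few lines of Haskell. This is a minimal sketch, not part of either proposal: OctetSource and PacketSource are modelled as records of IO actions, end of stream is Nothing, and the helper names (readOctet, recvPacket, popFrom, octetSourceFromList, packetSourceFromList, octetsFromPackets, drain) are all hypothetical. It shows that a packet source can always be flattened into an octet source, while the reverse would need some framing convention, because flattening discards packet boundaries.

```haskell
import Data.IORef (atomicModifyIORef', newIORef, readIORef, writeIORef)
import Data.Word (Word8)

-- Hypothetical octet-level endpoint: each call yields the next octet,
-- or Nothing at end of stream.
newtype OctetSource = OctetSource { readOctet :: IO (Maybe Word8) }

-- Hypothetical packet-level endpoint: each call yields one complete
-- packet, or Nothing once the peer has closed the connection.
newtype PacketSource = PacketSource { recvPacket :: IO (Maybe [Word8]) }

-- Build an action that pops the head of a list held in an IORef.
popFrom :: [a] -> IO (IO (Maybe a))
popFrom xs = do
  ref <- newIORef xs
  pure $ atomicModifyIORef' ref $ \ys -> case ys of
    []       -> ([], Nothing)
    (y : zs) -> (zs, Just y)

-- In-memory sources, handy for testing.
octetSourceFromList :: [Word8] -> IO OctetSource
octetSourceFromList bs = OctetSource <$> popFrom bs

packetSourceFromList :: [[Word8]] -> IO PacketSource
packetSourceFromList ps = PacketSource <$> popFrom ps

-- Layer an octet source on a packet source. Packet boundaries are
-- forgotten, so this direction is lossy and cannot be inverted.
octetsFromPackets :: PacketSource -> IO OctetSource
octetsFromPackets src = do
  pending <- newIORef []
  let next = do
        buffered <- readIORef pending
        case buffered of
          (b : rest) -> writeIORef pending rest >> pure (Just b)
          [] -> do
            packet <- recvPacket src
            case packet of
              Nothing -> pure Nothing
              Just p  -> writeIORef pending p >> next
  pure (OctetSource next)

-- Read octets until end of stream.
drain :: OctetSource -> IO [Word8]
drain s = do
  m <- readOctet s
  case m of
    Nothing -> pure []
    Just b  -> (b :) <$> drain s
```

Empty packets are simply skipped by octetsFromPackets, which is one more piece of information lost in the octet view.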
RE: Proposal for a new I/O library design
[ replies to [EMAIL PROTECTED] ]

On the whole, I think this is a good direction to explore. I like the separation of Files from Streams: indeed it would remove much of the complication in the existing system caused by having Handles which can be both read and written. Also, it gives a nice way to integrate other objects such as Sockets into the I/O system, which can also have streams layered on top of them.

I'm concerned about one implementation difficulty. Your File type is independent of the filesystem. That is, on Unix it corresponds to an inode. Creating a File must correspond to opening it (in Unix speak). Creating a stream corresponds to duplicating the file descriptor (you could probably avoid too many unnecessary dups by being clever).

There's a potential implementation difficulty, though: lookupFileByPathname must open the file, without knowing whether the file will be used for reading or writing in the future. So I would suggest that operations which create a value of type File take a read/write flag too.

> type FilePos = Word64
> type BlockLength = Int

FilePos should be Integer.

> fCheckRead :: File -> FilePos -> BlockLength -> IO Bool
> fCheckWrite :: File -> FilePos -> BlockLength -> IO Bool

What do these do? If they're supposed to return True if the required data can be read/written without blocking, then I suspect that they are not useful.

> Fundamental operations on streams. Maybe Octet is supposed to represent Octet or EOS, though I'm not sure this is enough for proper EOS handling.

I'd use the traditional 'isEOF' way of detecting end of file.

On naming: it's probably not a good idea to use the 'is' prefix, since it is already used for predicates (meaning literally 'is' rather than an abbreviation for 'InputStream').
> isGet :: InputStream -> IO (Maybe Octet)
> isPeek :: InputStream -> IO (Maybe Octet)
> isGetBlock :: InputStream -> BlockLength -> XXX -> IO BlockLength -- efficiency hack
> osPut :: OutputStream -> Octet -> IO ()
> osPuts :: OutputStream -> [Octet] -> IO ()
> osPutBlock :: OutputStream -> BlockLength -> XXX -> IO ()
> osFlush :: OutputStream -> IO ()

You need operations to control buffering, too. Something like h{Set,Get}Buffering would be fine.

You will also want a way to get back from an InputStream to the underlying object, e.g. the (File, FilePos) pair if one exists.

It's not pretty, but you certainly want a way to close a stream. Finalizers aren't reliable enough.

How did you intend text encodings to work? I see several possibilities:

  textDecode :: TextEncoding -> [Octet] -> [Char]

or

  decodeInputStream :: TextEncoding -> InputStream -> TextInputStream
  getChar :: TextInputStream -> IO Char
  etc.

or

  setInputStreamCoding :: InputStream -> TextEncoding -> IO ()
  getChar :: InputStream -> IO Char

The first one is nice, but hard to optimise, and it will get complicated for encodings which have state. The second one is probably the best compromise.

> data Directory -- abstract

I don't see a reason for changing the existing Directory support (System.Directory). Could you give some motivation here? Is the idea to abstract away from the syntax of pathnames on the platform (e.g. directory separator characters)? If so, I'm not sure it's worthwhile. There are lots of differences between pathname conventions: case sensitivity, arbitrary limits on the length of filenames, filename extensions, and so on.

> Convenient shortcuts for common cases.
> lookupFileByPathname :: String -> IO File

Here, I suggest we need

  lookupFileByPathname :: FilePath -> IOMode -> IO File

> lookupInputStreamByPathname :: String -> IO InputStream -- at least as likely to succeed as lookupFileByPathname

and similarly

  createFileOutputStream :: FilePath -> IO OutputStream
  appendFile :: FilePath -> IO OutputStream

Cheers,
	Simon
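Simon's first text-encoding option, textDecode :: TextEncoding -> [Octet] -> [Char], is easy to sketch as a pure, lazy function. A minimal sketch, assuming a hypothetical two-constructor TextEncoding (Latin1 and UTF8) and Word8 for Octet; decodeUtf8 replaces malformed input with U+FFFD and does not reject overlong forms or surrogates, so it is not a validating decoder. It illustrates why the pure form is attractive (the state of a partly read multi-byte sequence is just threaded through the recursion) but does nothing about Simon's concern that it is hard to optimise.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical encoding type; only two encodings are sketched here.
data TextEncoding = Latin1 | UTF8

-- Pure, lazy decoder from octets to characters.
textDecode :: TextEncoding -> [Word8] -> [Char]
textDecode Latin1 = map (chr . fromIntegral)
textDecode UTF8   = decodeUtf8

-- Minimal UTF-8 decoder; malformed input becomes U+FFFD.
decodeUtf8 :: [Word8] -> [Char]
decodeUtf8 [] = []
decodeUtf8 (b : bs)
  | b < 0x80  = chr (fromIntegral b) : decodeUtf8 bs
  | b < 0xC0  = replacement : decodeUtf8 bs  -- stray continuation byte
  | b < 0xE0  = multi 1 (fromIntegral b .&. 0x1F)
  | b < 0xF0  = multi 2 (fromIntegral b .&. 0x0F)
  | b < 0xF8  = multi 3 (fromIntegral b .&. 0x07)
  | otherwise = replacement : decodeUtf8 bs
  where
    replacement = '\xFFFD'
    isCont c = c .&. 0xC0 == 0x80  -- continuation bytes look like 10xxxxxx
    -- Consume n continuation bytes, accumulating 6 bits from each.
    multi :: Int -> Int -> [Char]
    multi n acc =
      let (conts, rest) = splitAt n bs
      in if length conts == n && all isCont conts
           then
             let cp = foldl (\a c -> (a `shiftL` 6) .|. (fromIntegral c .&. 0x3F)) acc conts
             in (if cp <= 0x10FFFF then chr cp else replacement) : decodeUtf8 rest
           else replacement : decodeUtf8 bs
```

Because the result is an ordinary lazy list, textDecode UTF8 can be applied to an unbounded octet stream and consumed incrementally.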
System.Directory (was RE: Proposal for a new I/O library design)
Hi guys,

I'm not replying to anything in the message, but...

> Is the idea to abstract away from the syntax of pathnames on the platform (e.g. directory separator characters)? If so, I'm not sure it's worthwhile. There are lots of differences between pathname conventions: case sensitivity, arbitrary limits on the length of filenames, filename extensions, and so on.

Would there be any way to get some of these differences into the System.Directory structure? At least the following would be nice:

  pathSeparator :: Char
    '\\' on Windows, '/' on unices, ':' (I believe) on macs, etc...

  directorySeparator :: Char
    ';' on Windows, ':' on unices, i have no idea on macs

  isCaseSensitive :: Bool
    False on Windows, True on (all?) unices, i have no idea on macs

given just these, i think we'd all be a lot happier.

I also don't particularly care whether these are IO operations or just values (so long as they are constant, they might as well be values with unsafePerformIO wrapped around them if necessary).

My current approach to figuring this out is to create a directory, change to that directory, get the current path name and try to parse it. This is bad for so many reasons I won't enumerate them here.

...unless this stuff is hiding somewhere else, please let me know (but System.Directory would probably be a good place for it to end up)...

 - Hal
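Hal's first two values can be approximated from base alone. A minimal sketch, assuming GHC, which reports "mingw32" in System.Info.os on Windows; every other OS is treated as Unix-like, so Classic Mac OS is ignored. isWindows and splitSearchList are hypothetical helpers, and isCaseSensitive is deliberately left out, since later replies in this thread argue that it is a per-filesystem property rather than a constant. (The later filepath package's System.FilePath exports pathSeparator and searchPathSeparator for the same purpose.)

```haskell
import System.Info (os)

-- GHC reports "mingw32" as the OS on Windows; anything else is
-- treated as Unix-like here.
isWindows :: Bool
isWindows = os == "mingw32"

-- Separator between components of a single path.
pathSeparator :: Char
pathSeparator = if isWindows then '\\' else '/'

-- Hal's directorySeparator: the separator between entries of a
-- PATH-like environment variable.
directorySeparator :: Char
directorySeparator = if isWindows then ';' else ':'

-- Split a PATH-like string into its entries (empty entries are kept).
splitSearchList :: String -> [String]
splitSearchList s = case break (== directorySeparator) s of
  (entry, [])       -> [entry]
  (entry, _ : rest) -> entry : splitSearchList rest
```

On a Unix-like system, splitSearchList "/usr/bin:/bin" gives ["/usr/bin", "/bin"].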
Re: Proposal for a new I/O library design
Ben,

I live in a different universe, but over here I prefer to represent files purely as memory-mapped objects. In this view, there is no difference between a read-only file and an immutable array of bytes (a byte being a natural number between 0 and 255). A read-write file is then equivalent to a mutable array (or a reference to a mutable array on a heap) of the same.

Treating these all as heap references tends to be cleaner, because you can compare the references for equality. This is significant even for read-only files: two files which contain exactly the same data are not necessarily the same file, whereas opening the same file in two different places should result in equal references.

This approach greatly simplifies lots of things, now that all modern operating systems can perform file mapping efficiently, with the virtual memory subsystem paging pieces in and out as necessary. It gets rid of pieces of information which are redundant from the low-level file system point of view (the file handle itself, the current file pointer, etc.).

The typical C/Unix approach is to deal with network (TCP or UDP) connections as streams, too. Obviously, memory-mapped files aren't a good way of exposing them: doing so would require buffering all past data, as well as blocking when waiting on yet-unreceived data (when you really want to be able to query whether there is incoming data available). Instead, I prefer to conceptualize network connections as a socket/packet-based interface, with functions to open/close sockets, send a complete packet (an array of bytes) to a socket, receive a packet from a socket, and query packet availability. With this approach, there is no redundancy or missing information; everything that is observable from the protocol point of view is observable in the language interface, and nothing more.

In this manner, it's possible to get rid of all remnants of Unix-like streams from a language's IO interface.
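Tim's point about equality can be modelled directly, because Haskell's IORef compares by identity rather than by contents. A minimal sketch under that assumption: a read-write file is represented as a hypothetical MutableFile wrapping an IORef [Word8] (a list stands in for the mapped array, and no actual memory mapping is done), and newFile, contents and writeByte are illustrative names.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import Data.Word (Word8)

-- A read-write "file" is a reference to mutable bytes. The derived Eq
-- uses IORef's Eq instance, which compares references, not contents.
newtype MutableFile = MutableFile (IORef [Word8]) deriving Eq

newFile :: [Word8] -> IO MutableFile
newFile bs = MutableFile <$> newIORef bs

contents :: MutableFile -> IO [Word8]
contents (MutableFile r) = readIORef r

-- Overwrite the byte at index i; an out-of-range index appends the byte.
-- The change is visible through every reference equal to this one.
writeByte :: MutableFile -> Int -> Word8 -> IO ()
writeByte (MutableFile r) i b =
  modifyIORef' r (\bs -> take i bs ++ [b] ++ drop (i + 1) bs)
```

Two files created separately with identical bytes compare unequal, while any copy of the same reference compares equal and sees the other's writes.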
-Tim

----- Original Message -----
From: Ben Rudiak-Gould [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Sunday, July 27, 2003 11:35 PM
Subject: Proposal for a new I/O library design

The other day I was reading the Haskell i18n debate in the list archives, and started thinking about possible replacements for the existing Haskell file I/O model.

It occurred to me that the Haskell community has really dropped the ball on this one. Haskell's design has always emphasized doing the right thing, not merely doing the thing that everyone else happens to be doing. It's that philosophy that led to the invention of the monadic I/O model, among other things. And yet, what do we choose for our I/O primitives? The same old crocks that everyone else was using. We open and close files (whatever that's supposed to mean); we expose file handles to the user; we even maintain a current position in the file, which is an unnecessary global state variable if I've ever seen one.

The proposal below is the result of a few hours spent thinking about how the file system would be accessed if it were actually implemented in Haskell, instead of behind a weird C API. I'm very interested in hearing comments and criticism. In particular, I want to know if there's enough interest in this model that I should actually try to implement it.

The most important idea in this design as far as i18n is concerned is the separation of random-access files from input and output streams. Most of the ugliness of the usual file I/O interface comes from conflating these three concepts, which are almost totally unrelated. In particular, there's no need in this model to worry about the meaning of reading or seeking in a text file. Text encoding and decoding apply to streams, not files. To read text from a file you layer an input stream on it, apply a text parser to that, and read characters. If you need to seek to a new location, you create a new stream which starts at that location in the underlying file.
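The last paragraph's idea, that seeking means creating a new stream rather than moving a shared file pointer, can be sketched with the proposal's own type names. This is a hypothetical in-memory model, not Ben's implementation: File wraps an IORef [Word8] with no position of its own, each InputStream carries a private IORef FilePos, newFile and fileInputStream are illustrative names, and isGet follows the isGet :: InputStream -> IO (Maybe Octet) signature quoted from the proposal elsewhere in this thread.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Data.Word (Word64, Word8)

type FilePos = Word64

-- Shared file contents with no current position of its own.
newtype File = File (IORef [Word8])

-- An input stream is a File plus a private read position.
data InputStream = InputStream File (IORef FilePos)

newFile :: [Word8] -> IO File
newFile bs = File <$> newIORef bs

-- "Seeking" is just creating a fresh stream at the requested offset.
fileInputStream :: File -> FilePos -> IO InputStream
fileInputStream f pos = InputStream f <$> newIORef pos

-- Read the octet at this stream's position and advance only this stream.
isGet :: InputStream -> IO (Maybe Word8)
isGet (InputStream (File contentsRef) posRef) = do
  pos <- readIORef posRef
  bytes <- readIORef contentsRef
  case drop (fromIntegral pos) bytes of
    []      -> pure Nothing
    (b : _) -> writeIORef posRef (pos + 1) >> pure (Just b)
```

Two streams over the same File advance independently, so there is no global "current position" to get wrong.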
module System.ProposedNewIOModel (...) where

I assume that all I/O occurs in terms of octets. I think that this holds true of every platform on which Haskell is implemented or is likely to be implemented.

type Octet = Word8

File offsets are 64 bits on all platforms. This model never uses negative offsets, so there's no need for a signed type. (But perhaps it would be better to use one anyway?) BlockLength should be something appropriate to the architecture's address space.

type FilePos = Word64
type BlockLength = Int

A value of type File represents a file, which is essentially a resizable strict array of octets. Two values of type File compare equal if they are the same file -- that is, if they have the same contents and changes to one also appear in the other. (File is a bad name for this. For one thing, NTFS and HFS can associate more than one chunk of data with each directory entry, and 'file' usually refers to all the chunks together. 'Fork' would be more
Re: Proposal for a new I/O library design
Tim Sweeney wrote:
> I live in a different universe, but over here I prefer to represent files purely as memory-mapped objects. [...]

I'd prefer official support for this in Haskell, too. The need for this and other I/O-related stuff was recognized by the Java community, too: it resulted in the java.nio package family, see e.g.

   http://java.sun.com/j2se/1.4.2/docs/api/

I suggest we should have a look at this before re-inventing the wheel. Not that all Java libraries are perfect in their first incarnation, far from it, see e.g. java.awt, java.net.URL, java.io, ... :-P But the second try (javax.swing, java.net.URI, java.nio, ...) is often quite usable, so we can probably learn some useful concepts from the latter.

Cheers,
   S.
Re: Proposal for a new I/O library design
On Mon, Jul 28, 2003 at 12:56:04PM -0500, Tim Sweeney wrote:
> Ben,
>
> I live in a different universe, but over here I prefer to represent files purely as memory-mapped objects. In this view, there is no difference between a read-only file and an immutable array of bytes (a byte being a natural number between 0 and 255). A read-write file is then equivalent to a mutable array (or a reference to a mutable array on a heap) of the same.
>
> Treating these all as heap references tends to be cleaner, because you can compare the references for equality, which is significant even for read-only files, because two files which contain the same exact data are not necessarily the same file, whereas opening the same file in two different places should result in equal references.
> [...]
> In this manner, it's possible to get rid of all remnants of Unix-like streams from a language's IO interface.

You certainly can't always mmap the whole file into memory at once (on a 32-bit architecture at least), because:

1) there are files that won't fit into a 32-bit address space;
2) ... and usually you don't have the whole address space to yourself. There are mmapped libraries, the stack, allocated memory, etc., so the address space can be somewhat fragmented;
3) after mmapping many big files (each fitting into a 32-bit address space) you can run out of address space.

I've been bitten by all these problems and now I mmap my files in parts, mapping and unmapping them as needed. It can be done, but it is no longer that simple.

> -Tim

Best regards,
Tom

-- 
.signature: Too many levels of symbolic links
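Tomasz's workaround of mapping a file in parts comes down to some offset arithmetic. A minimal sketch of that arithmetic only (no mmap or munmap calls are made): windowSize, locate and pieces are hypothetical names, the 64 MiB window is an arbitrary choice, and the window size is assumed to be a multiple of the page size, because mmap offsets generally have to be page-aligned. Offsets are Integer, as Simon suggested for FilePos.

```haskell
-- Size of each mapped window in bytes; assumed to be a multiple of the
-- OS page size so that every window start is a valid mmap offset.
windowSize :: Integer
windowSize = 64 * 1024 * 1024

-- For a file offset, return (start of the window to map,
-- offset of the byte inside that window).
locate :: Integer -> (Integer, Integer)
locate pos = (base, pos - base)
  where
    base = (pos `div` windowSize) * windowSize

-- Split a read of len bytes starting at pos into per-window pieces:
-- (window start, offset inside window, length of piece).
pieces :: Integer -> Integer -> [(Integer, Integer, Integer)]
pieces pos len
  | len <= 0  = []
  | otherwise = (base, off, n) : pieces (pos + n) (len - n)
  where
    (base, off) = locate pos
    n = min len (windowSize - off)
```

A read that straddles a window boundary yields two pieces, each of which can be served from a separately mapped (and later unmapped) window.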
Re: System.Directory (was RE: Proposal for a new I/O library design)
> pathSeparator :: Char
>   '\\' on Windows, '/' on unices, ':' (I believe) on macs, etc...

It used to be ':' in Classic Mac OS, and there are still some old routines in Apple's Carbon library that take ':'-separated paths. However, Apple always insisted that pathnames should only be used for display purposes, mostly because a pathname did not always uniquely identify a file (!). Mac OS X uses normal Unix-style paths for everything that concerns us. Also, Classic Mac-style paths had different semantics (no . or .., etc.), so handling them would have been more difficult than just using a different separator. Fortunately, they're a thing of the past.

> directorySeparator :: Char
>   ';' on Windows, ':' on unices, i have no idea on macs

As there was no command line, no PATH, and no textual config files on Classic Mac OS, there is no Mac-specific directory separator, so it's ':'. (although I think that stealing yet another character from

> isCaseSensitive :: Bool
>   False on Windows, True on (all?) unices, i have no idea on macs

It's not that easy. Case sensitivity is a property of a file system, not of the operating system. So if you mount a Windows or Mac OS volume on a Linux system, the filenames on that volume will still be case-insensitive (but case-preserving). On Mac OS X, the default file system type is HFS+ (a.k.a. Mac OS Extended), which is case-insensitive, but you can also choose UFS (case-sensitive). You could also have both, on two partitions. Of course, False for Windows and Mac OS, True for everything else is a reasonable guess, but you can't rely on it.

Right now I have no idea how to implement something like

  isCaseSensitive :: FilePath -> IO Bool

which would determine whether a specific path is case-sensitive. Maybe it's worth the effort to think about it.

> given just these, i think we'd all be a lot happier.

I agree. If the last one can be implemented properly, that is.
Cheers,
Wolfgang
Re: Proposal for a new I/O library design
Tomasz Zielonka wrote:
> You certainly can't always mmap the whole file into memory at once (on a 32-bit architecture at least), because: [...]

I think all these issues are handled by java.nio.Buffer and friends. Are there any people on this list with real-world war stories about java.nio? So far I have only looked at those packages from an implementation perspective...

Cheers,
   S.
Re: System.Directory (was RE: Proposal for a new I/O library design)
On Mon, Jul 28, 2003 at 07:51:51PM +0200, Wolfgang Thaller wrote:
> > isCaseSensitive :: Bool
> >   False on Windows, True on (all?) unices, i have no idea on macs
>
> It's not that easy. Case sensitivity is a property of a file system, not of the operating system. So if you mount a Windows or Mac OS volume on a Linux system, the filenames on that volume will still be case-insensitive (but case-preserving). On Mac OS X, the default file system type is HFS+ (a.k.a. Mac OS Extended), which is case-insensitive, but you can also choose UFS (case-sensitive). You could also have both, on two partitions. Of course, False for Windows and Mac OS, True for everything else is a reasonable guess, but you can't rely on it.
>
> Right now I have no idea how to implement something like
>
>   isCaseSensitive :: FilePath -> IO Bool

See statfs(2) and fstatfs(2); there should be enough info there to implement it. Plus, an interface to the other information returned by these functions would be useful.

Perhaps we need a standard trinary data type: True, False, Unknown. I guess (Maybe Bool) works.

John

-- 
---
John Meacham - California Institute of Technology, Alum. - [EMAIL PROTECTED]
---
Re: System.Directory (was RE: Proposal for a new I/O library design)
On Mon, 28 Jul 2003, Wolfgang Thaller wrote:
> It's not that easy. Case sensitivity is a property of a file system, not of the operating system.

Actually, it's not even that easy. The NT native API allows you to specify case sensitivity as a flag when creating or opening a file in any directory (at least on NTFS). You can create file entries this way which are inaccessible from the Win32 subsystem because they're shadowed by other names in the same directory which differ only in case.

If we ignore that complication, I think the right way to handle this is with

  dIsCaseSensitive :: Directory -> IO Bool

Assuming, as always, that there's a way to implement that. Or perhaps it should be Maybe Bool instead of Bool.

> isCaseSensitive :: FilePath -> IO Bool

I don't think it's clear what this should mean. Assuming you have a case-insensitive filesystem rooted at /mnt, what should isCaseSensitive "/mnt" return? The filesystem rooted there is case-insensitive, but the pathname passed to the function is 100% case-sensitive. (This also has the usual problems associated with any function which uses pathnames. See my comments on the Libraries list.)

-- Ben
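One way to sidestep Ben's question about what isCaseSensitive "/mnt" should mean is to probe the entries inside a given directory, which matches the dIsCaseSensitive reading rather than asking about the pathname itself. This is a heuristic sketch, not the statfs-based approach John suggests: it creates a temporary file with a lowercase name, checks whether the upper-cased name resolves to an existing file, and returns Nothing when no probe file can be created, following the Maybe Bool idea. It assumes the directory package's System.Directory (doesFileExist, removeFile), and probeCaseSensitive is a hypothetical name. Given Ben's NTFS caveat, it can only report how ordinary opens behave.

```haskell
import Control.Exception (IOException, finally, try)
import Data.Char (toUpper)
import System.Directory (doesFileExist, removeFile)
import System.IO (Handle, hClose, openTempFile)

-- Hypothetical heuristic probe: Just True if names in dir are
-- case-sensitive, Just False if not, Nothing if no probe file could be
-- created there (missing directory, no write permission, ...).
probeCaseSensitive :: FilePath -> IO (Maybe Bool)
probeCaseSensitive dir = do
  created <- try (openTempFile dir "casetest.tmp")
               :: IO (Either IOException (FilePath, Handle))
  case created of
    Left _ -> pure Nothing
    Right (path, h) -> do
      hClose h
      -- Upper-case only the final component; the "casetest" template
      -- guarantees it contains lowercase letters, so the names differ.
      let (revName, revDir) = break (`elem` "/\\") (reverse path)
          upperPath = reverse revDir ++ map toUpper (reverse revName)
      -- On a case-insensitive filesystem the upper-cased name finds the
      -- same file. The probe file is removed even if the check throws.
      foundUpper <- doesFileExist upperPath `finally` removeFile path
      pure (Just (not foundUpper))
```

On a typical Linux ext4 directory this should give Just True, and on a default NTFS or HFS+ volume Just False; a directory that does not exist gives Nothing.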
Re: System.Directory (was RE: Proposal for a new I/O library design)
Hal Daume wrote:
> Would there be any way to get some of these differences into the System.Directory structure? At least the following would be nice:
>
> pathSeparator :: Char
>   '\\' on Windows, '/' on unices, ':' (I believe) on macs, etc...

Either '\\' or '/' on Windows. The former is preferred, but the latter also works in most contexts.

For Windows, there's also the issue of drive letters and network (UNC) paths.

> isCaseSensitive :: Bool
>   False on Windows, True on (all?) unices, i have no idea on macs

It's more accurate to say that most native Unix filesystems are case-sensitive. However, many Unix systems can mount foreign filesystems (FAT, SMB) which aren't case-sensitive.

Another significant distinction is in the handling of non-ASCII characters. Windows treats filenames as lists of characters; VFAT and NTFS use Unicode, while FAT filesystems may have an associated codepage. OTOH, Unix treats filenames as lists of bytes; while applications may impose an (arbitrary) encoding on filenames, the OS doesn't.

-- 
Glynn Clements [EMAIL PROTECTED]
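Glynn's Windows points (both separators accepted, drive letters, UNC paths) can be illustrated with a few pure helpers. A minimal sketch with hypothetical names (isWinSeparator, WinRoot, splitWinRoot, winComponents); it ignores \\?\ prefixes and does not distinguish drive-relative paths such as C:foo from absolute ones, so it is not a complete Windows path parser. The later filepath package's System.FilePath.Windows module covers this ground properly.

```haskell
import Data.Char (isAsciiLower, isAsciiUpper)

-- Both '\\' and '/' separate path components on Windows.
isWinSeparator :: Char -> Bool
isWinSeparator c = c == '\\' || c == '/'

data WinRoot
  = Drive Char         -- e.g. "C:"
  | UNC String String  -- \\server\share
  | NoRoot             -- relative path
  deriving (Eq, Show)

-- Split off a UNC prefix or drive letter; the rest is returned as-is.
splitWinRoot :: String -> (WinRoot, String)
splitWinRoot (a : b : rest)
  | isWinSeparator a && isWinSeparator b =
      let (server, r1) = break isWinSeparator rest
          (share, r2)  = break isWinSeparator (drop 1 r1)
      in (UNC server share, r2)
splitWinRoot (d : ':' : rest)
  | isAsciiUpper d || isAsciiLower d = (Drive d, rest)
splitWinRoot p = (NoRoot, p)

-- Split the remainder into components, accepting either separator
-- and skipping empty components.
winComponents :: String -> [String]
winComponents s = case break isWinSeparator s of
  ("", [])       -> []
  (c, [])        -> [c]
  ("", _ : rest) -> winComponents rest
  (c, _ : rest)  -> c : winComponents rest
```

For example, splitWinRoot "\\\\srv\\share\\a.txt" separates the UNC root (UNC "srv" "share") from the remaining "\\a.txt", and winComponents accepts mixed separators such as "x/y\\z".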