Re: Proposal for a new I/O library design

2003-07-29 Thread Stefan Karrmann
Ben Rudiak-Gould (Sun, Jul 27, 2003 at 09:35:41PM -0700): 
  module System.ProposedNewIOModel (...) where
 
 I assume that all I/O occurs in terms of octets. I think that this holds
 true of every platform on which Haskell is implemented or is likely to be
 implemented.
 
  type Octet = Word8 

If it should be really general, the base type should be Bool.

 File offsets are 64 bits on all platforms. This model never uses negative
 offsets, so there's no need for a signed type. (But perhaps it would be
 better to use one anyway?) BlockLength should be something appropriate to
 the architecture's address space.
 
  type FilePos = Word64
  type BlockLength = Int
 
 type FilePos = Integer 
 type BlockLength = Integer 
 
  data File   -- abstract

I would prefer: 

data ImmutableStore   -- abstract
data MutableStore     -- abstract

A note about buffering:
Actually, current UNIX kernels do not support non-blocking descriptors;
they support non-blocking open files. Furthermore, many programs will
break if they encounter non-blocking mode. This means that you must not
[change blocking mode] for a descriptor inherited from another program.
See http://cr.yp.to/lib/io.html.

 A value of type InputStream or OutputStream represents an input or output
 stream: that is, an octet source or sink. Two InputStreams or
 OutputStreams compare equal iff reading/writing one also reads/writes the
 other.

 (Should I call these ports instead of streams? How about OctetSource
 and OctetSink?)

  data InputStream    -- abstract
  data OutputStream   -- abstract

Use

data OctetSource    -- abstract  (or BitSource, see above)
data OctetSink      -- abstract  (or BitSink, see above)

for octets (or bits/bools) and

data PacketSource   -- abstract
data PacketSink     -- abstract

to send complete packets of data by the latter.

Sincerely,
--
Stefan
___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


RE: Proposal for a new I/O library design

2003-07-28 Thread Simon Marlow
[ replies to [EMAIL PROTECTED] ]

On the whole, I think this is a good direction to explore.  I like the
separation of Files from Streams: indeed it would remove much of the
complication in the existing system caused by having Handles which can
be both read and written.  Also, it gives a nice way to integrate other
objects such as Sockets into the I/O system, which can also have streams
layered on top of them.

I'm concerned about one implementation difficulty.  Your File type is
independent of the filesystem.  That is, on Unix it corresponds to an
inode.  Creating a File must correspond to opening it (in Unix speak).
Creating a stream corresponds to duplicating the file descriptor (you
could probably avoid too many unnecessary dups by being clever).
There's a potential implementation difficulty, though:
lookupFileByPathname must open the file, without knowing whether the
file will be used for reading or writing in the future.  So I would
suggest that operations which create a value of type File take a
read/write flag too.

  type FilePos = Word64
  type BlockLength = Int

FilePos should be Integer.
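
Neither reply says why Integer is preferred. One plausible reason (an
assumption, not something stated in the thread) is that arithmetic on
Word64 offsets wraps silently. A minimal sketch, with illustrative
function names:

```haskell
import Data.Word (Word64)

-- Step back n octets from pos using the proposal's unsigned offset type.
-- If n > pos, the result wraps around modulo 2^64 instead of failing.
stepBackWord64 :: Word64 -> Word64 -> Word64
stepBackWord64 pos n = pos - n

-- The same operation with the Integer offsets suggested in the replies.
-- A negative result is representable, so the caller can detect and reject it.
stepBackInteger :: Integer -> Integer -> Integer
stepBackInteger pos n = pos - n
```

With Word64, stepping back 10 octets from offset 5 yields a huge positive
offset; with Integer it yields -5, which is at least visibly wrong.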

  fCheckRead  :: File -> FilePos -> BlockLength -> IO Bool
  fCheckWrite :: File -> FilePos -> BlockLength -> IO Bool

What do these do?  If they're supposed to return True if the required
data can be read/written without blocking, then I suspect that they are
not useful.

 Fundamental operations on streams. Maybe Octet is supposed to represent
 Octet or EOS, though I'm not sure this is enough for proper EOS
 handling.

I'd use the traditional 'isEOF' way of detecting end of file.

On naming: it's probably not a good idea to use the 'is' prefix, since
it is already used for predicates (meaning literally 'is' rather than an
abbreviation for 'InputStream').

  isGet      :: InputStream -> IO (Maybe Octet)
  isPeek     :: InputStream -> IO (Maybe Octet)
  isGetBlock :: InputStream -> BlockLength -> XXX -> IO BlockLength
  -- efficiency hack
 
  osPut      :: OutputStream -> Octet -> IO ()
  osPuts     :: OutputStream -> [Octet] -> IO ()
  osPutBlock :: OutputStream -> BlockLength -> XXX -> IO ()
  osFlush    :: OutputStream -> IO ()
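
To make the "Maybe Octet means Octet or EOS" convention concrete, here is
a minimal sketch of isGet and isPeek over an in-memory stream. The IORef
representation and the fromOctets constructor are assumptions for
illustration only; the proposal leaves InputStream abstract.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Data.Word (Word8)

type Octet = Word8

-- Hypothetical in-memory stand-in for the abstract InputStream type.
newtype InputStream = InputStream (IORef [Octet])

-- Hypothetical constructor: a stream that yields the given octets.
fromOctets :: [Octet] -> IO InputStream
fromOctets = fmap InputStream . newIORef

-- Look at the next octet without consuming it; Nothing means end of stream.
isPeek :: InputStream -> IO (Maybe Octet)
isPeek (InputStream ref) = do
  buf <- readIORef ref
  pure (case buf of
          []      -> Nothing
          (o : _) -> Just o)

-- Consume and return the next octet; Nothing means end of stream.
isGet :: InputStream -> IO (Maybe Octet)
isGet (InputStream ref) = do
  buf <- readIORef ref
  case buf of
    []       -> pure Nothing
    (o : os) -> writeIORef ref os >> pure (Just o)
```

In this sketch end of stream is sticky: once isGet returns Nothing, every
later call does too. Whether that is enough for "proper EOS handling" is
exactly the open question above.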

You need operations to control buffering, too.  Something like
h{Set,Get}Buffering would be fine.

You will also want a way to get back from an InputStream to the
underlying object, eg. the (File,FilePos) pair if one exists.

It's not pretty, but you certainly want a way to close a stream.
Finalizers aren't reliable enough.

How did you intend text encodings to work?  I see several possibilities:

   textDecode :: TextEncoding -> [Octet] -> [Char]

or
  
   decodeInputStream :: TextEncoding -> InputStream -> TextInputStream
   getChar :: TextInputStream -> IO Char
   etc.

or
  
   setInputStreamCoding :: InputStream -> TextEncoding -> IO ()
   getChar :: InputStream -> IO Char

The first one is nice, but hard to optimise, and it will get complicated
for encodings which have state.  The second one is probably the best
compromise.
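
A minimal sketch of the first option, for stateless single-byte encodings
only. The TextEncoding constructors here are hypothetical (the thread
leaves the type abstract), and mapping non-ASCII octets to U+FFFD is an
assumed policy, not something proposed above.

```haskell
import Data.Char (chr)
import Data.Word (Word8)

type Octet = Word8

-- Hypothetical encoding tags, chosen only for this illustration.
data TextEncoding = Latin1 | Ascii

-- Pure, lazy decoder: each octet maps to exactly one character.
textDecode :: TextEncoding -> [Octet] -> [Char]
textDecode Latin1 = map (chr . fromIntegral)
textDecode Ascii  = map decodeAscii
  where
    -- Octets outside the ASCII range become the replacement character.
    decodeAscii o
      | o < 128   = chr (fromIntegral o)
      | otherwise = '\xFFFD'
```

Because every octet decodes independently, this works as a plain map. An
encoding with shift state would have to thread that state through the
list, which is where the first option gets complicated.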

  data Directory  -- abstract

I don't see a reason for changing the existing Directory support
(System.Directory).  Could you give some motivation here?  Is the idea
to abstract away from the syntax of pathnames on the platform (eg.
directory separator characters)?  If so, I'm not sure it's worthwhile.
There are lots of differences between pathname conventions: case
sensitivity, arbitrary limits on the length of filenames, filename
extensions, and so on.

 Convenient shortcuts for common cases.
 
  lookupFileByPathname :: String -> IO File

Here, I suggest we need

  lookupFileByPathname :: FilePath -> IOMode -> IO File

  lookupInputStreamByPathname :: String -> IO InputStream
  -- at least as likely to succeed as lookupFileByPathname

and similarly

  createFileOutputStream :: FilePath -> IO OutputStream
  appendFile :: FilePath -> IO OutputStream
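
A minimal sketch of the mode-taking lookup on top of the existing
System.IO handles. Representing File as a wrapped Handle is an assumption
for illustration; closeFile and tryLookup are hypothetical helpers that
do not appear in the proposal.

```haskell
import Control.Exception (IOException, try)
import System.IO (Handle, IOMode (..), hClose, openBinaryFile)

-- Hypothetical representation: a File is just an open binary Handle.
newtype File = File Handle

-- Open the file at lookup time, with the access mode known up front.
lookupFileByPathname :: FilePath -> IOMode -> IO File
lookupFileByPathname path mode = File <$> openBinaryFile path mode

-- Hypothetical helper: release the underlying handle.
closeFile :: File -> IO ()
closeFile (File h) = hClose h

-- Hypothetical helper: report open failures as a value instead of throwing.
tryLookup :: FilePath -> IOMode -> IO (Either IOException File)
tryLookup path mode = try (lookupFileByPathname path mode)
```

Since the open happens inside the lookup, a missing file or a permission
problem for the requested mode is reported there rather than at the first
read or write.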

Cheers,
Simon



System.Directory (was RE: Proposal for a new I/O library design)

2003-07-28 Thread Hal Daume
Hi guys,

I'm not replying to anything in the message, but...

 Is the idea
 to abstract away from the syntax of pathnames on the platform (eg.
 directory separator characters)?  If so, I'm not sure it's worthwhile.
 There are lots of differences between pathname conventions: case
 sensitivity, arbitrary limits on the length of filenames, filename
 extensions, and so on.

Would there be any way to get some of these differences into the
System.Directory structure?  At least the following would be nice:

 pathSeparator :: Char
 '\\' on Windows, '/' on unices, ':' (I believe) on macs, etc...

 directorySeparator :: Char
 ';' on Windows, ':' on unices, i have no idea on macs

 isCaseSensitive :: Bool
 False on Windows, True on (all?) unices, i have no idea on macs

given just these, i think we'd all be a lot happier.  I also don't
particularly care whether these are IO operations or just values (so
long as they are constant, they might as well be values with
unsafePerformIO wrapped around them if necessary).
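
A minimal sketch of the first two as plain values, assuming the only
distinction that matters is Windows versus everything else. It relies on
System.Info.os, which reports "mingw32" for GHC on Windows; isWindows is
a helper name introduced here.

```haskell
import System.Info (os)

-- True when the program was built by GHC for Windows.
isWindows :: Bool
isWindows = os == "mingw32"

-- Separator between the components of a single path.
pathSeparator :: Char
pathSeparator = if isWindows then '\\' else '/'

-- Separator between entries of a search path such as PATH
-- (called directorySeparator in the message above).
directorySeparator :: Char
directorySeparator = if isWindows then ';' else ':'
```

These are constants, so no unsafePerformIO is needed. The filepath
package's System.FilePath module exports pathSeparator and
searchPathSeparator with the same meaning.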

My current approach to figuring this out is to create a directory,
change to that directory, get the current path name and try to parse it.
This is bad for so many reasons I won't enumerate them here.

...if this stuff is already hiding somewhere else, please let me know (but
System.Directory would probably be a good place for it to end up)...

 - Hal


Re: Proposal for a new I/O library design

2003-07-28 Thread Tim Sweeney
Ben,

I live in a different universe, but over here I prefer to represent files
purely as memory-mapped objects.  In this view, there is no difference
between a read-only file and an immutable array of bytes (a byte being a
natural number between 0 and 255).  A read-write file is then equivalent to a
mutable array (or a reference to a mutable array on a heap) of the same.
Treating these all as heap references tends to be cleaner, because you can
compare the references for equality, which is significant even for read-only
files, because two files which contain the same exact data are not
necessarily the same file, whereas opening the same file in two different
places should result in equal references.
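
The equality point can be sketched with IORef, whose Eq instance compares
references rather than contents. The list-of-bytes representation and the
names MutableFile, newMutableFile and contents are assumptions for
illustration; a real implementation would sit on top of mmap.

```haskell
import Data.IORef (IORef, newIORef, readIORef)
import Data.Word (Word8)

-- Hypothetical read-write file: a heap reference to its bytes.
-- The derived Eq compares the references, not the bytes they hold.
newtype MutableFile = MutableFile (IORef [Word8])
  deriving Eq

-- Hypothetical constructor: a fresh file with the given contents.
newMutableFile :: [Word8] -> IO MutableFile
newMutableFile = fmap MutableFile . newIORef

-- Read the current contents of the file.
contents :: MutableFile -> IO [Word8]
contents (MutableFile ref) = readIORef ref
```

Two files created with identical bytes have equal contents but compare
unequal, while a file always compares equal to itself.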

This approach greatly simplifies lots of things now that all modern
operating systems can perform file-mapping efficiently with the virtual
memory subsystem paging pieces in and out as necessary.  It gets rid of
pieces of information which are redundant from the low-level file system
point of view (the file handle itself, the current file pointer, etc).

The typical C/Unix approach is to deal with network (TCP or UDP) connections
as streams, too.  Obviously, memory-mapped files aren't a good way of
exposing them -- doing so would require buffering all past data, as well as
blocking when waiting on yet-unreceived data (when you really want to be
able to query whether there is incoming data available).  Instead, I prefer
to conceptualize network connections as a socket / packet-based interface,
with functions to open/close sockets, send a complete packet (being an array
of bytes) to a socket, receive a packet from a socket, and query packet
availability.  With this approach, there is no redundancy or missing
information; everything that is observable from the protocol point of view
is an observable in the language interface, and nothing more.
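
A minimal sketch of such a packet interface as a record of operations,
with an in-memory loopback standing in for a real socket. All names are
hypothetical, and making receivePacket return Nothing on an empty queue
(instead of blocking) is an assumption.

```haskell
import Data.IORef (modifyIORef, newIORef, readIORef, writeIORef)
import Data.Word (Word8)

-- A complete packet is an array of bytes; a list keeps the sketch simple.
type Packet = [Word8]

-- Hypothetical packet-oriented socket interface.
data PacketSocket = PacketSocket
  { sendPacket      :: Packet -> IO ()       -- send one complete packet
  , receivePacket   :: IO (Maybe Packet)     -- next packet, if any
  , packetAvailable :: IO Bool               -- query packet availability
  }

-- Loopback socket: everything sent is queued and received in FIFO order.
newLoopback :: IO PacketSocket
newLoopback = do
  queue <- newIORef []
  pure PacketSocket
    { sendPacket = \p -> modifyIORef queue (++ [p])
    , receivePacket = do
        q <- readIORef queue
        case q of
          []       -> pure Nothing
          (p : ps) -> writeIORef queue ps >> pure (Just p)
    , packetAvailable = not . null <$> readIORef queue
    }
```

Packet boundaries are preserved: a packet is received whole or not at
all, which is the property a byte-stream view would lose.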

In this manner, it's possible to get rid of all remnants of Unix-like
streams from a language's IO interface.

-Tim

- Original Message -
From: Ben Rudiak-Gould [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Sunday, July 27, 2003 11:35 PM
Subject: Proposal for a new I/O library design


 The other day I was reading the Haskell i18n debate in the list archives,
 and started thinking about possible replacements for the existing Haskell
 file I/O model.

 It occurred to me that the Haskell community has really dropped the ball
 on this one. Haskell's design has always emphasized doing the right thing,
 not merely doing the thing that everyone else happens to be doing. It's
 that philosophy that led to the invention of the monadic I/O model, among
 other things. And yet, what do we choose for our I/O primitives? The same
 old crocks that everyone else was using. We open and close files (whatever
 that's supposed to mean); we expose file handles to the user; we even
 maintain a current position in the file, which is an unnecessary global
 state variable if I've ever seen one.

 The proposal below is the result of a few hours spent thinking about how
 the file system would be accessed if it were actually implemented in
 Haskell, instead of behind a weird C API. I'm very interested in hearing
 comments and criticism. In particular, I want to know if there's enough
 interest in this model that I should actually try to implement it.

 The most important idea in this design as far as i18n is concerned is the
 separation of random-access files from input and output streams. Most of
 the ugliness of the usual file I/O interface comes from conflating these
 three concepts, which are almost totally unrelated. In particular, there's
 no need in this model to worry about the meaning of reading or seeking in
 a text file. Text encoding and decoding apply to streams, not files. To
 read text from a file you layer an input stream on it, apply a text parser
 to that, and read characters. If you need to seek to a new location, you
 create a new stream which starts at that location in the underlying file.


  module System.ProposedNewIOModel (...) where

 I assume that all I/O occurs in terms of octets. I think that this holds
 true of every platform on which Haskell is implemented or is likely to be
 implemented.

  type Octet = Word8

 File offsets are 64 bits on all platforms. This model never uses negative
 offsets, so there's no need for a signed type. (But perhaps it would be
 better to use one anyway?) BlockLength should be something appropriate to
 the architecture's address space.

  type FilePos = Word64
  type BlockLength = Int

 A value of type File represents a file, which is essentially a resizable
 strict array of octets. Two values of type File compare equal if they are
 the same file -- that is, if they have the same contents and changes to
 one also appear in the other.

 (File is a bad name for this. For one thing, NTFS and HFS can associate
 more than one chunk of data with each directory entry, and file usually
 refers to all the chunks together. Fork would be more 

Re: Proposal for a new I/O library design

2003-07-28 Thread Sven Panne
Tim Sweeney wrote:
 I live in a different universe, but over here I prefer to represent files
 purely as memory-mapped objects.  [...]

I'd prefer official support for this in Haskell, too. The need for this
and other I/O-related stuff was recognized by the Java community, too: It
resulted in the java.nio package family, see e.g.
   http://java.sun.com/j2se/1.4.2/docs/api/

I suggest we should have a look at this before re-inventing the wheel. Not
that all Java libraries are perfect in their first incarnation, far from it,
see e.g. java.awt, java.net.URL, java.io, ...  :-P  But the second try
(javax.swing, java.net.URI, java.nio, ...) is often quite usable, so we can
probably learn some useful concepts from the latter.
Cheers,
   S.


Re: Proposal for a new I/O library design

2003-07-28 Thread Tomasz Zielonka
On Mon, Jul 28, 2003 at 12:56:04PM -0500, Tim Sweeney wrote:
 Ben,
 
 I live in a different universe, but over here I prefer to represent files
 purely as memory-mapped objects.  In this view, there is no difference
 between a read-only file and an immutable array of bytes (a byte being a
 natural number between 0 and 255).  A read-write file is then equivalent to a
 mutable array (or a reference to a mutable array on a heap) of the same.
 Treating these all as heap references tends to be cleaner, because you can
 compare the references for equality, which is significant even for read-only
 files, because two files which contain the same exact data are not
 necessarily the same file, whereas opening the same file in two different
 places should result in equal references.

 [...]
 
 In this manner, it's possible to get rid of all remnants of Unix-like
 streams from a language's IO interface.

You certainly can't always mmap the whole file into memory at once (on a
32-bit architecture at least), because:
1) there are files that won't fit into 32-bit address space
2) ... and usually you don't have the whole address space for you. I
mean there are mmaped libraries, stack, allocated memory, etc. so the
address space can be somewhat fragmented.
3) after mmaping many big files (each fitting into 32-bit address space)
you can run out of address space

I've been bitten by all these problems and now I mmap my files in parts,
mapping and unmapping them as needed. It can be done, but it is no
longer that simple.
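
The bookkeeping for mapping in parts can be sketched as a pure
computation of (offset, length) windows. The function name and the
fixed-size window policy are assumptions; a real mmap call additionally
requires page-aligned offsets, which this sketch does not enforce.

```haskell
-- Split a file of the given size into consecutive (offset, length)
-- windows, each at most windowSize octets long. Offsets are Integers so
-- that files larger than the address space are representable.
windows :: Integer -> Integer -> [(Integer, Integer)]
windows windowSize fileSize
  | windowSize <= 0 = []
  | otherwise =
      [ (off, min windowSize (fileSize - off))
      | off <- [0, windowSize .. fileSize - 1]
      ]
```

For example, a 10-octet file with 4-octet windows gives three windows,
the last one shorter than the rest. Only one window needs to be mapped at
a time, so address-space use is bounded by windowSize.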

 -Tim

Best regards,
Tom

-- 
.signature: Too many levels of symbolic links


Re: System.Directory (was RE: Proposal for a new I/O library design)

2003-07-28 Thread Wolfgang Thaller
 pathSeparator :: Char
 '\\' on Windows, '/' on unices, ':' (I believe) on macs, etc...

Used to be ':' in Classic Mac OS, and there are still some old routines
in Apple's Carbon library that take ':'-separated paths. However, Apple
always insisted that pathnames should only be used for display
purposes, mostly because a pathname did not always uniquely identify a
file (!).

Mac OS X uses normal unix-style paths for everything that concerns
us. Also, the Classic Mac-style paths had different semantics (no .
or .., etc.), so treating them would have been more difficult than
just using a different separator. Fortunately, they're a thing of the
past.

 directorySeparator :: Char
 ';' on Windows, ':' on unices, i have no idea on macs

As there was no command line, no PATH, and no textual config files on
classic Mac OS, there is no mac-specific directory separator, so it's
':'.
 (although I think that stealing yet another character from


 isCaseSensitive :: Bool
 False on Windows, True on (all?) unices, i have no idea on macs

It's not that easy. Case sensitivity is a property of a file system,
not of the operating system.

So if you mount a Windows or Mac OS volume on a Linux system, the
filenames on that volume will still be case-insensitive (but
case-preserving).

On Mac OS X, the default file system type is HFS+ (a.k.a. Mac OS
Extended), which is case insensitive, but you can also choose UFS (case
sensitive). You could also have both, on two partitions.

Of course, False for Windows and Mac OS, True for everything else is
a reasonable guess, but you can't rely on it.

Right now I have no idea how to implement something like

isCaseSensitive :: FilePath -> IO Bool

which would determine whether a specific path would be case sensitive. 
Maybe it's worth the effort to think about it.

 given just these, i think we'd all be a lot happier.

I agree. If the last one can be implemented properly, that is.

Cheers,

Wolfgang



Re: Proposal for a new I/O library design

2003-07-28 Thread Sven Panne
Tomasz Zielonka wrote:
 You certainly can't always mmap the whole file into memory at once (on a
 32-bit architecture at least), because: [...]

I think all these issues are handled by java.nio.Buffer and friends. Are
there any people on this list with real-world war stories about java.nio?
So far I had a look at those packages from an implementation perspective only...
Cheers,
   S.


Re: System.Directory (was RE: Proposal for a new I/O library design)

2003-07-28 Thread John Meacham
On Mon, Jul 28, 2003 at 07:51:51PM +0200, Wolfgang Thaller wrote:
 isCaseSensitive :: Bool
 False on Windows, True on (all?) unices, i have no idea on macs
 
 It's not that easy. Case sensitivity is a property of a file system, 
 not of the operating system.
 So if you mount a Windows or Mac OS volume on a Linux system, the 
 filenames on that volume will still be case-insensitive (but 
 case-preserving).
 On Mac OS X, the default file system type is HFS+ (a.k.a. Mac OS 
 Extended), which is case insensitive, but you can also choose UFS (case 
 sensitive). You could also have both, on two partitions.
 Of course, False for Windows and MacOS, True for everything else is 
 a reasonable guess, but you can't rely on it.
 Right now I have no idea how to implement something like

 isCaseSensitive :: FilePath -> IO Bool

See 'statfs(2)' and 'fstatfs(2)'; there should be enough info there to
implement it. An interface to the other information returned by
these functions would be useful, too.

Perhaps we need a standard ternary data type: True, False, Unknown. I
guess (Maybe Bool) works.
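
A minimal sketch of how the (Maybe Bool) answer could be consumed, with
Nothing standing for Unknown. The function sameName is hypothetical, it
does not call statfs, and toLower is only a rough stand-in for whatever
case folding a real file system applies.

```haskell
import Data.Char (toLower)

-- Decide whether two names denote the same directory entry, given what is
-- known about case sensitivity: Just True (sensitive), Just False
-- (insensitive) or Nothing (unknown). The answer is itself three-valued.
sameName :: Maybe Bool -> String -> String -> Maybe Bool
sameName (Just True)  a b = Just (a == b)
sameName (Just False) a b = Just (map toLower a == map toLower b)
sameName Nothing      a b
  | a == b                         = Just True   -- same under either rule
  | map toLower a /= map toLower b = Just False  -- different under either rule
  | otherwise                      = Nothing     -- depends on the file system
```

Only names that differ purely in case produce Nothing, so callers can
often get a definite answer even when the file system is unknown.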
John


-- 
---
John Meacham - California Institute of Technology, Alum. - [EMAIL PROTECTED]
---


Re: System.Directory (was RE: Proposal for a new I/O library design)

2003-07-28 Thread Ben Rudiak-Gould
On Mon, 28 Jul 2003, Wolfgang Thaller wrote:

 It's not that easy. Case sensitivity is a property of a file system, 
 not of the operating system.

Actually, it's not even that easy. The NT native API allows you to specify
case sensitivity as a flag when creating or opening a file in any
directory (at least on NTFS). You can create file entries this way which
are inaccessible from the Win32 subsystem because they're shadowed by
other names in the same directory which differ only in case.

If we ignore that complication, I think the right way to handle this is
with dIsCaseSensitive :: Directory -> IO Bool. Assuming, as always, that
there's a way to implement that. Or perhaps it should be Maybe Bool
instead of Bool.


 isCaseSensitive :: FilePath -> IO Bool

I don't think it's clear what this should mean. Assuming you have a
case-insensitive filesystem rooted at /mnt, what should
isCaseSensitive /mnt return? The filesystem rooted there is
case-insensitive, but the pathname passed to the function is 100%
case-sensitive.

(This also has the usual problems associated with any function which uses
pathnames. See my comments on the Libraries list.)


-- Ben



Re: System.Directory (was RE: Proposal for a new I/O library design)

2003-07-28 Thread Glynn Clements

Hal Daume wrote:

 Would there be any way to get some of these differences into the
 System.Directory structure?  At least the following would be nice:
 
  pathSeparator :: Char
  '\\' on Windows, '/' on unices, ':' (I believe) on macs, etc...

Either '\\' or '/' on Windows. The former is preferred, but the latter
also works in most contexts.

For Windows, there's also the issue of drive letters and network (UNC)
paths.
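
A minimal sketch of why a single pathSeparator value is not enough on
Windows: splitting has to accept both separators, and drive letters and
UNC prefixes show up as special leading components. The function name is
hypothetical, and no attempt is made to interpret those components.

```haskell
-- Split a Windows-style path on either '\\' or '/'.
-- A drive letter stays attached to its colon as the first component;
-- a UNC prefix shows up as two leading empty components.
splitOnSeparators :: String -> [String]
splitOnSeparators s = case break isSep s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOnSeparators rest
  where
    isSep c = c == '\\' || c == '/'
```

A mixed path such as C:\dir/file.txt splits into "C:", "dir" and
"file.txt", while \\host\share yields two empty components followed by
"host" and "share".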

  isCaseSensitive :: Bool
  False on Windows, True on (all?) unices, i have no idea on macs

It's more accurate to say that most native Unix filesystems are
case-sensitive. However, many Unix systems can mount foreign
filesystems (FAT, SMB) which aren't case-sensitive.

Another significant distinction is in the handling of non-ASCII
characters. Windows treats filenames as lists of characters; VFAT and
NTFS use Unicode, while FAT filesystems may have an associated
codepage. OTOH, Unix treats filenames as lists of bytes; while
applications may impose an (arbitrary) encoding on filenames, the OS
doesn't.

-- 
Glynn Clements [EMAIL PROTECTED]