Author: allison
Date: Tue Dec 5 22:04:43 2006
New Revision: 16022

Modified:
   trunk/docs/pdds/clip/pdd22_io.pod

Log:
[pdd]: A partial revision of the I/O PDD.

Modified: trunk/docs/pdds/clip/pdd22_io.pod
==============================================================================
--- trunk/docs/pdds/clip/pdd22_io.pod (original)
+++ trunk/docs/pdds/clip/pdd22_io.pod Tue Dec 5 22:04:43 2006
@@ -21,16 +21,221 @@

 =head1 DESCRIPTION

-This document defines Parrot's I/O subsystem, for both streams and
-network I/O. Parrot has both synchronous and asynchronous I/O
-operations. This section describes the interface, and the
-L<IMPLEMENTATION> section provides more details on general
-implementation questions and error handling.
+=over 4
+
+=item - Parrot I/O objects support both streams and network I/O.
+
+=item - Parrot has both synchronous and asynchronous I/O operations.
+
+=item - Asynchronous operations must interact safely with Parrot's other
+concurrency models.
+
+=back
+
+=head1 IMPLEMENTATION
+
+=head2 Composition
+
+Currently, the Parrot I/O subsystem uses a per-interpreter stack to
+provide a layer-based approach to I/O. Each layer implements a subset of
+the C<ParrotIOLayerAPI> vtable. To find an I/O function, the layer stack
+is searched downwards until a non-NULL function pointer is found for
+that particular slot. This implementation will be replaced with a
+composition model. Rather than living in a stack, the module fragments
+that make up the ParrotIO class will be composed and any conflicts
+resolved when the class is loaded. This strategy eliminates the need to
+search a stack on each I/O call, while still allowing a "layered"
+combination of functionality for different platforms.
+
+=head2 Concurrency Model for Asynchronous I/O
+
+Currently, Parrot only implements synchronous I/O operations. For the
+1.0 release the asynchronous operations will be implemented separately
+from the synchronous ones. There may be an implementation that uses one
+variant to implement the other someday, but it's not an immediate
+priority.
+
+Synchronous opcodes are differentiated from asynchronous opcodes by the
+presence of a callback argument in the asynchronous calls. Asynchronous
+calls that don't supply callbacks (perhaps if the user wants to manually
+check later if the operation succeeded) are enough of a fringe case that
+they don't need opcodes. They can access the functionality via methods
+on ParrotIO objects.
+
+Asynchronous operations don't use Parrot threads; they use a
+light-weight concurrency model for asynchronous operations. The
+asynchronous I/O implementation will use the composition model to
+allow some platforms to take advantage of their built-in asynchronous
+operations instead of using Parrot's concurrency implementation.
+
+[Type up review of options for the I/O concurrency model.]
+
+Communication between the calling code and the asynchronous operation
+thread will be handled by a shared status object. The operation thread
+will update the status object whenever the status changes, and the
+calling code can check the status object at any time. The status object
+contains a reference to the returned result of an asynchronous I/O call.
+
+
+
+=head2 I/O PMC API
+
+Methods
+
+[Over and over again throughout this section, I keep wanting an API that
+isn't possible with current low-level PMCs. This could mean that
+low-level PMCs need a good bit of work to gain the same argument passing
+capabilities as higher-level Parrot objects (which is true, long-term).
+It could mean that Parrot I/O objects would be better off defined in a
+higher-level syntax, with embedded C (via NCI, or a lighter-weight
+embedding mechanism) for those pieces that really are direct C access.
+Or, it could mean that I'll come back and rip this interface down to a
+bare minimum.]
+
+=over 4
+
+=item new
+
+ $P0 = new ParrotIO
+
+Creates a new I/O stream object. [Note that this is usually performed
+via the C<open> opcode.]
+
+=item open
+
+ $P0.open()
+ $P0.open($S1)
+ $P0.open($S1, $S2)
+
+Opens a stream on an existing I/O stream object. With no arguments, it
+can be used to reopen a previously opened I/O stream. $S1 is a file path
+and $S2 is an optional mode for the stream (read, write, read/write,
+etc.), using the same format as the C<open> opcode.
+
+I'm very tempted by named parameters for 'open':
+
+ path   - The path to the file
+ read   - A flag for read mode
+ write  - A flag for write mode (both read and write means read/write), create a new file if it doesn't exist
+ append - Start writing at the end of the file, or create a new file if it doesn't exist
+ pipe   - A flag for pipe mode
+
+ $P0.open('path'=>'/tmp/file') # Default is read-only
+ $P0.open('path'=>'/tmp/file', 'write'=>1) # write-only
+
+It would make for some rather verbose C<open> operations, though
+certainly more readable, and probably just as easy to generate.
+
+=item close
+
+ $P0.close()
+ $P0.close($P1)
+
+Closes an I/O stream, but leaves destruction of the I/O object to the GC.
+
+The asynchronous version takes an additional final PMC callback argument
+$P1. When the close operation is complete, it invokes the callback,
+passing it a status object. [There's not really much advantage in this
+over just leaving the object for the GC to clean up, but it does give
+you the option of executing an action when the stream has been closed.]
+
+=item print
+
+ $P0.print($I1)
+ $P0.print($N1)
+ $P0.print($S1)
+ $P0.print($P1)
+ $P0.print($I1, $P2)
+ $P0.print($N1, $P2)
+ $P0.print($S1, $P2)
+ $P0.print($P1, $P2)
+
+Writes an integer, float, string, or PMC value to an I/O stream object.
+
+The asynchronous version takes an additional final PMC callback
+argument $P2. When the print operation is complete, it invokes the callback,
+passing it a status object.
+
+=item read
+
+ $S0 = $P1.read($I2)
+ $P0 = $P1.read($I2, $P3)
+
+Retrieves a specified number of bytes, $I2, from a stream $P1 into a
+string $S0.
+By default it reads in bytes, but the ParrotIO object can be
+configured to read in code points instead.
+
+The asynchronous version takes an additional final PMC callback argument
+$P3, and only returns a status object $P0. When the read operation is
+complete, it invokes the callback, passing it a status object and a
+string of bytes.
+
+=item readline
+
+ $S0 = $P1.readline()
+ $P0 = $P1.readline($P2)
+
+Retrieves a single line from a stream $P1 into a string $S0. Calling
+C<readline> flags the stream as operating in line-buffer mode (see the
+C<buffer_type> method below).
+
+The asynchronous version takes an additional final PMC callback argument
+$P2, and only returns a status object $P0. When the readline operation
+is complete, it invokes the callback, passing it a status object and a
+string of bytes.
+
+=item record_separator
+
+ $S0 = $P1.record_separator()
+ $P0.record_separator($S1)
+
+Accessor (get and set) for the I/O stream's record separator attribute.
+
+=item buffer_type
+
+ $I0 = $P1.buffer_type()
+ $S0 = $P1.buffer_type()
+ $P0.buffer_type($I1)
+ $P0.buffer_type($S1)
+
+Accessor (get and set) for the I/O stream's buffer type attribute. The
+attribute is returned as an integer value of one of the following
+constants, or a string value of 'unbuffered', 'line-buffered', or
+'full-buffered'.
+
+ 0 PIOCTL_NONBUF
+     Unbuffered I/O. Bytes are sent as soon as possible.
+ 1 PIOCTL_LINEBUF
+     Line buffered I/O. Bytes are sent when a newline is
+     encountered.
+ 2 PIOCTL_FULLBUF
+     Fully buffered I/O. Bytes are sent when the buffer is full.
+     [Note, the constant was called "BLKBUF" because bytes are
+     sent as a block, but line buffering also sends them as a
+     block, so changed to "FULLBUF".]
+
+=item buffer_size
+
+ $I0 = $P1.buffer_size()
+ $P0.buffer_size($I1)
+
+Accessor (get and set) for the I/O stream's buffer size attribute.
+
+=item get_fd
+
+ $I0 = $P1.'get_fd'()
+
+Retrieves the UNIX integer file descriptor of a stream object.
+No asynchronous version.
+
+=back
+
+=head2 I/O Opcodes

 The signatures for the asynchronous operations are nearly identical to
 the synchronous operations, but the asynchronous operations take an
 additional argument for a callback, and the only return value from the
-asynchronous operations is a status object. When the callbacks invoked,
+asynchronous operations is a status object. When the callbacks are invoked,
 they are passed the status object as their sole argument. Any return
 values from the operation are stored within the status object.

@@ -45,21 +250,17 @@

 =over 4

-=item *
+=item open
+
+ $P0 = open $S1
+ $P0 = open $S1, $S2

-C<open> opens a stream object based on a string path. It takes an
-optional string argument specifying the mode of the stream (read, write,
+Opens a stream object based on a file path in $S1 in read/write mode. The
+optional string argument $S2 specifies the mode of the stream (read, write,
 append, read/write, etc.), and returns a stream object. Currently the
 mode of the stream is set with a string argument similar to Perl 5
-syntax, but a set of defined constants may fit better with Parrot's
-general architecture.
-
- 0 PIOMODE_READ (default)
- 1 PIOMODE_WRITE
- 2 PIOMODE_APPEND
- 3 PIOMODE_READWRITE
- 4 PIOMODE_PIPE (read)
- 5 PIOMODE_PIPEWRITE
+syntax, but a language-agnostic mode string is preferable, using 'r' for
+read, 'w' for write, 'a' for append, and 'p' for pipe.

 The asynchronous version takes a PMC callback as an additional final
 argument. When the open operation is complete, it invokes the callback
@@ -148,6 +349,9 @@

 =item *

+['peek', 'seek', 'tell', and 'poll' are all candidates for moving from
+opcodes to ParrotIO object methods.]
+
 C<peek> retrieves the next byte from a stream into a string, but doesn't
 remove it from the stream. By default it reads from standard input, but
 it also takes a stream object argument for an alternate source.
@@ -188,9 +392,39 @@

 =item *

+C<poll> polls a stream or socket object for particular types of events
+(an integer flag) at a frequency set by seconds and microseconds (the
+final two integer arguments). [At least, that's what the documentation
+in src/io/io.c says. In actual fact, the final two arguments seem to be
+setting the timeout, exactly the same as the corresponding argument to
+the system version of C<poll>.]
+
+See the system documentation for C<poll> to see the constants for event
+types and return status.
+
+This opcode is inherently synchronous (poll is "synchronous I/O
+multiplexing"), but it can retrieve status information from a stream or
+socket object whether the object is being used synchronously or
+asynchronously.
+
+=back
+
+=head3 Deprecated opcodes
+
+=over
+
+=item *
+
+C<write> prints to standard output but it cannot select another stream.
+It only accepts a PMC value to write. This is redundant with the
+C<print> opcode, so it will be deprecated.
+
+=item *
+
 C<getfd> retrieves the UNIX integer file descriptor of a stream object.
+The opcode has been replaced by a 'get_fd' method on the ParrotIO
+object.

-No asynchronous version.

 =item *

@@ -199,6 +433,9 @@
 and a single integer argument for the command. It returns an integer
 indicating the success or failure of the command.

+This opcode has been replaced with methods on the ParrotIO object, but
+is kept here for reference.
+
 The following constants are defined for the commands that C<pioctl> can
 execute:

@@ -228,46 +465,21 @@
     encountered.
 2 PIOCTL_BLKBUF
     Fully buffered I/O. Bytes are sent when the buffer is full.
-    [Called "BLKBUF" because bytes are sent as a block, but line
-    buffering also sends them as a block, so "FULBUF" might make
-    more sense.]
-
-[This opcode may be deprecated and replaced with methods on stream
-objects.]
-
-=item *
-
-C<poll> polls a stream or socket object for particular types of events
-(an integer flag) at a frequency set by seconds and microseconds (the
-final two integer arguments). [At least, that's what the documentation
-in src/io/io.c says. In actual fact, the final two arguments seem to be
-setting the timeout, exactly the same as the corresponding argument to
-the system version of C<poll>.]
-
-See the system documentation for C<poll> to see the constants for event
-types and return status.
-
-This opcode is inherently synchronous (poll is "synchronous I/O
-multiplexing"), but it can retrieve status information from a stream or
-socket object whether the object is being used synchronously or
-asynchronously.
-
-=back
-
-=head3 Deprecated opcodes
-
-=over
-
-=item *
-
-C<write> prints to standard output but it cannot select another stream.
-It only accepts a PMC value to write. This is redundant with the
-C<print> opcode, so it will be deprecated.

 =back

 =head2 Filesystem Opcodes

+[Okay, I'm seriously considering moving most of these to methods on the
+ParrotIO object. More than that, moving them into a role that is
+composed into the ParrotIO object when needed. For the ones that have
+the form 'opcodename parrotIOobject, arguments', I can't see that it's
+much less effort than 'parrotIOobject.methodname(arguments)' for either
+manually writing PIR or generating PIR. The slowest thing about I/O is
+I/O, so I can't see that we're getting much speed gain out of making
+them opcodes. The ones to keep as opcodes are 'unlink', 'rmdir', and
+'opendir'.]
+
 =over 4

 =item *
@@ -394,6 +606,10 @@
 Most of these opcodes conform to the standard UNIX interface, but the
 layer API allows alternate implementations for each.

+[These I'm also considering moving to methods in a role for the ParrotIO
+object. Keep 'socket' as an opcode, or maybe just make 'socket' an
+option on creating a new ParrotIO object.]
+
 =over 4

 =item *
@@ -503,41 +719,6 @@

 =back

-=head1 IMPLEMENTATION
-
-The Parrot I/O subsystem uses a per-interpreter stack to provide a
-layer-based approach to I/O. Each layer implements a subset of the
-C<ParrotIOLayerAPI> vtable. To find an I/O function, the layer stack is
-searched downwards until a non-NULL function pointer is found for that
-particular slot. [We need to look into the implementation of IO layers
-for simplifications.]
-
-=head2 Synchronous and Asynchronous Operations
-
-Currently, Parrot only implements synchronous I/O operations. For the
-1.0 release the asynchronous operations will be implemented separately
-from the synchronous ones. [Eventually there may be an implementation
-that uses one variant to implement the other, but it's not an immediate
-priority.]
-
-Asynchronous operations don't use Parrot threads, they use a
-light-weight concurrency model for asynchronous operations. The
-asynchronous I/O implementation will use Parrot's I/O layer architecture
-so some platforms can take advantage of their built-in asynchronous
-operations instead of using Parrot's concurrency implementation.
-
-Communication between the calling code and the asynchronous operation
-thread will be handled by a shared status object. The operation thread
-will update the status object whenever the status changes, and the
-calling code can check the status object at any time. The status object
-contains a reference to the returned result of an asynchronous I/O call.
-
-Synchronous opcodes are differentiated from asynchronous opcodes by the
-presence of a callback argument in the asynchronous calls. Asynchronous
-calls that don't supply callbacks (perhaps if the user wants to manually
-check later if the operation succeded) are enough of a fringe case that
-they don't need opcodes. They can access the functionality via methods
-on ParrotIO objects.

 =head2 Error Handling
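
Reading aid for the Composition change in this revision (not part of the patch): the sketch below contrasts the current layer-stack lookup, which searches downwards for a non-NULL slot on every call, with a composition step that resolves each slot once. It is a rough Python illustration, not Parrot code. The layer contents, the helper names (`find_in_stack`, `compose`), and the "topmost layer wins" conflict rule are all assumptions made for the example; the PDD only says that conflicts are resolved when the class is loaded.

```python
# Illustrative sketch only: the layer contents and helper names here are
# hypothetical and are not taken from Parrot's source tree.

def buffered_read(n):
    return f"buffered read of {n} bytes"

def unix_read(n):
    return f"unix read of {n} bytes"

def unix_write(data):
    return f"unix write of {data!r}"

# Each layer fills in only some slots; None stands in for a NULL pointer.
BUFFER_LAYER = {"read": buffered_read, "write": None}
UNIX_LAYER = {"read": unix_read, "write": unix_write}
LAYER_STACK = [BUFFER_LAYER, UNIX_LAYER]  # top of the stack first


def find_in_stack(stack, slot):
    """Stack model: walk downwards on every call until a slot is filled."""
    for layer in stack:
        fn = layer.get(slot)
        if fn is not None:
            return fn
    raise LookupError(f"no layer implements {slot!r}")


def compose(stack):
    """Composition model: resolve each slot once, when the class is built.

    Assumption: conflicts are resolved by letting the topmost layer win.
    """
    composed = {}
    for layer in stack:
        for slot, fn in layer.items():
            if fn is not None and slot not in composed:
                composed[slot] = fn
    return composed


# Built once; later calls are a single dictionary lookup with no search.
COMPOSED_IO = compose(LAYER_STACK)
```

With this setup, `find_in_stack(LAYER_STACK, "write")` has to skip the buffer layer on each call, while `COMPOSED_IO["write"]` already points at the UNIX implementation.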