Re: IO subsystem stuff

2004-02-02 Thread Dan Sugalski
At 4:47 PM -0700 1/27/04, Cory Spencer wrote:
> Perhaps someone with a bit more familiarity with the Parrot IO subsystem
> could give me some guidance here.  I'm currently trying to get a new
> 'peek' opcode working, and I'm having difficulties getting the io_unix
> layer implemented correctly.

Before we go any further, could you refresh my memory? I know what 
the peek function's supposed to do, but I'm unclear as to the why. 
(You may have explained it already, but humor me--I'm overloaded with 
mail and a bit jet-lagged.)

FWIW, remember that reading data from a filehandle may actually mean 
reading it through a linked list of data manipulation and generation 
layers, and speculative reads may well not do what you expect. Ponder 
a filehandle that, for bizarre reasons we won't even consider, returns 
a 64-bit integer giving the time of the read request, in 100ns ticks 
since the epoch--what do you do with a speculative read (with or 
without pushback of unwanted data) in that case?
--
Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: IO subsystem stuff

2004-01-29 Thread Jeff Clites
On Jan 27, 2004, at 3:47 PM, Cory Spencer wrote:

> Perhaps someone with a bit more familiarity with the Parrot IO subsystem
> could give me some guidance here.  I'm currently trying to get a new
> 'peek' opcode working, and I'm having difficulties getting the io_unix
> layer implemented correctly.

> As far as I know, I'd get a call down into the io_unix layer when the
> ParrotIO object isn't buffered.  What I want to be able to do is to
> read()/fread() a character off of the io->fd file descriptor, copy it
> into the buffer, then ungetc() it back onto the stream.

You can't push a character back onto a Unix file descriptor. To emulate 
this for parrot, you'll need some storage hanging off of the ParrotIO 
structure to hold the pushed-back characters, and then munge the read 
methods to pull data from there, if anything has been pushed back, 
before reading from the real descriptor. This is essentially what the 
C std. lib. buffered IO API does; the core Unix IO API doesn't provide 
this functionality. For parrot, I think that we should only do this for 
the io_buf layer (and maybe the io_stdio layer), which is the buffered 
IO layer and already has a buffer that can be used for this purpose. I 
don't think it's appropriate for the io_unix layer--I see that as a 
direct wrapper around the Unix API.

> Unfortunately, however, ungetc requires a (FILE *), while the ParrotIO
> object carries around only a raw file descriptor (I think).

Yes, the C std. lib. IO API is a wrapper on top of the core OS IO 
routines (for Unix or Windows), and we're using the core IO routines to 
implement our IO functionality, rather than going through an extra 
layer. (And io_stdio is based on the C std. lib., and I believe is 
provided so that it can be used on systems for which none of the other 
base layers is available--non-Unix and non-Windows.)

> I've seen some instances where people will cast the raw descriptor to a
> (FILE *)

I can't imagine where that would ever work. A (FILE *) is a pointer to 
a struct which stores various bits of data, including the actual file 
descriptor. A file descriptor is just an integer, and isn't going to be 
interpretable as a pointer to such a struct--using the core Unix IO 
API, no such struct will have been created anywhere in memory. So you 
can't get a FILE* via any sort of casting, at least not on Unix 
platforms.

> however the man page for ungetc warns ominously in its BUGS
> section that:
>
>    It is not advisable to mix calls to input functions from
>    the stdio library with low-level calls to read() for the
>    file descriptor associated with the input stream; the
>    results will be undefined and very probably not what you
>    want.

This is warning about something else. It's saying don't do IO on a 
FILE* through the C API and also, at the same time, use the Unix IO API 
on the descriptor that is the fileno() of that FILE*. But even in that 
case you wouldn't get there by casting--you'd either get the descriptor 
from the FILE* using fileno(), or use fdopen() to create a FILE* from a 
descriptor.

But in any event, we don't want to use the C std. lib. IO API inside of 
the io_unix layer.

> That being said, what is the best course for buffering such characters at
> the io_unix layer?  I apparently am not able to use the standard library
> functions to do so (additionally, they only guarantee that you can peek
> and replace a single character).

As I said above, I think we'd only want to do this for the io_buf 
layer, though others may disagree. If we do want to do it at the 
io_unix layer, then we can just copy down a bunch of code from io_buf, 
because we will be making the io_unix layer a buffered layer (the 
difference being that the buffer would only be populated when items 
are pushed back, and not during reads).

JEff



IO subsystem stuff

2004-01-28 Thread Cory Spencer

Perhaps someone with a bit more familiarity with the Parrot IO subsystem
could give me some guidance here.  I'm currently trying to get a new
'peek' opcode working, and I'm having difficulties getting the io_unix
layer implemented correctly.

As far as I know, I'd get a call down into the io_unix layer when the
ParrotIO object isn't buffered.  What I want to be able to do is to
read()/fread() a character off of the io->fd file descriptor, copy it into
the buffer, then ungetc() it back onto the stream.  Unfortunately,
however, ungetc requires a (FILE *), while the ParrotIO object carries
around only a raw file descriptor (I think).

I've seen some instances where people will cast the raw descriptor to a
(FILE *), however the man page for ungetc warns ominously in its BUGS
section that:

   It is not advisable to mix calls to input functions from
   the stdio library with low-level calls to read() for the
   file descriptor associated with the input stream; the
   results will be undefined and very probably not what you
   want.

Numerous segfaults would seem to confirm that this is indeed very probably
not what I want.

That being said, what is the best course for buffering such characters at
the io_unix layer?  I apparently am not able to use the standard library
functions to do so (additionally, they only guarantee that you can peek
and replace a single character).

-c