On Thu, Jul 07, 2005 at 02:15:19PM -0600, Paul Seamons wrote:
: > We should approach this from the perspective that $fh is an iterator, so
: > the general problem is "how do we navigate a random-access iterator?".
:
: Well - I kind of thought that $fh was a filehandle that knew how to behave
: like an iterator if asked to do so.

Yes, basically.  And they fall into that class of iterators that may
or may not know how to back up, so it may be quite possible to seek
forward 10 items but not backward 10 items, if "item" is, for example,
a line defined by an asymmetric match rule.

: There are too many applications that
: need to jump around using seek.

We need to have a POSIXly correct layer, but that's no reason not to
have other layers on top of that with more useful semantics.

I view files as just funny-looking strings, in the abstract.  So the
same issues arise that we've talked about concerning strings in Unicode,
and that's even before we get into counting lines or paragraphs.

Like a string, a file may naturally allow itself to be viewed as bytes
(POSIX), codepoints, graphemes, and/or characters in the current
language.  It can allow multiple views into the same abstract string,
but as with strings, it may limit the minimum and maximum abstraction
level you're allowed to deal with the file.  And depending on the
file/string representation, one of the abstraction levels is likely
to be very efficient to seek around in, and others have to be emulated
by visiting all the intermediate items.

Some file structures are great at indexing into lines but lousy at
indexing into anything smaller than that.  A file position in such a
file is not even going to be an integer, but a line number plus an
offset into the line.

I realize most of us come from the POSIXly-correct worldview that
all files are really just a sequence of bytes that can always be indexed
by integer.  This view doesn't make a lot of sense any more in the world
of Unicode.  We see various versions of Unix/Linux being caught with
their pants down because there's no metadata to tell you the character
encoding of the filenames, for instance.  Perl 6 must not fall into
that trap.

In the discussion of seek(), this primarily means that you must keep
reminding yourself that file positions (and string positions) are not
necessarily numbers.
Treat them as opaque recipes for navigating into
a file, because you don't know what the most efficient underlying
representation is.  It might even be some kind of URI.

At the same time, all relative navigation *must* specify the units.
We can't simply assume bytes any more.  And if you specify navigation
in a smaller unit than the natural unit of the file/string in question,
you have to either give it a round-up or round-down instruction, or
be prepared to handle an exception of some sort.  A UTF-8 handler has
the nice property that it can tell if it has landed in the middle of
a character, but it can't read your mind about what to do when that
happens.

: The options that need to be there are:
:     seek from the beginning
:     seek from the end
:     seek from the current location
:
: Now it could be simplified a bit to the following cases:
:
:     $fh.seek(10);  # from the beginning forward 10
:     $fh.seek(-10); # from the end backwards 10

Apart from the units and alignment problem, does $fh.seek(-0) mean the
beginning or the end of the file?

:     $fh.seek(10, :relative);  # from the current location forward 10
:     $fh.seek(-10, :relative); # from the current location backward 10

Again, 10 whats?  Bytes?  Codepoints?  Lines?

I think I'd actually like to divorce the notion of going to a particular
position from the notion of relative navigation.  So I'm in favor of
$fh.seek taking *only* an opaque position, and $fh.beg and $fh.cur and
$fh.end returning opaque positions.  Then there are navigation commands
that can take an opaque position and move relative to it a given
number of units, and we force the units to be specified.  Something like:

    $fh.pos = $fh.pos + 10`lines

Arguably, we could probably admit

    $fh.pos = 10`bytes

for the case of seeking from the beginning.  But I'd kind of like

    $fh.pos = 10

to be considered an error.

Note also that we can treat string positions exactly the same way.
All the rule-ishly returned positions are defined as opaque objects
already.

Larry
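
The proposal above (seek takes only an opaque position; relative moves
must name their units; a move that lands inside a character needs a
rounding instruction or raises) can be sketched roughly as follows.
This is an illustration only, written in Python because the Perl 6
syntax under discussion was still speculative.  The names Pos, Unit,
Utf8View, advance and read_rest are hypothetical and not part of any
Perl 6 design, and the sketch assumes a valid in-memory UTF-8 buffer
and forward-only movement.

```python
from dataclasses import dataclass
from enum import Enum


class Unit(Enum):
    BYTES = "bytes"
    CODEPOINTS = "codepoints"
    LINES = "lines"


@dataclass(frozen=True)
class Pos:
    """Opaque position: callers should not look inside or do arithmetic on it."""
    _offset: int  # hidden representation; in this sketch, a byte offset


class Utf8View:
    """Hypothetical stand-in for $fh over a valid in-memory UTF-8 buffer."""

    def __init__(self, data: bytes):
        self._data = data
        self.pos = self.beg()

    def beg(self) -> Pos:
        return Pos(0)

    def end(self) -> Pos:
        return Pos(len(self._data))

    def seek(self, pos: Pos) -> None:
        # Bare numbers are rejected, mirroring "$fh.pos = 10 is an error".
        if not isinstance(pos, Pos):
            raise TypeError("seek takes an opaque Pos, not a bare number")
        self.pos = pos

    def _is_boundary(self, i: int) -> bool:
        # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
        return i >= len(self._data) or (self._data[i] & 0xC0) != 0x80

    def advance(self, pos: Pos, count: int, unit: Unit, rounding=None) -> Pos:
        """Move forward `count` units from `pos`; the unit is mandatory."""
        if count < 0:
            raise ValueError("this sketch only moves forward")
        i, n = pos._offset, len(self._data)
        if unit is Unit.BYTES:
            i = min(i + count, n)
            if not self._is_boundary(i):
                # Landed mid-character: the caller must say what to do.
                if rounding not in ("up", "down"):
                    raise ValueError("landed inside a character; "
                                     "specify rounding='up' or 'down'")
                step = 1 if rounding == "up" else -1
                while not self._is_boundary(i):
                    i += step
        elif unit is Unit.CODEPOINTS:
            # Emulated by visiting every intermediate character.
            for _ in range(count):
                if i >= n:
                    break
                i += 1
                while not self._is_boundary(i):
                    i += 1
        elif unit is Unit.LINES:
            for _ in range(count):
                nl = self._data.find(b"\n", i)
                if nl == -1:
                    i = n
                    break
                i = nl + 1
        else:
            raise ValueError("unknown unit")
        return Pos(i)

    def read_rest(self, pos: Pos) -> str:
        return self._data[pos._offset:].decode("utf-8")


# Usage: roughly "$fh.pos = $fh.pos + 1`lines" in the notation above.
fh = Utf8View("héllo\nwörld\n".encode("utf-8"))
fh.seek(fh.advance(fh.pos, 1, Unit.LINES))
rest = fh.read_rest(fh.pos)  # the text starting at the second line
```

The byte offset inside Pos is only one possible representation; a
line-indexed file could store a line number plus an offset into the
line without changing any caller, which is the point of keeping
positions opaque.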