Re: Should a dirhandle be a filehandle-like iterator?
On Tue, May 01, 2007 at 10:00:00AM +0100, Smylers wrote:
: That'll make it easy for people porting PHP scripts to Perl 6 -- in
: particular for those wanting to port the security hole where a CGI
: parameter is used to form part of a filename opened by a script but a
: malicious user can supply a URL instead and cause the program to do
: things very different from what it intended.

PHP's security hole is that it treats tainting as NIH. Putting http: on the front of a filename is only one of several ways to attack open, and open is far from the only spot vulnerable to injection attacks.

Larry
Re: Should a dirhandle be a filehandle-like iterator?
On 5/1/07, Smylers <[EMAIL PROTECTED]> wrote:
> What are the situations in which a programmer really needs to open
> something but doesn't know whether that thing is a file, a directory, or
> a URL? I'm still unpersuaded this is sensible default behaviour.

Lots of times. It's an agnosticism, meaning that you can write a module which opens things and it doesn't have to know what it's opening; as a matter of fact, it doesn't even have to know what kinds of things it's *capable* of opening. That's powerful.

The point is, even though opening a file and opening a URL are reasonably different things, they are both the same logical operation, so they can be abstracted. Not abstracting them will either cause modules to have to take a "type of file" parameter whenever they take a file, or will lead to code like this:

    my $fh = do given $file {
        when // { openurl($file) }
        default { open($file) }
    }

And the programmer of this module may not have been aware that he could operate on directories, even though it's just some sort of line processing module.

Anyway, my point is that the concept of opening something is abstractable, and not abstracting it means that everyone has to abstract it separately.

And please don't argue from the standpoint of security holes. Security holes are possible in every language which talks to the outside world. Taint mode is a pretty good way to keep security holes out of a complex, rich language like Perl. But if you really want to be free of security holes, you either have to make a language which doesn't do anything, or make a language in which everyone can be an expert within an hour.

Luke
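The "agnostic open" described above can be sketched as a small dispatcher. This is only an illustration in Python, not Perl 6, and not anything proposed in the thread: the names `OPENERS`, `register`, `open_any`, and the `mem:` scheme are all hypothetical, and `mem:` exists only so the sketch needs no network access.

```python
import io
from urllib.parse import urlparse

# Hypothetical registry: URL scheme -> function that opens such a target.
OPENERS = {}

def register(scheme):
    """Decorator that records an opener for one scheme."""
    def wrap(fn):
        OPENERS[scheme] = fn
        return fn
    return wrap

@register("")        # no scheme at all: treat the string as a plain filename
@register("file")    # file:foo or file:///abs/path
def open_file(target):
    path = urlparse(target).path if target.startswith("file:") else target
    return open(path, "r")

@register("mem")     # made-up scheme: the rest of the string is the content
def open_mem(target):
    return io.StringIO(target[len("mem:"):])

def open_any(target):
    """Open a target without the caller knowing what kind of thing it is."""
    scheme = urlparse(target).scheme
    try:
        opener = OPENERS[scheme]
    except KeyError:
        raise ValueError(f"no opener registered for scheme {scheme!r}")
    return opener(target)
```

A line-processing module would call `open_any(name)` and iterate over the result. The security concern raised elsewhere in the thread shows up directly: whatever is registered becomes reachable from any string a caller passes in, so untrusted input still has to be checked before it gets here.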
Re: Should a dirhandle be a filehandle-like iterator?
John Macdonald writes:
> open(:file), open(:dir), open(:url), ... could be the non-dwimmy
> versions. If you don't specify an explicit non-dwimmy base variant,
> the dwim magic makes a (preferably appropriate) choice.

That'll make it easy for people porting PHP scripts to Perl 6 -- in particular for those wanting to port the security hole where a CGI parameter is used to form part of a filename opened by a script but a malicious user can supply a URL instead and cause the program to do things very different from what it intended.

What are the situations in which a programmer really needs to open something but doesn't know whether that thing is a file, a directory, or a URL? I'm still unpersuaded this is sensible default behaviour.

Smylers

[Apologies for the delay on this; I first tried to send it on April 15th, and only just spotted it failed to get through.]
Re: Should a dirhandle be a filehandle-like iterator?
HaloO,

Jonathan Lang wrote:
> But then, a file handle doesn't behave exactly like "standard in" or
> "standard out", either (last I checked, Perl 5 won't do anything useful
> if you say "seek STDIN, 0, SEEK_END").

How should Perl 6 behave? I guess it's possible to return a lazy list that captures STDIN from the time of call onwards.

Regards, TSa.
Re: Should a dirhandle be a filehandle-like iterator?
HaloO,

Uri Guttman wrote:
> [..] if dirs mapped well onto file handles they would have been mapped
> that way long ago in the OS. in 30+ years that hasn't happened afaik.

Hans Reiser is promoting just such a unification of files and directories in his Reiser4 filesystem. In particular, it supports opening a file as a directory to access the file's metadata. Also think of files that have internal structure, like tar files and libraries. With a matching IO plugin, one can access these files with the same API that is used for Unix-style files and directories. In that sense it is a good idea to provide a fatter interface that has explicit opendir and openfile methods in addition to a plain open that dwims.

Regards, TSa.
Re: Should a dirhandle be a filehandle-like iterator?
On Sun, Apr 15, 2007 at 01:16:32PM -0400, John Macdonald wrote:
: On Fri, Apr 13, 2007 at 08:14:42PM -0700, Geoffrey Broadwell wrote:
: > [...] -- so non-dwimmy open
: > variants are a good idea to keep around.
: >
: > This could be as simple as 'open(:!dwim)' I guess, or whatever the
: > negated boolean adverb syntax is these days
:
: open(:file), open(:dir), open(:url), ... could be the non-dwimmy
: versions. If you don't specify an explicit non-dwimmy base
: variant, the dwim magic makes a (preferably appropriate) choice.

I suspect that since we have a type system now we should probably use it for the non-dwimmy versions.

    my $io = IO::File.open($str)
    my $io = IO::Pipe.open($str)
    my $io = IO::Socket.open($str)
    my $io = IO::Dir.open($str)
    my $io = IO::URI.open($str)

etc. And of course different kinds of objects can have different defaults. I'd guess the default for directories is to take a snapshot and sort the entries, for instance. Certainly the open is the only place we have to distinguish the type, and $io.close will close any of them.

To me the interesting question is, when do we assume that a string is a filename or uri? I can argue that for historical and security reasons bare open() should always assume the provided string is a normal filename. However, given that the existence of feed operators removes most of my objections to Ingy's io() interface, we could make that default to uri processing:

    io('http://www.wall.org/~larry') ==> my @homepage;

Though we have a bit of a semantic problem insofar as

    @source ==> io('file:foo')

is going to want to supply more arguments to io() rather than send the feed to some method of the IO object, unless io() is some kind of a context-sensitive macro, or at least has a signature that doesn't allow slurpies. And currently feed ops are considered statement terminators, which makes it odd to think about overloading them.

More of a problem is that multiple dispatch based on argument type depends on eager evaluation of dynamic types, while feeds are basically lazy. We don't know how "hard" to call io() without recognizing it as special, or specifying the actual method:

    @source ==> io('file:foo').print

Maybe that's good enough, but it seems like we could do a little better. Hmm, type coercions tend to be unary, or at least not "listy", so maybe we can just recognize types as returning source and sink objects, which feeds automatically call with an appropriate variadic method (.lines, .print, .tap) depending on pointiness:

    IO('http://www.wall.org/~larry') ==> my @homepage;   # implicit .lines
    @source ==> IO('file:foo')                           # implicit .print
    @source ==> IO($debuglog) ==> @sink                  # implicit .tap

Larry
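The idea of letting the type pick the non-dwimmy open, while reading and closing stay uniform, can be sketched as follows. This is a Python analogy under stated assumptions, not the Perl 6 design: the classes `IOFile` and `IODir` and their method names are hypothetical stand-ins for `IO::File` and `IO::Dir`, and the "snapshot and sort" default for directories follows the guess made above.

```python
import os

class IOFile:
    """Hypothetical explicit opener: the target must be a regular file."""
    def __init__(self, handle):
        self._handle = handle

    @classmethod
    def open(cls, path):
        if os.path.isdir(path):
            raise IsADirectoryError(path)
        return cls(open(path, "r"))   # builtin open, not this classmethod

    def lines(self):
        # Lazily yield lines without their trailing newline.
        return (line.rstrip("\n") for line in self._handle)

    def close(self):
        self._handle.close()

class IODir:
    """Hypothetical explicit opener: the target must be a directory."""
    def __init__(self, entries):
        self._entries = entries

    @classmethod
    def open(cls, path):
        # Assumed default: take a snapshot of the entries and sort them.
        return cls(sorted(os.listdir(path)))

    def lines(self):
        return iter(self._entries)

    def close(self):
        self._entries = []
```

Only the `open` call names the kind of thing being opened; a caller holding either object uses `.lines()` and `.close()` without caring which it is. Passing a directory to `IOFile.open`, or a file to `IODir.open`, fails instead of guessing.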
Re: Should a dirhandle be a filehandle-like iterator?
On Fri, Apr 13, 2007 at 08:14:42PM -0700, Geoffrey Broadwell wrote:
> [...] -- so non-dwimmy open
> variants are a good idea to keep around.
>
> This could be as simple as 'open(:!dwim)' I guess, or whatever the
> negated boolean adverb syntax is these days

open(:file), open(:dir), open(:url), ... could be the non-dwimmy versions. If you don't specify an explicit non-dwimmy base variant, the dwim magic makes a (preferably appropriate) choice.
Re: Should a dirhandle be a filehandle-like iterator?
Jonathan Lang writes:
> Also: why distinguish between "open" and "opendir"? If the string is
> the name of a file, 'open' means "open the file"; if it is the name of
> a directory, 'open' means "open the directory".

Many programs open a file from a name specified by the user. Even if "openfile" existed, many programmers would surely continue to use "open" for this. Users being able to trick such programs into opening a directory rather than a file could be unpleasant.

Smylers
Re: Should a dirhandle be a filehandle-like iterator?
On Fri, Apr 13, 2007 at 07:43:23PM -0500, brian d foy wrote:
> As I was playing around with dirhandles, I thought "What if..." (which
> is actually sorta fun to do in Pugs, where Perl 5 has everything
> documented somewhere even if nobody has read it).
>
> My goal is modest: explain fewer things in the Llama. If dirhandles
> were like filehandles, there's a couple of pages of explanation I don't
> need to go through.
>
> Witness:
>
> I can iterate through the elements of a named array with =@a:
>
>    my @a = < 1 2 3 4 5 >;
>    for =@a { .say }   # but not =< 1 2 3 4 5 > :(
>
> and I can read lines from a file:
>
>    for =$fh { .say }
>
> Should I be able to go through a directory handle that way too? A "yes"
> answer would be very pleasing :)
>
>    my $dh = "doc".opendir;
>    for =$dh { .say }   # doesn't work in pugs
>
> And, since we're using objects now, .closedir can really just be
> .close, right?
>
> And, maybe this has been already done, but wrapping a lazy filter
> around anything that can return items. I'm not proposing this as a
> language feature, but if many things shared the same way of getting the
> next item, perhaps I could wrap it in a lazy map-ish thingy:
>
>    my $general_iterator = lazy_mappish_thingy( "doc".opendir );
>
>    for =$general_iterator { .say }
>
>    $general_iterator.close;  # or .end, or .whatever
>
> That last part is definitely not Llama material, but maybe I'll at
> least hit the haystack.

One of the things done for Perl 5.10 is to make dirhandles be a little bit more like filehandles. On OSes that allow it, things like

    stat DIRHANDLE
    -X DIRHANDLE
    chdir DIRHANDLE

all make sense and do what you'd think they'd do.

Steve Peters
[EMAIL PROTECTED]
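For readers who want to see what "stat and chdir on a directory handle" looks like outside Perl, here is a rough Python analogy using POSIX file descriptors. It is not the Perl 5.10 implementation; the function name `inspect_dir_handle` is hypothetical, and like the Perl feature it only works on systems that support `fstat` and `fchdir` on a descriptor opened on a directory.

```python
import os
import stat

def inspect_dir_handle(path):
    """Open a directory, then stat it and chdir to it via the handle alone."""
    previous = os.getcwd()
    fd = os.open(path, os.O_RDONLY)              # the "dirhandle"
    try:
        info = os.fstat(fd)                      # roughly: stat DIRHANDLE
        is_dir = stat.S_ISDIR(info.st_mode)      # roughly: -d DIRHANDLE
        os.fchdir(fd)                            # roughly: chdir DIRHANDLE
        landed_in = os.getcwd()
    finally:
        os.chdir(previous)                       # put the caller back
        os.close(fd)
    return is_dir, landed_in
```

The point of operating on the handle rather than the name is that the answer refers to the directory that was actually opened, even if the name has since been renamed or replaced.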
Re: Should a dirhandle be a filehandle-like iterator?
Why bother, actually, when it can just be a lazy list...

Opendir and closedir are very oldschool, and can be retained for whatever technical detail they are needed, but in most modern code I think that:

    for readdir($dir_name) { .say }

should work as well.

The act of opening a directory is something I never quite got... Even a directory with millions of entries is still peanuts in today's memory sizes, and if it does need to be iterated very carefully the old variants can still be around.

readdir() returning a list doesn't have to be inefficient but it's easier to screw up with it and make it bloat.

-- 
Yuval Kogman <[EMAIL PROTECTED]>
http://nothingmuch.woobling.org 0xEBD27418
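A lazy `readdir` of the kind suggested here is easy to sketch with a generator. The Python below is only an analogy for the proposed Perl 6 behaviour; the function name mirrors the thread, and `os.scandir` is the standard-library call doing the real work.

```python
import os

def readdir(dir_name):
    """Yield entry names one at a time; no explicit opendir/closedir."""
    # os.scandir returns an iterator that reads the directory incrementally;
    # the with-block closes the underlying handle once iteration finishes.
    with os.scandir(dir_name) as entries:
        for entry in entries:
            yield entry.name
```

Usage is just `for name in readdir("doc"): print(name)`. Entries come back in whatever order the filesystem reports them, and a caller who wants the whole list can still write `list(readdir("doc"))`, which is where the bloat mentioned above would come from.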
Re: Should a dirhandle be a filehandle-like iterator?
> "JL" == Jonathan Lang <[EMAIL PROTECTED]> writes: JL> Well, I did suggest that "openfile" and "opendir" exist alongside JL> "open", with "openfile" being more akin to Perl 5's "open" or JL> "sysopen", and "open" being a bit more dwimmy. JL> But in general, most of the differences that you mention are things JL> that ought to be addressed in the resulting iterators, not in the JL> creating statement. No, a "directory handle" will not behave exactly JL> like a "file handle". But then, a file handle doesn't behave exactly JL> like "standard in" or "standard out", either (last I checked, Perl 5 JL> won't do anything useful if you say "seek STDIN, 0, SEEK_END"). well, that seek failure is a result of the stream nature of stdin and not a failure of perl. remember that open and much of the i/o layers (regardless of perl I/O's rewrite) are just wrappers around the OS and libc calls. i don't see how to dwim them all together (but IO::All does that in a wacky dwim way). i have never felt the need for super smart iterators so i can change looping over lines to looping over a dir. maybe you might have a set of filenames in file vs a dir of names. but i just don't run into that need. sometimes mappings like that are just overkill IMO. enough from me on this. as with the rest of p6 i will work with whatever is decided by @larry. uri -- Uri Guttman -- [EMAIL PROTECTED] http://www.stemsystems.com --Perl Consulting, Stem Development, Systems Architecture, Design and Coding- Search or Offer Perl Jobs http://jobs.perl.org
Re: Should a dirhandle be a filehandle-like iterator?
Uri Guttman wrote: > "JL" == Jonathan Lang <[EMAIL PROTECTED]> writes: JL> Please. I've always found the "opendir ... readdir ... closedir" set JL> to be clunky. JL> Also: why distinguish between "open" and "opendir"? If the string is JL> the name of a file, 'open' means "open the file"; if it is the name of JL> a directory, 'open' means "open the directory". If it's the name of a JL> pipe, it opens the pipe. And so on. maybe this won't help you but if you did open on a dir in perl5 you can read the raw directory data which is pretty useless in most cases. so with open working as opendir on directories, what is the op/method to get the next directory entry? that isn't the same as reading a line. there won't be any trailing newlines to chomp. marking a location is not the same with tell and telldir (one is a byte offset, the other a directory entry index). and since dirs can reorder their entries (especially hash based dirs) the ordering and seek points may move. not gonna happen on text files. there are many differences and the only one you seem to see is a linear scan of them (which is just the most common access style). Well, I did suggest that "openfile" and "opendir" exist alongside "open", with "openfile" being more akin to Perl 5's "open" or "sysopen", and "open" being a bit more dwimmy. But in general, most of the differences that you mention are things that ought to be addressed in the resulting iterators, not in the creating statement. No, a "directory handle" will not behave exactly like a "file handle". But then, a file handle doesn't behave exactly like "standard in" or "standard out", either (last I checked, Perl 5 won't do anything useful if you say "seek STDIN, 0, SEEK_END"). -- Jonathan "Dataweaver" Lang
Re: Should a dirhandle be a filehandle-like iterator?
> "JL" == Jonathan Lang <[EMAIL PROTECTED]> writes: JL> Please. I've always found the "opendir ... readdir ... closedir" set JL> to be clunky. JL> Also: why distinguish between "open" and "opendir"? If the string is JL> the name of a file, 'open' means "open the file"; if it is the name of JL> a directory, 'open' means "open the directory". If it's the name of a JL> pipe, it opens the pipe. And so on. maybe this won't help you but if you did open on a dir in perl5 you can read the raw directory data which is pretty useless in most cases. so with open working as opendir on directories, what is the op/method to get the next directory entry? that isn't the same as reading a line. there won't be any trailing newlines to chomp. marking a location is not the same with tell and telldir (one is a byte offset, the other a directory entry index). and since dirs can reorder their entries (especially hash based dirs) the ordering and seek points may move. not gonna happen on text files. there are many differences and the only one you seem to see is a linear scan of them (which is just the most common access style). the operations you can do on the handles are very different as well. you can't write to a dir. dirs have no random access (you can lookup by a name with open but you can't go to the nth entry). and on OS with extra stuff like version numbers, then all bets are off. yes, you can tell the dir is such by doing a stat and then open can dwim but i don't see the overlap as you do. dirs generally are ordered lists of strings and have many different underlying formats based on their file systems. mapping that to a text file of lines doesn't work for me. this may all be obvious stuff but i think it deserves mentioning. if dirs mapped well onto file handles they would have been mapped that way long ago in the OS. in 30+ years that hasn't happened afaik. 
uri
Re: Should a dirhandle be a filehandle-like iterator?
Geoffrey Broadwell wrote: Jonathan Lang wrote: > Also: why distinguish between "open" and "opendir"? If the string is > the name of a file, 'open' means "open the file"; if it is the name of > a directory, 'open' means "open the directory". If it's the name of a > pipe, it opens the pipe. And so on. As long as you still have some way to reach the low-level opens -- though it's an odd thing to do (except perhaps in a disk integrity checker), there's no fundamental reason why you shouldn't be able to actually look at the bytes that happen to represent a directory structure on disk. It wouldn't be hard to allow .openfile, .opendir, and .openpipe as well as .open. -- Jonathan "Dataweaver" Lang
Re: Should a dirhandle be a filehandle-like iterator?
On Fri, 2007-04-13 at 19:00 -0700, Jonathan Lang wrote: > Please. I've always found the "opendir ... readdir ... closedir" set > to be clunky. > > Also: why distinguish between "open" and "opendir"? If the string is > the name of a file, 'open' means "open the file"; if it is the name of > a directory, 'open' means "open the directory". If it's the name of a > pipe, it opens the pipe. And so on. As long as you still have some way to reach the low-level opens -- though it's an odd thing to do (except perhaps in a disk integrity checker), there's no fundamental reason why you shouldn't be able to actually look at the bytes that happen to represent a directory structure on disk. Also, for security or correctness reasons you may want to make sure that you don't clobber things you don't mean to -- so non-dwimmy open variants are a good idea to keep around. This could be as simple as 'open(:!dwim)' I guess, or whatever the negated boolean adverb syntax is these days -'f
Re: Should a dirhandle be a filehandle-like iterator?
brian d foy wrote:
> As I was playing around with dirhandles, I thought "What if..." (which
> is actually sorta fun to do in Pugs, where Perl 5 has everything
> documented somewhere even if nobody has read it).
>
> My goal is modest: explain fewer things in the Llama. If dirhandles
> were like filehandles, there's a couple of pages of explanation I don't
> need to go through.
>
> Witness:
>
> I can iterate through the elements of a named array with =@a:
>
>    my @a = < 1 2 3 4 5 >;
>    for =@a { .say }   # but not =< 1 2 3 4 5 > :(
>
> and I can read lines from a file:
>
>    for =$fh { .say }
>
> Should I be able to go through a directory handle that way too? A "yes"
> answer would be very pleasing :)
>
>    my $dh = "doc".opendir;
>    for =$dh { .say }   # doesn't work in pugs
>
> And, since we're using objects now, .closedir can really just be
> .close, right?

Please. I've always found the "opendir ... readdir ... closedir" set to be clunky.

Also: why distinguish between "open" and "opendir"? If the string is the name of a file, 'open' means "open the file"; if it is the name of a directory, 'open' means "open the directory". If it's the name of a pipe, it opens the pipe. And so on.

Note that the above could be further shorthanded, as long as you don't need the directory handle after the loop:

    for ="doc".open { .say }

-- 
Jonathan "Dataweaver" Lang
Should a dirhandle be a filehandle-like iterator?
As I was playing around with dirhandles, I thought "What if..." (which is actually sorta fun to do in Pugs, where Perl 5 has everything documented somewhere even if nobody has read it).

My goal is modest: explain fewer things in the Llama. If dirhandles were like filehandles, there's a couple of pages of explanation I don't need to go through.

Witness:

I can iterate through the elements of a named array with =@a:

    my @a = < 1 2 3 4 5 >;
    for =@a { .say }   # but not =< 1 2 3 4 5 > :(

and I can read lines from a file:

    for =$fh { .say }

Should I be able to go through a directory handle that way too? A "yes" answer would be very pleasing :)

    my $dh = "doc".opendir;
    for =$dh { .say }   # doesn't work in pugs

And, since we're using objects now, .closedir can really just be .close, right?

And, maybe this has been already done, but wrapping a lazy filter around anything that can return items. I'm not proposing this as a language feature, but if many things shared the same way of getting the next item, perhaps I could wrap it in a lazy map-ish thingy:

    my $general_iterator = lazy_mappish_thingy( "doc".opendir );

    for =$general_iterator { .say }

    $general_iterator.close;  # or .end, or .whatever

That last part is definitely not Llama material, but maybe I'll at least hit the haystack.
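The "lazy map-ish thingy" is straightforward once every source shares one way of getting the next item. The sketch below is a Python analogy rather than Perl 6, and the class name `LazyMap` and its `transform` parameter are hypothetical; it assumes only that the wrapped source is iterable and may or may not have a `close` method.

```python
class LazyMap:
    """Wrap any iterable source; transform items only as they are requested."""

    def __init__(self, source, transform=lambda item: item):
        self._source = source
        self._items = iter(source)
        self._transform = transform

    def __iter__(self):
        return self

    def __next__(self):
        # Pull exactly one item from the source, then transform it.
        return self._transform(next(self._items))

    def close(self):
        # Forward close() if the source has one (file, directory scanner, ...);
        # plain lists and similar sources are left alone.
        closer = getattr(self._source, "close", None)
        if callable(closer):
            closer()
```

The same wrapper then works over an open file, an `os.scandir` directory iterator, or an in-memory list, which is the uniformity the question is asking for: one loop shape and one `.close`, whatever the source.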