Re: Change ($one, $two)= behavior for optimization? (was Re: RFC 175 (v1) Add Clist keyword to force list context (like Cscalar))
At 11:48 AM 9/3/00 +1100, Damian Conway wrote:

Ever consider then having

    ($a, $b, $c) = <FH>;

or

    @a[4,1,5] = <FH>;

only read three lines?

I think this is a superb idea, and look forward to someone's RFC'ing it.

I like it too. Anyone working on the RFC?

I wonder how the p526 translator will handle this. Suppose someone deliberately did

    ($line) = <FH>;    # Save next line, discard rest

Maybe something like

    { ($line, my @plugh) = <FH>; }

--
Peter Scott
Pacific Systems Design Technologies
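For readers following along, here is a minimal sketch of the Perl 5 behaviour being discussed. It assumes an in-memory filehandle standing in for FH; the variable names are illustrative only and do not come from the thread.

```perl
use strict;
use warnings;

my $text = "one\ntwo\nthree\nfour\n";

# List assignment: readline runs in list context and drains the handle,
# even though only the first line is kept.
open(my $fh, '<', \$text) or die "open: $!";
my ($first) = <$fh>;
my $drained = eof($fh) ? 1 : 0;
close($fh);

# Scalar assignment: exactly one line is read; the rest stay unread.
open($fh, '<', \$text) or die "open: $!";
my $line = <$fh>;
my @rest = <$fh>;
close($fh);

print "drained=$drained, lines left after scalar read=", scalar(@rest), "\n";
```

The proposal in this thread would make the first form stop after one line, which is why the translator question above matters for code that relies on the draining.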
Re: Change ($one, $two)= behavior for optimization? (was Re: RFC 175 (v1) Add Clist keyword to force list context (like Cscalar))
At 10:52 AM 9/4/00 -0600, Nathan Torkington wrote:

Peter Scott writes:

    ($a, $b, $c) = <FH>;

or

    @a[4,1,5] = <FH>;

only read three lines?

I think this is a superb idea, and look forward to someone's RFC'ing it.

Should be part of the want() context.

It is. I interpreted Damian's remark to mean that it would be good if readline() took advantage of it, and that should be RFC'ed.

Permit operations to discover (as does split) how many elements they're being assigned to.

--
Peter Scott
Pacific Systems Design Technologies
Re: Change ($one, $two)= behavior for optimization? (was Re: RFC 175 (v1) Add Clist keyword to force list context (like Cscalar))
Tom Christiansen wrote:

Ever consider then having

    ($a, $b, $c) = <FH>;

or

    @a[4,1,5] = <FH>;

only read three lines? I mean, how many if any builtins would it make sense to make aware of this, and do something "different"?

Personally, I think this would be really cool; stuff like this is what I was trying to poke at. Lots more power and flexibility. I could name lots of builtins this potentially makes sense for:

    ($one, $two) = grep /pat/, @data;
    ($k1, $k2)   = keys %hash;              # leave index at $k3?
    @a[6,5,4]    = map { split ' ' } @line;
    ($last)      = reverse @array;

And then there's splice, sort, and any and every user-defined sub too.

The only problem is when grep and map are used to change values on the fly...this will have to be addressed. But actually, the behavior could potentially be quite cool - maybe only the number requested back are changed. Hmmm.

Seems a bit rare and unimportant -- until one observes how this would also solve the problem of people being confused by this gobbling up their handle:

    my($line) = <FH>;

And a nice side effect too. As Peter says, the only problem is people that are relying on this to get the actual last line...but I suspect that's far fewer people than the ones who would just want the first line and used ()'s on my out of habit.

The more I think about it, this actually might make things more consistent too. For example, currently these two aren't the same:

    $count = @a = ($a, $b) = grep /pat/, @data;
    $count = ($a, $b) = @a = grep /pat/, @data;

But these two are:

    $count = ($a, $b) = grep /pat/, @data;
    $count = @a = grep /pat/, @data;

Which, actually, is a little weird when you really think about it. I wouldn't feel bad about "breaking" (fixing?) this, though, since this:

    $count = grep /pat/, @data;

Is probably how you should be getting the count anyways.

Well, as Damian suggests this should probably be RFC'ed. Since I brought the whole mess up I'll do it, but I'd really appreciate it if people could send me any input they want to add. It'll likely take me 1-2 weeks, though, since I have 4 other new ones to write and really have to update my existing ones. I'll post it to -io when it's finished.

-Nate
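The asymmetry between the two chained assignments can be checked in Perl 5 with a short sketch. The sample data and variable names are assumptions made for illustration.

```perl
use strict;
use warnings;

my @data = qw(apple banana cherry avocado grape);   # four elements contain "a"
my ($x, $y, @all);

# ($x, $y) = grep ... runs first; @all then receives only ($x, $y),
# so the count is the number of variables on that inner left-hand side.
my $count_inner_first = @all = ($x, $y) = grep /a/, @data;

# @all = grep ... runs first; the outer list assignment in scalar
# context then reports how many elements @all supplied.
my $count_array_first = ($x, $y) = @all = grep /a/, @data;

print "$count_inner_first $count_array_first\n";
```

With this data the first count is 2 and the second is 4, which is the inconsistency the message points at.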
Re: Change ($one, $two)= behavior for optimization? (was Re: RFC 175 (v1) Add Clist keyword to force list context (like Cscalar))
Nathan Wiger wrote:

Tom Christiansen wrote:

Ever consider then having

    ($a, $b, $c) = <FH>;

or

    @a[4,1,5] = <FH>;

only read three lines? I mean, how many if any builtins would it make sense to make aware of this, and do something "different"?

Personally, I think this would be really cool; stuff like this is what I was trying to poke at. Lots more power and flexibility. I could name lots of builtins this potentially makes sense for:

    ($one, $two) = grep /pat/, @data;
    ($k1, $k2)   = keys %hash;              # leave index at $k3?
    @a[6,5,4]    = map { split ' ' } @line;
    ($last)      = reverse @array;

And then there's splice, sort, and any and every user-defined sub too.

The only problem is when grep and map are used to change values on the fly...this will have to be addressed. But actually, the behavior could potentially be quite cool - maybe only the number requested back are changed. Hmmm.

The problem with making these builtins respect the number of return values context in want() is that, as Nate mentions, the expressions may have side-effects that are desired for the whole list. An alternative approach is to make these builtins respect lazy(), as defined by RFC 123:

<quote>
What if adding laziness to a list context was up to the programmer and passed through functions that can support it:

    for (lazy(grep {$h{$_}->STATE eq 'NY'} keys %h)){
        $h{$_}->send_advertisement();
    };

would cause a lazy list is passed to C<for>, and increment of the object's "letters_sent_total" field might break the iteration.

    for (grep {$h{$_}->STATE eq 'NY'} lazy(keys %h)){
        $h{$_}->send_advertisement();
    };

causes a lazy list to be passed to our filter function C<grep>, saving us from allocating the entire C<keys> array. C<Grep> is still in the default busy context, so it returns a busy array, which C<for> can iterate over safely.
</quote>

By returning a lazy list, elements that are never used are never calculated.

That way the programmer could decide whether or not they want the Perl 5 list-gobbling behaviour, or lazy behaviour, as they require.
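Perl 5 has no lazy() builtin, so the idea can only be approximated. The following is a rough sketch using a closure as an iterator; lazy_grep is a hypothetical helper invented for this illustration and is not the interface RFC 123 proposes.

```perl
use strict;
use warnings;

# Hypothetical helper: returns an iterator that tests elements only on demand.
sub lazy_grep {
    my ($pred, @items) = @_;
    my $i = 0;
    return sub {
        while ($i < @items) {
            my $item = $items[$i++];
            return $item if $pred->($item);
        }
        return;    # exhausted: undef when called in scalar context
    };
}

my $calls = 0;     # counts how often the predicate actually runs
my @data  = (1 .. 20_000);
my $next  = lazy_grep(sub { $calls++; $_[0] % 7 == 0 }, @data);

my $one = $next->();
my $two = $next->();

print "one=$one two=$two predicate calls=$calls\n";
```

Only the elements up to the second match are ever tested, which is the saving the lazy approach is after. Any side effect in the predicate is likewise skipped for the remaining elements.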
Re: Change ($one, $two)= behavior for optimization? (was Re: RFC 175 (v1) Add Clist keyword to force list context (like Cscalar))
Jeremy Howard wrote:

The problem with making these builtins respect the number of return values context in want() is that, as Nate mentions, the expressions may have side-effects that are desired for the whole list. An alternative approach is to make these builtins respect lazy(), as defined by RFC 123:

This is a worthwhile alternative, I like it. The only problem is that this doesn't address problems like this:

    my($line) = <$FILE>;

Which will still gobble the whole handle unless you say "lazy()";

Since both lazy() and my proposal are going to talk about lazy behavior, I would say perhaps the best approach is a merging of the two:

    1. Assume lazy() where it "can't hurt", as in the above example.
    2. Don't assume lazy() where it can (like in grep/map)

My proposal was going to say that lazy behavior was optional anyways, so lazy() is really just a different optional way of going about it.

That way the programmer could decide whether or not they want the Perl 5 list-gobbling behaviour, or lazy behaviour, as they require.

I like this, but I also like the ability for the function to DWIM ala split, without me having to explicitly tell it to. It could make scripts faster without any extra coding.

Input?

-Nate
Re: Change ($one, $two)= behavior for optimization? (was Re: RFC 175 (v1) Add Clist keyword to force list context (like Cscalar))
Should be part of the want() context.

It is. I interpreted Damian's remark to mean that it would be good if readline() took advantage of it, and that should be RFC'ed.

That's indeed precisely what I meant. In fact, all list-returning built-ins ought to be optimized this way.

Damian
Re: Change ($one, $two)= behavior for optimization? (was Re: RFC 175 (v1) Add Clist keyword to force list context (like Cscalar))
Tom Hughes wrote:

For example, in Perl you have for a long time been able to do this:

    ($one, $two) = grep /$pat/, @data;

However, what currently happens is grep goes to completion, then discards possibly huge amounts of data just to return the first two matches. For example, if @data was 20,000 elements long, you could potentially save a good chunk of time if you only had to return the first and/or second match, rather than finding 1000 only to throw 998 away.

This could fall out of using iterators in the core but without grep itself having to know anything about the left hand side.

...

The only problem with this scheme (and indeed I suspect with yours) is if the match expression has a side effect. This is even more of a problem when trying to apply the same optimisation to map because of the widespread use of map in a void context to apply a side effect to the elements.

RFC 123 'Builtin: lazy' describes a syntax for explicitly stating that your operation does not have a side effect, and requests that a 'lazy list'/iterator be used. It mentions grep as an example:

<quote>
What if adding laziness to a list context was up to the programmer and passed through functions that can support it:

    for (lazy(grep {$h{$_}->STATE eq 'NY'} keys %h)){
        $h{$_}->send_advertisement();
    };

would cause a lazy list is passed to C<for>, and increment of the object's "letters_sent_total" field might break the iteration.

    for (grep {$h{$_}->STATE eq 'NY'} lazy(keys %h)){
        $h{$_}->send_advertisement();
    };

causes a lazy list to be passed to our filter function C<grep>, saving us from allocating the entire C<keys> array. C<Grep> is still in the default busy context, so it returns a busy array, which C<for> can iterate over safely.
</quote>
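The side-effect concern can be made concrete in Perl 5 as it stands. In this sketch the counters play the role of the side effect; the data, the pattern, and the variable names are assumptions chosen for illustration.

```perl
use strict;
use warnings;

my @data = map { "item$_" } 1 .. 1000;
my $pat  = qr/5/;

# grep runs its block for every element, so the side effect
# (the counter) fires 1000 times even though two results are kept.
my $grep_tests = 0;
my ($g1, $g2) = grep { $grep_tests++; /$pat/ } @data;

# An explicit loop that stops at the second match tests far fewer
# elements, so the same side effect fires far fewer times.
my $loop_tests = 0;
my @found;
for my $item (@data) {
    $loop_tests++;
    next unless $item =~ $pat;
    push @found, $item;
    last if @found == 2;
}

print "grep tested $grep_tests, loop tested $loop_tests\n";
```

Both forms yield the same two matches, but code that depended on the block running for all 1000 elements would change behaviour if grep were made to stop early on its own.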
Re: Change ($one, $two)= behavior for optimization? (was Re: RFC 175 (v1) Add Clist keyword to force list context (like Cscalar))
Here is my suggestion: What if other functions were able to backtrace context and determine how many arguments to return just like split can?

I have an RFC on that:

RFC 21: Replace C<wantarray> with a generic C<want> function

C<want> takes a list of strings that describe aspects of the context in which the current subroutine has been called. It returns a list indicating whether or not the current subroutine's call context matches all the aspects specified in the list

...

at least one integer element is returned. That integer (the "expectation count") indicates the number of return values expected by the context.

Damian
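For contrast, this is roughly what Perl 5's existing wantarray can report. The sketch below uses only wantarray; the want function of RFC 21 is a proposal and is not used here. The sub name and variables are made up for the example.

```perl
use strict;
use warnings;

my @seen;

# Records the calling context as Perl 5 exposes it: list, scalar, or void.
sub report_context {
    push @seen, wantarray ? 'list' : defined(wantarray) ? 'scalar' : 'void';
    return;
}

my @list      = report_context();
my $scalar    = report_context();
report_context();
my ($p, $q)   = report_context();   # still only 'list': no element count is available

print join(', ', @seen), "\n";
```

The last call shows the gap RFC 21 aims to fill: a sub assigned to two variables cannot tell that only two values are wanted.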
Re: Change ($one, $two)= behavior for optimization? (was Re: RFC 175 (v1) Add Clist keyword to force list context (like Cscalar))
Here is my suggestion: What if other functions were able to backtrace context and determine how many arguments to return just like split can?

I have an RFC on that:

RFC 21: Replace C<wantarray> with a generic C<want> function

C<want> takes a list of strings that describe aspects of the context in which the current subroutine has been called. It returns a list indicating whether or not the current subroutine's call context matches all the aspects specified in the list

...

at least one integer element is returned. That integer (the "expectation count") indicates the number of return values expected by the context.

Ever consider then having

    ($a, $b, $c) = <FH>;

or

    @a[4,1,5] = <FH>;

only read three lines? I mean, how many if any builtins would it make sense to make aware of this, and do something "different"?

Seems a bit rare and unimportant -- until one observes how this would also solve the problem of people being confused by this gobbling up their handle:

    my($line) = <FH>;

--tom
Change ($one, $two)= behavior for optimization? (was Re: RFC 175 (v1) Add Clist keyword to force list context (like Cscalar))
Tom Christiansen wrote:

% man perlfunc
...
When assigning to a list, if LIMIT is omitted, Perl supplies a LIMIT one larger than the number of variables in the list, to avoid unnecessary work.

As usual I picked a bad example. And I did read the perlfunc manpage, but somehow both (a) forgot about split's 3rd argument and (b) missed it on the rereading. This was example number 100+ on my list, so I was feeling a little woozy.

RTFM. This is documented behavior. I don't understand the hubbub.

Yeah, oops. Sorry for wasting bandwidth. sound of self-flogging

Let me shift gears and instead ask whether anyone thinks this:

    $y = ($first, $second) = grep /$pat/, @data;

Returning "5" has any value? If you're going to do this, it seems like you'd want the number that were really returned (since scalar grep will give you the total number found anyways).

If so, then generalizing split's behavior to return smaller lists when they're requested might make things faster. In particular, grep could potentially stop much sooner if you only wanted the first two matches, and @data was 20,000 elements. If this was extended to user subs, Perl could actually DWIM speed improvements if the sub was building a huge list only to want the first few elements back.

The only potential problem I see is that "=()=" would always return 0 now, since it has no elements it's asking for. Hmm. Maybe "=()=" could be special-cased to mean an infinitely hungry list, which is pretty much what it means right now.

Anyways, just an idea.

-Nate

P.S. Consider list() a dead horse for now. No additional flogging required.
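The perlfunc passage quoted above can be illustrated with a small sketch. The sample record is an assumption; the explicit third argument shows what the implicit LIMIT amounts to when two variables are assigned.

```perl
use strict;
use warnings;

my $record = "a:b:c:d:e";

# Two variables on the left: per perlfunc, Perl supplies a LIMIT of 3,
# so split does not need to break up the rest of the string.
my ($first, $second) = split /:/, $record;

# The same LIMIT written out explicitly, keeping the unsplit remainder.
my @explicit = split /:/, $record, 3;

print "$first $second | @explicit\n";
```

The remainder "c:d:e" is left in one piece in the explicit form, which is the work the implicit LIMIT avoids.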
Re: Change ($one, $two)= behavior for optimization? (was Re: RFC 175 (v1) Add Clist keyword to force list context (like Cscalar))
Let me shift gears and instead ask whether anyone thinks this:

    $y = ($first, $second) = grep /$pat/, @data;

Returning "5" has any value? If you're going to do this, it seems like you'd want the number that were really returned (since scalar grep will give you the total number found anyways).

Of course: the LHS is a known quantity; only the RHS is a mystery. That's why it does this. I'm sure this in perldata.

--tom
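A short sketch of the behaviour Tom describes, as Perl 5 implements it: a list assignment in scalar context yields the number of elements on the right-hand side. The data is invented so that five elements match, echoing the "5" in the question.

```perl
use strict;
use warnings;

my @data = qw(alpha beta gamma delta kappa omicron);   # five contain "a"

# List assignment in scalar context: the result is the number of
# elements grep produced, not the number of variables assigned.
my $y = (my ($first, $second) = grep /a/, @data);

# The "=()=" idiom relies on the same rule with an empty left-hand side.
my $countof = () = grep /a/, @data;

# grep in scalar context gives the same total directly.
my $scalar_grep = grep /a/, @data;

print "$y $countof $scalar_grep\n";
```

All three values are 5 here, which is why a change that made grep return only as many elements as the left side asks for would also change what "=()=" reports.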