Re: Change ($one, $two)= behavior for optimization? (was Re: RFC 175 (v1) Add C<list> keyword to force list context (like C<scalar>))

2000-09-04 Thread Peter Scott

At 11:48 AM 9/3/00 +1100, Damian Conway wrote:
 Ever consider then having

 ($a, $b, $c) = <FH>;
 or
 @a[4,1,5] = <FH>;

 only read three lines?

I think this is a superb idea, and look forward to someone's RFC'ing it.

I like it too.  Anyone working on the RFC?

I wonder how the p526 translator will handle this.  Suppose someone 
deliberately did

   ($line) = <FH>;  # Save next line, discard rest

Maybe something like

   { ($line, my @plugh) = <FH>; }
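For reference, a small sketch of the Perl 5 behaviour such a translator would have to preserve. It uses lexical in-memory filehandles instead of the bareword FH; the handle names and the sample text are illustrative assumptions, not something from the thread.

```perl
use strict;
use warnings;

my $text = "first\nsecond\nthird\n";

# Perl 5 today: the list assignment puts readline in list context,
# so every remaining line is read even though only one is kept.
open(my $fh, '<', \$text) or die "cannot open in-memory handle: $!";
my ($line) = <$fh>;
my $drained = eof($fh) ? 1 : 0;

# The rewrite suggested above: keep one line, soak up the rest explicitly.
open(my $fh2, '<', \$text) or die "cannot open in-memory handle: $!";
my ($kept, $discarded);
{ ($kept, my @plugh) = <$fh2>; $discarded = scalar @plugh; }
```

Both forms leave the handle at end-of-file; the second just makes the discarding visible.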


--
Peter Scott
Pacific Systems Design Technologies




Re: Change ($one, $two)= behavior for optimization? (was Re: RFC 175 (v1) Add C<list> keyword to force list context (like C<scalar>))

2000-09-04 Thread Peter Scott

At 10:52 AM 9/4/00 -0600, Nathan Torkington wrote:
Peter Scott writes:
   ($a, $b, $c) = <FH>;
   or
   @a[4,1,5] = <FH>;
   only read three lines?
  
  I think this is a superb idea, and look forward to someone's RFC'ing it.

Should be part of the want() context.

It is.  I interpreted Damian's remark to mean that it would be good if 
readline() took advantage of it, and that should be RFC'ed.

  Permit operations to discover
(as does split) how many elements they're being assigned to.

--
Peter Scott
Pacific Systems Design Technologies




Re: Change ($one, $two)= behavior for optimization? (was Re: RFC 175 (v1) Add C<list> keyword to force list context (like C<scalar>))

2000-09-04 Thread Nathan Wiger

Tom Christiansen wrote:
 
 Ever consider then having
 
 ($a, $b, $c) = <FH>;
 or
 @a[4,1,5] = <FH>;
 
 only read three lines?  I mean, how many if any builtins would it
 make sense to make aware of this, and do something "different"?

Personally, I think this would be really cool; stuff like this is what I
was trying to poke at. Lots more power and flexibility. I could name
lots of builtins this potentially makes sense for:

   ($one, $two) = grep /pat/, @data;
   ($k1, $k2)   = keys %hash;  # leave index at $k3?
   @a[6,5,4]= map { split ' ' } @line;
   ($last)  = reverse @array;

And then there's splice, sort, and any and every user-defined sub too.
The only problem is when grep and map are used to change values on the
fly...this will have to be addressed. But actually, the behavior could
potentially be quite cool - maybe only the number requested back are
changed. Hmmm.

 Seems a bit rare and unimportant -- until one observes how this
 would also solve the problem of people being confused by this
 gobbling up their handle:
 
 my($line) = <FH>;

And a nice side effect too. As Peter says, the only problem is people
that are relying on this to get the actual last line...but I suspect
that's far fewer people than the ones who just want the first line and
used ()'s on my out of habit.

The more I think about it, this actually might make things more
consistent too. For example, currently these two aren't the same: 

   $count = @a = ($a, $b) = grep /pat/, @data;
   $count = ($a, $b) = @a = grep /pat/, @data;

But these two are:

   $count = ($a, $b) = grep /pat/, @data;
   $count = @a = grep /pat/, @data;

Which, actually, is a little weird when you really think about it. I
wouldn't feel bad about "breaking" (fixing?) this, though, since this:

   $count = grep /pat/, @data;
   $count = grep /pat/, @data;

Is probably how you should be getting the count anyways.
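The counts being compared above can be checked in current Perl 5 with a short sketch. The sample data and the /^a/ pattern are made up for illustration; the extra parentheses only make the grouping explicit.

```perl
use strict;
use warnings;

my @data = qw(apple avocado banana apricot cherry almond);   # 4 match /^a/
my ($first, $second, @all);

# A list assignment in scalar context yields the number of RHS elements.
my $count_pair  = (($first, $second) = grep /^a/, @data);
my $count_array = (@all = grep /^a/, @data);

# Chained forms: the inner assignment returns its lvalues, so order matters.
my $count_chain1 = (@all = (($first, $second) = grep /^a/, @data));
my $count_chain2 = (($first, $second) = (@all = grep /^a/, @data));

# Plain scalar-context grep counts the matches directly.
my $count_plain = grep /^a/, @data;
```

Here the first chained form gives 2 (only $first and $second reach @all), while every other form gives 4.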

Well, as Damian suggests this should probably be RFC'ed. Since I brought
the whole mess up I'll do it, but I'd really appreciate if people could
send me any input they want to add. It'll likely take me 1-2 weeks,
though, since I have 4 other new ones to write and really have to update
my existing ones. I'll post it to -io when it's finished.

-Nate



Re: Change ($one, $two)= behavior for optimization? (was Re: RFC 175 (v1) Add C<list> keyword to force list context (like C<scalar>))

2000-09-04 Thread Jeremy Howard

Nathan Wiger wrote:
 Tom Christiansen wrote:
 
  Ever consider then having
 
  ($a, $b, $c) = <FH>;
  or
  @a[4,1,5] = <FH>;
 
  only read three lines?  I mean, how many if any builtins would it
  make sense to make aware of this, and do something "different"?

 Personally, I think this would be really cool; stuff like this is what I
 was trying to poke at. Lots more power and flexibility. I could name
 lots of builtins this potentially makes sense for:

($one, $two) = grep /pat/, @data;
($k1, $k2)   = keys %hash;  # leave index at $k3?
@a[6,5,4]= map { split ' ' } @line;
($last)  = reverse @array;

 And then there's splice, sort, and any and every user-defined sub too.
 The only problem is when grep and map are used to change values on the
 fly...this will have to be addressed. But actually, the behavior could
 potentially be quite cool - maybe only the number requested back are
 changed. Hmmm.

The problem with making these builtins respect the number of return values
context in want() is that, as Nate mentions, the expressions may have
side-effects that are desired for the whole list.

An alternative approach is to make these builtins respect lazy(), as defined
by RFC 123:

<quote>
What if adding laziness to a list context was up to the programmer
and passed through functions that can support it:

for (lazy(grep {$h{$_}->STATE eq 'NY'} keys %h)){
    $h{$_}->send_advertisement();
};


would cause a lazy list is passed to C<for>, and increment of
the object's "letters_sent_total" field might break the iteration.


for (grep {$h{$_}->STATE eq 'NY'} lazy(keys %h)){
    $h{$_}->send_advertisement();
};


causes a lazy list to be passed to our filter function C<grep>, saving
us from allocating the entire C<keys> array.  C<Grep> is still in
the default busy context, so it returns a busy array, which C<for>
can iterate over safely.
</quote>

By returning a lazy list, elements that are never used are never calculated.

That way the programmer could decide whether or not they want the Perl 5
list-gobbling behaviour, or lazy behaviour, as they require.
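RFC 123's lazy() does not exist in Perl 5, but the idea of never calculating unused elements can be approximated with a closure-based iterator. The sketch below is only an illustration: lazy_grep is a hypothetical helper, not the RFC's syntax, and the source list is still built eagerly.

```perl
use strict;
use warnings;

# Hypothetical helper: returns an iterator that yields one match per call
# and tests no more elements than it has to.
sub lazy_grep {
    my ($pred, @items) = @_;
    my $i = 0;
    return sub {
        while ($i < @items) {
            my $item = $items[$i++];
            return $item if $pred->($item);
        }
        return;    # exhausted (ambiguous if undef is a legal element)
    };
}

my $tested = 0;
my $next   = lazy_grep(sub { $tested++; $_[0] % 3 == 0 }, 1 .. 20_000);

my $one = $next->();    # predicate runs on 1, 2, 3
my $two = $next->();    # predicate runs on 4, 5, 6
```

After the two calls only six elements have been tested, so any side effect in the predicate has happened six times rather than 20,000, which is exactly the trade-off discussed above.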





Re: Change ($one, $two)= behavior for optimization? (was Re: RFC 175 (v1) Add C<list> keyword to force list context (like C<scalar>))

2000-09-04 Thread Nathan Wiger

Jeremy Howard wrote:
 
 The problem with making these builtins respect the number of return values
 context in want() is that, as Nate mentions, the expressions may have
 side-effects that are desired for the whole list.
 
 An alternative approach is to make these builtins respect lazy(), as defined
 by RFC 123:

This is a worthwhile alternative, I like it. The only problem is that
this doesn't address problems like this:

   my($line) = <$FILE>;

Which will still gobble the whole handle unless you say "lazy()";

Since both lazy() and my proposal are going to talk about lazy behavior,
I would say perhaps the best approach is a merging of the two:

   1. Assume lazy() where it "can't hurt", as in the above
  example.

   2. Don't assume lazy() where it can (like in grep/map)

My proposal was going to say that lazy behavior was optional anyways, so
lazy() is really just a different optional way of going about it.

 That way the programmer could decide whether or not they want the Perl 5
 list-gobbling behaviour, or lazy behaviour, as they require.

I like this, but I also like the ability for the function to DWIM ala
split, without me having to explicitly tell it to. It could make scripts
faster without any extra coding.

Input?

-Nate



Re: Change ($one, $two)= behavior for optimization? (was Re: RFC 175 (v1) Add C<list> keyword to force list context (like C<scalar>))

2000-09-04 Thread Damian Conway

Should be part of the want() context.

It is.  I interpreted Damian's remark to mean that it would be good if 
readline() took advantage of it, and that should be RFC'ed.

That's indeed precisely what I meant. In fact, all list-returning built-ins
ought to be optimized this way.

Damian



Re: Change ($one, $two)= behavior for optimization? (was Re: RFC 175 (v1) Add C<list> keyword to force list context (like C<scalar>))

2000-09-02 Thread Jeremy Howard

Tom Hughes wrote:
  For example, in Perl you have for a long time been able to do this:
 
 ($one, $two) = grep /$pat/, @data;
 
  However, what currently happens is grep goes to completion, then
  discards possibly huge amounts of data just to return the first two
  matches. For example, if @data was 20,000 elements long, you could
  potentially save a good chunk of time if you only had to return the
  first and/or second match, rather than finding 1000 only to throw 998
  away.

 This could fall out of using iterators in the core but without
 grep itself having to know anything about the left hand side.

...

 The only problem with this scheme (and indeed I suspect with
 yours) is if the match expression has a side effect. This is
 even more of a problem when trying to apply the same optimisation
 to map because of the widespread use of map in a void context
 to apply a side effect to the elements.
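To make both the cost and the side-effect concern concrete, here is a rough sketch in current Perl 5. The data, the counters and the match condition are invented for illustration.

```perl
use strict;
use warnings;

my @data = (1 .. 20_000);

# Perl 5 grep: the block runs for every element, whatever the left side wants.
my $grep_calls = 0;
my ($one, $two) = grep { $grep_calls++; $_ % 1000 == 0 } @data;

# Hand-written early exit: stop as soon as two matches are in hand.
my $loop_calls = 0;
my @found;
for my $item (@data) {
    $loop_calls++;
    push @found, $item if $item % 1000 == 0;
    last if @found == 2;
}
```

The grep block runs 20,000 times and the loop body 2,000 times. Code that relies on the counter (or any other side effect) reaching the full count would change behaviour if grep stopped early.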

RFC 123 'Builtin: lazy' describes a syntax for explicitly stating that your
operation does not have a side effect, and requests that a 'lazy
list'/iterator be used. It mentions grep as an example:

<quote>
What if adding laziness to a list context was up to the programmer
and passed through functions that can support it:

for (lazy(grep {$h{$_}->STATE eq 'NY'} keys %h)){
    $h{$_}->send_advertisement();
};


would cause a lazy list is passed to C<for>, and increment of
the object's "letters_sent_total" field might break the iteration.


for (grep {$h{$_}->STATE eq 'NY'} lazy(keys %h)){
    $h{$_}->send_advertisement();
};


causes a lazy list to be passed to our filter function C<grep>, saving
us from allocating the entire C<keys> array.  C<Grep> is still in
the default busy context, so it returns a busy array, which C<for>
can iterate over safely.
</quote>





Re: Change ($one, $two)= behavior for optimization? (was Re: RFC 175 (v1) Add C<list> keyword to force list context (like C<scalar>))

2000-09-02 Thread Damian Conway

Here is my suggestion: What if other functions were able to backtrace
context and determine how many arguments to return just like split can?

I have an RFC on that:

RFC 21: Replace C<wantarray> with a generic C<want> function

C<want> takes a list of strings that describe aspects of the
context in which the current subroutine has been called. It
returns a list indicating whether or not the current
subroutine's call context matches all the aspects specified in
the list ... at least one integer element is returned.

That integer (the "expectation count") indicates the number of
return values expected by the context. 
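For comparison, this is all that Perl 5's existing wantarray can report. The sketch uses a made-up sub name, context_name, and shows that a sub can tell void, scalar and list context apart but cannot see how many values the caller is assigning to, which is the gap the proposed expectation count would fill.

```perl
use strict;
use warnings;

# wantarray is true in list context, false but defined in scalar
# context, and undef in void context.
sub context_name {
    return wantarray          ? 'list'
         : defined(wantarray) ? 'scalar'
         :                      'void';
}

my $in_scalar   = context_name();
my ($in_list)   = context_name();
my ($p, $q, $r) = context_name();   # still just 'list'; the 3 is invisible
```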

Damian



Re: Change ($one, $two)= behavior for optimization? (was Re: RFC 175 (v1) Add C<list> keyword to force list context (like C<scalar>))

2000-09-02 Thread Tom Christiansen

Here is my suggestion: What if other functions were able to backtrace
context and determine how many arguments to return just like split can?

I have an RFC on that:

   RFC 21: Replace C<wantarray> with a generic C<want> function

C<want> takes a list of strings that describe aspects of the
context in which the current subroutine has been called. It
returns a list indicating whether or not the current
subroutine's call context matches all the aspects specified in
the list ... at least one integer element is returned.

That integer (the "expectation count") indicates the number of
return values expected by the context. 

Ever consider then having

($a, $b, $c) = <FH>;
or
@a[4,1,5] = <FH>;

only read three lines?  I mean, how many if any builtins would it
make sense to make aware of this, and do something "different"?
Seems a bit rare and unimportant -- until one observes how this
would also solve the problem of people being confused by this
gobbling up their handle:

my($line) = <FH>;
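What "only read three lines" would amount to can be written by hand in Perl 5 today. This is a sketch under assumptions: read_lines is a hypothetical helper that takes the count explicitly, whereas under the proposal the count would come from the left-hand side, and the in-memory handle and sample text are illustrative.

```perl
use strict;
use warnings;

# Hypothetical helper: read at most $n lines and leave the rest unread.
sub read_lines {
    my ($fh, $n) = @_;
    my @lines;
    while (@lines < $n) {
        my $line = <$fh>;              # scalar context: one line per read
        last unless defined $line;     # stop at end of file
        push @lines, $line;
    }
    return @lines;
}

my $text = join '', map { "line $_\n" } 1 .. 5;
open(my $fh, '<', \$text) or die "cannot open in-memory handle: $!";

my ($x, $y, $z) = read_lines($fh, 3);
my @remaining   = <$fh>;               # lines 4 and 5 are still available
```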

--tom



Change ($one, $two)= behavior for optimization? (was Re: RFC 175 (v1) Add C<list> keyword to force list context (like C<scalar>))

2000-09-01 Thread Nathan Wiger

Tom Christiansen wrote:
 
 % man perlfunc
 ...
 When assigning to a list, if LIMIT is omitted, Perl supplies a
 LIMIT one larger than the number of variables in the list, to
 avoid unnecessary work.

As usual I picked a bad example. And I did read the perlfunc manpage,
but somehow both (a) forgot about split's 3rd argument and (b) missed it
on the rereading. This was example number 100+ on my list, so I was
feeling a little woozy. RTFM.
 
 This is documented behavior.  I don't understand the hubbub.

Yeah, oops. Sorry for wasting bandwidth. <sound of self-flogging>
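The implicit LIMIT quoted from perlfunc cannot be observed directly, since the surplus fields are thrown away, but passing the same LIMIT explicitly shows what split stops short of doing. The record string and variable names are made up for this sketch.

```perl
use strict;
use warnings;

my $record = "alpha:beta:gamma:delta:epsilon";

# Two variables on the left: Perl supplies LIMIT 3 behind the scenes,
# so everything after the second separator is never split.
my ($u, $v) = split /:/, $record;

# The same LIMIT written out, with a third variable to catch the
# unsplit remainder.
my ($u3, $v3, $tail) = split /:/, $record, 3;
```

$tail ends up holding the unsplit remainder "gamma:delta:epsilon".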


Let me shift gears and instead ask whether anyone thinks this:

$y = ($first, $second) = grep /$pat/, @data;

Returning "5" has any value? If you're going to do this, it seems like
you'd want the number that were really returned (since scalar grep
will give you the total number found anyways). 

If so, then generalizing split's behavior to return smaller lists when
they're requested might make things faster. In particular, grep could
potentially stop much sooner if you only wanted the first two matches,
and @data was 20,000 elements. If this was extended to user subs, Perl
could actually DWIM speed improvements if the sub was building a huge
list only to want the first few elements back.

The only potential problem I see is that "=()=" would always return 0
now, since it has no elements it's asking for. Hmm. Maybe "=()=" could
be special-cased to mean an infinitely hungry list, which is pretty 
much what it means right now.
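For readers who have not met it, this is the "=()=" counting idiom as it behaves in Perl 5 today. The example values are illustrative.

```perl
use strict;
use warnings;

my $string = "banana";

# Assigning to an empty list forces list context on the right; the
# scalar assignment then sees the number of elements that were produced.
my $hits = () = $string =~ /a/g;

my $odds = () = grep { $_ % 2 } 1 .. 9;
```

If the right-hand side only produced as many elements as the left asked for, both counts would drop to 0, which is the breakage described above.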

Anyways, just an idea.

-Nate

P.S. Consider list() a dead horse for now. No additional flogging
required.



Re: Change ($one, $two)= behavior for optimization? (was Re: RFC 175 (v1) Add Clist keyword to force list context (like Cscalar))

2000-09-01 Thread Tom Christiansen

Let me shift gears and instead ask whether anyone thinks this:

$y = ($first, $second) = grep /$pat/, @data;

Returning "5" has any value? If you're going to do this, it seems like
you'd want the number that were really returned (since scalar grep
will give you the total number found anyways). 

Of course: the LHS is a known quantity; only the RHS is a mystery.
That's why it does this.  I'm sure this is in perldata.
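One place this rule earns its keep is a loop condition. In the sketch below (the queue contents are made up) every value is false, yet the loop still visits all of them, because the condition tests how many elements the right-hand side produced, not the value assigned.

```perl
use strict;
use warnings;

my @queue = (undef, 0, '');    # three elements, all false
my $seen  = 0;

# splice returns a one-element list until @queue is empty, then an empty
# list; the list assignment in boolean context counts those RHS elements.
while ((my ($item) = splice(@queue, 0, 1))) {
    $seen++;
}
```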

--tom