Re: RFC 111 (v3) Here Docs Terminators (Was Whitespace and Here Docs)

Glenn Linderman Thu, 14 Sep 2000 14:42:11 -0700
Michael G Schwern wrote:

> On Thu, Sep 14, 2000 at 11:49:18AM -0700, Glenn Linderman wrote:
> > I'm all for solving problems, and this message attempts to specify 3
> > problems, but it needs more specification.  You describe three
> > problems, but it is not clear what the problems are
>
> Since we've been charging back and forth over this ground like a troop
> of doughboys over No Man's Land for the past month, I figured everyone
> knew the problem and proposed solutions.  Your review accurately lays
> everything out.

OK, I'll try to keep a running list of the problems and solutions... and non-solutions.

> Things like this have come up, and to my eyes and fingers it's
> unacceptable.

Well, OK, so now we're talking shades of opinion.  You'd agree it works, though, and
quite effectively.  But you'd disagree about its aesthetics, and its performance.  The
former is much less interesting to me than the latter.

> Some people like the explicit demarcation of the left
> boundary, I find it ugly and don't like the extra typing.  It doesn't
> win me much over:
>
>     die
>     '    The old lie'.
>     '  Dulce et decorum est'.
>     '      Pro patria mori.';

That's fair, except that they aren't equivalent: you'd need

   die
   '    The old lie'."\n".
   '  Dulce et decorum est'."\n".
   '      Pro patria mori.'."\n";

Which is somewhat worse than the here doc, even with "!" or another leading demarcation
of choice (your choice is, of course, none).
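For comparison, a here doc with a "!" left-margin marker reduces to exactly the concatenated string above with a single substitution.  A minimal sketch in current Perl 5 (so the terminator still sits at the left margin); the `s/^\s*!//gm` cleanup is just one assumed convention, not something any RFC specifies:

```perl
use strict;
use warnings;

# Explicit concatenation, equivalent to the die() argument above.
my $concat = '    The old lie' . "\n"
           . '  Dulce et decorum est' . "\n"
           . '      Pro patria mori.' . "\n";

# The same text as a here-doc with "!" marking the left boundary.
# Strip leading white space up to and including the "!" on each line.
(my $poem = <<'POEM') =~ s/^\s*!//gm;
        !    The old lie
        !  Dulce et decorum est
        !      Pro patria mori.
POEM

print $poem eq $concat ? "same text\n" : "different\n";
```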

> I'd prefer if here-docs just DWIM.

Yes, but... what do you mean vs. what do others mean, and all these problems....

> So we may want to add Yet Another problem.  I forget what number you
> got up to, but it's basically "You shouldn't have to add anything but
> whitespace to the here-doc for indenting".

That's not so much a problem as a restriction on the solution space.  This restriction
requires that indentation be inferred from something, or specified somehow.  Inferring it
is problematical.

The only practical inference is via the position of the terminator relative to the rest of
the text.  That is clearly the solution you are driving me towards, and I have nothing
particularly against it, except that you can't figure out how much white space should be
stripped from each line until you find the terminator.  Nothing else could practically be
aligned with the rest of the text.  Because all existing here docs have their terminator at
the left margin (so nothing would be stripped from them), it could work, as long as it is
introduced at the same time as allowing leading/trailing white space around the terminator.
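A sketch of why the terminator has to be found first: a hypothetical reader, `read_indented_heredoc` (not anything Perl 5 does), must buffer every body line until it meets the terminator, and only then knows how much indentation to remove.  It assumes the terminator line is the tag preceded only by spaces or tabs, and it naively leaves alone any line that lacks that indentation:

```perl
use strict;
use warnings;

# Hypothetical reader for an indented here-doc terminator.
# Body lines are buffered until a line holding only optional
# spaces/tabs plus the tag is found; only then is the amount of
# indentation to strip known.
sub read_indented_heredoc {
    my ($lines, $tag) = @_;
    my @body;
    for my $line (@$lines) {
        if ($line =~ /^([ \t]*)\Q$tag\E\n?\z/) {
            my $indent = $1;
            # Remove exactly the terminator's indentation from each body
            # line; lines that don't start with it are left unchanged.
            s/^\Q$indent\E// for @body;
            return join '', @body;
        }
        push @body, $line;
    }
    die "Can't find here-doc terminator \"$tag\"\n";
}

# Source lines as a lexer might see them after "<<POEM".
my @source = (
    (' ' x 20) . "The old lie\n",
    (' ' x 18) . "Dulce et decorum est\n",
    (' ' x 22) . "Pro patria mori.\n",
    (' ' x 16) . "POEM\n",
);
print read_indented_heredoc(\@source, 'POEM');
```

Because the terminator is indented 16 columns, the poem keeps its relative 4/2/6 indentation.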

Your example, which wants some leading white space kept on each line, can't be handled by
RFC 162's current solution.

I think that if leading white space is stripped without a visible demarcation sequence, it
should only be stripped when the leading white space is identical on each line, and that a
warning (and no stripping) should occur if there is an inconsistency in the exact character
sequence that would be stripped.  This is, I think, the only way not to open the can of
worms about how wide a tab character is.
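That rule can be sketched as a hypothetical `strip_exact_prefix`, which compares raw characters rather than columns, so a tab never has to be given a width.  The prefix is assumed to come from somewhere else, such as the terminator's indentation:

```perl
use strict;
use warnings;

# Hypothetical helper: strip $prefix from every line, but only if every
# non-blank line begins with exactly that character sequence. Any
# mismatch (e.g. spaces where the prefix has tabs) warns and returns
# the text unstripped, so no tab width is ever assumed.
sub strip_exact_prefix {
    my ($text, $prefix) = @_;
    my @lines = split /^/, $text;
    for my $i (0 .. $#lines) {
        next if $lines[$i] =~ /^\s*$/;            # ignore blank lines
        next if index($lines[$i], $prefix) == 0;  # exact prefix present
        warn "here-doc line ", $i + 1,
             ": leading white space does not match; not stripped\n";
        return $text;
    }
    s/^\Q$prefix\E// for @lines;
    return join '', @lines;
}

print strip_exact_prefix("\t\tone\n\t\t  two\n", "\t\t");    # tabs match: stripped
print strip_exact_prefix("\t\tone\n        two\n", "\t\t");  # spaces: warns, unchanged
```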

> An additional problem with dequote() style solutions is they are not
> as efficient.

This is an excellent point.  For here docs that are used only once, it is irrelevant to the
overall script performance whether the work is done at compile time or run time.  But when
they are used in loops, it can make a difference... a significant difference.  This is a
problem we don't want to introduce, and dequote-like solutions introduce it.

This could be "solved" by hoisting the here-doc and related dequote processing out of the
loop and into variables, except for the desire to do interpolation of the content.
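Hoisting looks like this when no interpolation is needed.  `dequote` here is a hypothetical one-line helper that assumes a "!" marker; the point is only that it runs once rather than once per pass:

```perl
use strict;
use warnings;

# Hypothetical dequote(): remove leading white space plus a "!" marker.
sub dequote {
    my $text = shift;
    $text =~ s/^\s*!//gm;
    return $text;
}

# Hoisted: the here-doc is built and dequoted once, before the loop.
my $poem = dequote(<<'POEM');
        !    The old lie
        !  Dulce et decorum est
        !      Pro patria mori.
POEM

my $out = '';
for my $pass (1 .. 3) {
    $out .= $poem;    # no per-iteration stripping cost
}
print $out;
```

With `<<"POEM"` instead, any variables in the text would be interpolated once, at the hoisting point, which is exactly the limitation described above.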

There seems to be no way to do interpolation of an existing string except eval, which would
require constructing syntax around the interpolation, such as

sub interpolate { eval "qq\000" . $_[0] . "\0"; }

and this doesn't work so well if lexicals are mentioned in the parameter: the eval is
compiled inside interpolate's own scope, so the caller's lexicals aren't visible to it... it
would have to be done inline to get lexicals to work right.
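The lexical problem can be demonstrated.  This sketch swaps in a variant of the eval trick that wraps the string in a here doc with an arbitrary, unlikely terminator (`__INTERPOLATE_END__`) instead of NUL-delimited `qq`.  The string eval is compiled in the scope of `interpolate`, so package variables work, while the caller's `my` variables are not visible (under `use strict` the eval fails):

```perl
use strict;
use warnings;

# Hypothetical interpolate(): wrap the text in a here-doc whose
# terminator must not appear alone on a line of the template.
# The string eval is compiled in THIS sub's lexical scope.
sub interpolate {
    my $template = shift;
    my $result = eval "<<\"__INTERPOLATE_END__\"\n$template\n__INTERPOLATE_END__\n";
    die $@ if $@;
    chomp $result;    # drop the newline added by the wrapper
    return $result;
}

our $who = 'package variable';

sub greet {
    my $name = 'Glenn';    # a lexical in the caller
    # $name is not in scope where the eval is compiled, so under
    # "use strict" the eval fails instead of interpolating it.
    return interpolate('Hi, $name');
}

print interpolate('Hello from a $main::who'), "\n";
print eval { greet(); 1 } ? "lexical interpolated\n" : "lexical failed: $@";
```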

This leads me down another path: wouldn't it be nice to have a function to interpolate a
string on demand?

Then you could hoist the here-doc processing out of the loop and still get the effects of
interpolation inside the loop, which would make the performance of here-doc postprocessing
much less critical... but this means defining variables to hold the intermediate results,
and moving the here-doc to a different location, which might not be as friendly to the
understanding of the script.
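A sketch of that combination: the here doc is dequoted once outside the loop, and a hypothetical `interpolate` (implemented with a here-doc-wrapped string eval) fills in the current value on each pass.  It assumes the loop variable is a package variable (`$main::n`), since a `my` variable would not be visible inside `interpolate`.  Each call still compiles a string eval, so only the stripping is hoisted:

```perl
use strict;
use warnings;

# Hypothetical interpolate(): evaluate the template as a here-doc body.
sub interpolate {
    my $template = shift;
    my $result = eval "<<\"__INTERPOLATE_END__\"\n$template\n__INTERPOLATE_END__\n";
    die $@ if $@;
    chomp $result;    # drop the newline added by the wrapper
    return $result;
}

our $n;

# Hoisted: dequoted once. Single quotes keep $main::n literal for now.
(my $template = <<'LINE') =~ s/^\s*!//gm;
        !Pass $main::n of 3
LINE

my $out = '';
for $n (1 .. 3) {                     # foreach localizes the package variable $n
    $out .= interpolate($template);   # interpolated with the current $n
}
print $out;
```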

Another direction to take in this regard would be via RFC 18.  If some of the processing for
a sequence of code could be done at compile time, and it could rewrite that code into
whatever is left for run time, then you wouldn't need to hoist the code out of the loop;
instead, you could write something like the following.  Clearly your poem doesn't need
interpolation, but in general it would be useful.

sub dequote_interpolate : immediate
{
  local $_ = shift;
  my ($leader);  # common leading string (the white space before it is stripped too)
  if (/^\s*(?:([^\w\s]+).*\n)(?:\s*\1.*\n)+$/) {
    $leader = quotemeta($1);
  } else {
    $leader = '';
  }
  s/^\s*$leader//gm;
  (my $literal = $_) =~ s/([\\'])/\\$1/g;  # quote the text as a single-quoted literal
  return "interpolate ( '$literal' );"; # this gets left in place of the call.
    # $literal is interpolated into what is left, but the text could itself contain
    # references to other variables that would get interpolated later, at run time.

  # could use the following return, instead, to not depend on the existence
  # of interpolate
  # return "eval \"qq\\000$_\\0\"";
}

while ( ... )
{
  print OUTFILE dequote_interpolate (<<'POEM');
                !    The old lie
                !  Dulce et decorum est
                !      Pro patria mori.
                POEM
}
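Perl 5 has no `: immediate` attribute, so the closest runnable approximation performs the same leader detection at run time, on every call.  A sketch reusing the regex from the code above, under the hypothetical name `dequote`, with the terminator moved back to the left margin as Perl 5 requires:

```perl
use strict;
use warnings;

# Run-time stand-in: detect a common non-word leader (such as "!")
# that follows the leading white space on every line, then strip the
# white space and the leader. With no common leader, all leading white
# space is stripped.
sub dequote {
    local $_ = shift;
    my $leader = '';
    if (/^\s*(?:([^\w\s]+).*\n)(?:\s*\1.*\n)+$/) {
        $leader = quotemeta $1;
    }
    s/^\s*$leader//gm;
    return $_;
}

my $poem = dequote(<<'POEM');
                !    The old lie
                !  Dulce et decorum est
                !      Pro patria mori.
POEM
print $poem;
```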

I mention these ideas because they are neat ideas with lots of general applicability, even
though probably 90% of the cases would be covered by stripping the amount of white space in
front of the here-doc terminator.

--
Glenn
=====
There  are two kinds of people, those
who finish  what they start,  and  so
on...                 -- Robert Byrne


