Re: Unclosed quotes on heredoc mode

2021-12-20 Thread Chet Ramey

On 12/20/21 11:02 AM, Chet Ramey wrote:

On 12/9/21 5:30 AM, Robert Elz wrote:

 Date:    Wed, 8 Dec 2021 09:56:50 -0500
 From:    Chet Ramey 
 Message-ID:  

Let's take this in smaller steps, and try and sort out one issue
at a time.


Rack 'em.


I had to go back and look to remember that this was all covered on the
austin group list back in 2016, in which discussion you raised exactly
these two questions.

Chet
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



Re: Unclosed quotes on heredoc mode

2021-12-20 Thread Chet Ramey

On 12/9/21 5:30 AM, Robert Elz wrote:

 Date:    Wed, 8 Dec 2021 09:56:50 -0500
 From:    Chet Ramey
 Message-ID:  

Let's take this in smaller steps, and try and sort out one issue
at a time.


Rack 'em.


First, I think you're under a mistaken impression, which is
revealed in the following paragraph.

   | The real question is whether you read a command substitution as a single
   | WORD, so that the lexer cannot return "the next newline token" until the
   | command substitution has been completed.

There is absolutely nothing, anywhere, about "returning" the newline
(token) (token in parens, as while we agree that's what it means, the
standard doesn't currently say that either).


I agree that this is where we disagree. The command substitution is a
single WORD (token), and the parsing performed to find the closing `)'
doesn't have any effect on the parsing state outside it. If you like,
you can change "return" to "encounter."


All that is required is that the lexer encounter a newline (token).
As soon as one is seen, here doc reading commences - which is all a
lexical task.


Yes, this is where the standard needs clarification. I believe that it
implies the command substitution cannot contain text that is interpreted
as a here-document started outside it (the "all characters" text I
referred to previously).



Further, I know bash (and any other shell that works correctly, ignoring
how here docs are processed for this) must encounter the newline token
in its lexer while initially scanning the command substitution (to include
it in whatever word it forms part of).


The same point, restated.


Consider the two following (leading sequences of) command substitutions:

$( echo I need to see the contents of the case $book in order )
and
$( echo I need to see the contents of the
case $book in order )

aside from formatting for this e-mail (added white space, which eventually
becomes irrelevant anyway) there is just a one character change between
the first and the second - a single space char was changed to a newline.

In the first of those, the final ')' shown terminates the command
substitution.   In the second, the ')' doesn't, the command substitution
continues with more not shown here (because it is irrelevant to the
point).





In order properly to collect that command substitution, the lexer that is
collecting it, **MUST** see, recognise, and process, the newline token.


The same point, restated.



Then assuming that immediately before that command substitution
(in each case) we had something like

cat <<'EOF' $( one of the above...

then in the second case, that newline token is the first one seen by the
lexer after the here doc redirection is it not?


Again, the question boils down to whether the contents of a command
substitution affect what's outside it. And again, this is where we disagree.



But the next misconception, or faulty assumption is revealed there.
You're assuming that because, when you look at the page, the here doc in a
case like

cat < is eliminated - for the
purpose of whatever construct was being built when encountered, they
simply do not exist at all.


There's no support in the standard for interpreting "all characters
following the open parenthesis to the matching closing parenthesis" as
including "except for processing any unclosed here documents." At least
backslash-newline processing is mentioned.



Here docs have different rules, but the same effect.


These rules are not stated as such in the standard. (Except for the
"newline token" part we agree needs revising.)



   | I suppose it's precedence parsing: the command substitution has higher
   | precedence than here-documents.

It isn't, because parsing, even pseudo-parsing,


Concentrate on the `precedence' part rather than the `parsing' part. The
lexer has to read the entire command substitution before considering the
here-document.




   | > So, if one does
   | >
   | >   $( cmd 

Re: Unclosed quotes on heredoc mode

2021-12-09 Thread Robert Elz
Date:    Wed, 8 Dec 2021 09:56:50 -0500
From:    Chet Ramey
Message-ID:  

Let's take this in smaller steps, and try and sort out one issue
at a time.

First, I think you're under a mistaken impression, which is
revealed in the following paragraph.

  | The real question is whether you read a command substitution as a single
  | WORD, so that the lexer cannot return "the next newline token" until the
  | command substitution has been completed.

There is absolutely nothing, anywhere, about "returning" the newline
(token) (token in parens, as while we agree that's what it means, the
standard doesn't currently say that either).

All that is required is that the lexer encounter a newline (token).
As soon as one is seen, here doc reading commences - which is all a
lexical task.

  [ In some earlier messages, I might have said something about
processing the here doc before returning the newline token, that
was more a comment about how our system works - for us, whatever
the lexer sees, it returns, regardless of what the grammar happens
to be parsing at the time ... that has some issues, and makes other
things much easier, so is something of a tradeoff - but I certainly
never intended to imply that the newline token needed to be returned
before a here doc can be read. That's just our implementation choice. ]

If anything required the newline (token) to be returned to the grammar
(in which case it would obviously have to be a newline token, not just
a newline character, and that whole question would be moot) that would
make here doc positioning a grammar issue, and it definitely is not.

Further, I know bash (and any other shell that works correctly, ignoring
how here docs are processed for this) must encounter the newline token
in its lexer while initially scanning the command substitution (to include
it in whatever word it forms part of).

Consider the two following (leading sequences of) command substitutions:

$( echo I need to see the contents of the case $book in order )
and
$( echo I need to see the contents of the
case $book in order )

aside from formatting for this e-mail (added white space, which eventually
becomes irrelevant anyway) there is just a one character change between
the first and the second - a single space char was changed to a newline.

In the first of those, the final ')' shown terminates the command
substitution.   In the second, the ')' doesn't, the command substitution
continues with more not shown here (because it is irrelevant to the
point).

In order properly to collect that command substitution, the lexer that is
collecting it, **MUST** see, recognise, and process, the newline token.

Then assuming that immediately before that command substitution
(in each case) we had something like

cat <<'EOF' $( one of the above...

then in the second case, that newline token is the first one seen by the
lexer after the here doc redirection is it not?

(In the first case we haven't reached a newline token yet, that can be
expected at some later point).

  | Command substitutions don't appear in the grammar at all, just like here-
  | documents. They're just words, and like other words, the characters they
  | contain don't affect other constructs.

Sure.

But the next misconception, or faulty assumption is revealed there.
You're assuming that because, when you look at the page, the here doc in a
case like

cat < is eliminated - for the
purpose of whatever construct was being built when encountered, they
simply do not exist at all.

That's how

f\
o\
r

still gets to be the reserved word "for" (assuming it appears in the
appropriate place, and that that indentation is just to make the e-mail
easier to read).   Here docs have different rules, but the same effect.
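
As an aside, the backslash-newline behaviour described here is easy to check. What follows is only a minimal sketch, assuming a POSIX-style sh; the function name spell_for and the variable seen are made up for illustration and appear nowhere in the thread.

```shell
# Backslash-newline pairs are removed while the input is being read, so
# the three physical lines "f\", "o\", "r" form the reserved word "for".
spell_for() {
f\
o\
r word in a b; do
    printf '%s\n' "$word"
done
}

seen=$(spell_for)
printf '%s\n' "$seen"
```

If the continuation lines were not removed before the word was recognised, the function body would fail to parse instead of looping over the two words.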

In the case above, the characters in the command substitution are
the contents of this C-style quoted string:

$'\ncommand sub commands here '

(assuming all the white space in this e-mail was actually there in
the input, and isn't, in this case, just e-mail noise - adjust as
appropriate).

  | I suppose it's precedence parsing: the command substitution has higher
  | precedence than here-documents.

It isn't, because parsing, even pseudo-parsing, has nothing to do with
it at all, it all happens in the lower level code which is reading the
input, and scanning it character by character.   All the upper level code
does is to enable here doc processing when a here redirection operator
has been encountered (queuing the here docs to be fetched in the order
they were encountered, in case there is more than one << before a newline
token appears).
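
The queuing of several here docs ahead of a single newline can be seen in a small sketch like the one below. It assumes nothing beyond a POSIX-style sh; the function name two_docs, the variable queued and the delimiters A and B are hypothetical.

```shell
two_docs() {
    # Two here-doc operators are seen before the first newline token, so
    # two bodies are queued and then read in the order they were queued.
    cat <<A; cat <<B
first body
A
second body
B
}

queued=$(two_docs)
printf '%s\n' "$queued"
```

The first body goes to the first cat and the second body to the second cat, whatever the grammar happens to be doing with the rest of the line.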

  | > So, if one does
  | > 
  | >   $( cmd <

Re: Unclosed quotes on heredoc mode

2021-12-08 Thread Chet Ramey

On 11/28/21 2:29 PM, Robert Elz wrote:



   | So the ultimate question is whether or not the act of reading a command
   | substitution should reset this requirement. That's where we disagree.
   | The grammar is, at that point, reading a different command.

"command" is a loaded word in sh terminology, it is used for all kinds of
things, but in general it is not at all unusual for here document text to
appear while a command other than the one with the redirection operator is
being processed (no command substitutions necessarily involved).   What the
grammar is doing after a here doc redirection operator has been processed,
until the next newline (token) is encountered is irrelevant - the spec
imposes no requirements upon that at all.


We agree on this.

The real question is whether you read a command substitution as a single
WORD, so that the lexer cannot return "the next newline token" until the
command substitution has been completed.

Command substitutions don't appear in the grammar at all, just like here-
documents. They're just words, and like other words, the characters they
contain don't affect other constructs.

I suppose it's precedence parsing: the command substitution has higher
precedence than here-documents.


Sure, but that's not what I meant.   I treat heredoc data as much the same
as a \newline - something that the lexer deals with, and the grammar never
knows happened.   Heredoc data doesn't appear at all in the sh grammar,
as nothing in the grammar cares in the slightest about them (once they're
queued).  What I meant was that from that perspective, whether a sh script
(or sh script fragment) is valid or not, is determined by the grammar, and
given that here doc data does not appear there, it cannot have any impact
upon the decision whether some particular part of the sh input is valid or
not.  


Here-documents are simply quoted strings with some peculiar properties,
read a line at a time.


So, if one does

$( cmd
   | >| The netbsd shell appears to be the outlier here. The parser reads the
   | >| command substitution so it can parse the entire and-or list before trying
   | >| to gather any here-documents.
   | >
   | > You cannot possibly really mean that I hope.   That is, in
   | >
   | >   cmd1 <
   | >   data
   | >   EOF
   | >   cmd2
   | >
   | > you do agree that "data" is stdin to cmd1, that is, the heredoc data
   | > appears splat in the middle of the and-or list.   That's certainly the
   | > way it appears to work (in bash) to me.
   |
   | There is no command substitution in this example.

I know.   But go back and read the quote from you (still here, above, in
this message) again: "The parser reads the command substitution so it can
parse the entire and-or list before trying to gather any here-documents"


The command substitution is a single word. There isn't any newline token
returned to the grammar until it's complete, and there isn't any reason
to read the here-document until it is. That's what this all comes down to:
"all characters following the open parenthesis to the matching closing
parenthesis constitute the command."



** parse the entire and-or list before trying to gather any here documents **

I don't believe that you really meant that, it isn't the way bash behaves
(unless this is something different in the devel version, but I doubt that)
and I was just pointing out that poor phraseology.


Ok.



   | So, again, the question is whether or not input data that is logically
   | part of the command substitution (it appears between the opening and
   | closing parentheses) should affect the `outer' command. That's the
   | question. We have different answers.

We do, because I don't view here doc data as affecting anything except the
command for which it is input. 


OK, then we can stop here. We're not going to agree on this.



But one can also do

printf "%s\n" 'data' >/tmp/hidden.data.$$
 $( cmd 

Oh, stop. The two constructs might have the same functional effect, but
explicitly referring to an existing file within the command substitution
doesn't have anything to do with parsing or lexical analysis.




And then once you allow that to work (which you're apparently now doing
in the devel 

Re: Unclosed quotes on heredoc mode

2021-11-28 Thread Alex fxmbsw7 Ratchev
yea im sorry, .., .. code happily on plz

On Sun, Nov 28, 2021, 21:25 Robert Elz  wrote:

> Date:    Sun, 28 Nov 2021 20:51:33 +0100
> From:    Alex fxmbsw7 Ratchev
> Message-ID:   nachji6-r...@mail.gmail.com>
>
>   | a small comment on that /bin in PATH code.. is invalid, you need to match
>   | first non : beginning and not : ending end
>   | case :$PATH: would fix it
>
> If it was the slightest bit relevant whether that part of the example
> worked or not, you'd be right - but use of that example was purely
> because it doesn't depend upon anything else that needs to be set up,
> it could just as easily have been
>
> case $HOME in /home/*) echo HOME is at home;; esac
>
> for all the use it is.   The relevant part was the here doc data location.
>
> kre
>
>
>


Re: Unclosed quotes on heredoc mode

2021-11-28 Thread Robert Elz
Date:    Sun, 28 Nov 2021 20:51:33 +0100
From:    Alex fxmbsw7 Ratchev
Message-ID:  


  | a small comment on that /bin in PATH code.. is invalid, you need to match
  | first non : beginning and not : ending end
  | case :$PATH: would fix it

If it was the slightest bit relevant whether that part of the example
worked or not, you'd be right - but use of that example was purely
because it doesn't depend upon anything else that needs to be set up,
it could just as easily have been

case $HOME in /home/*) echo HOME is at home;; esac

for all the use it is.   The relevant part was the here doc data location.

kre




Re: Unclosed quotes on heredoc mode

2021-11-28 Thread Alex fxmbsw7 Ratchev
a small comment on that /bin in PATH code.. is invalid, you need to match
first non : beginning and not : ending end
case :$PATH: would fix it
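
A minimal sketch of the `case :$PATH:` idiom, assuming the goal is an exact match on one colon-separated entry. The function name path_contains and its two arguments are hypothetical and are not taken from the example under discussion.

```shell
# Wrapping both the list and the pattern in colons means the first and
# last entries match the same way as entries in the middle of the list.
path_contains() {
    # $1: directory to look for; $2: a PATH-style, colon-separated list
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

if path_contains /bin "/usr/bin:/bin"; then
    echo "/bin is listed"
fi
if ! path_contains /bin "/usr/bin:/usr/sbin"; then
    echo "/usr/bin alone does not count as /bin"
fi
```

An unanchored pattern such as `*/bin*` would also accept /usr/bin, which is presumably the problem being pointed out.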

On Sun, Nov 28, 2021, 20:31 Robert Elz  wrote:

> Date:    Sat, 27 Nov 2021 13:57:57 -0500
> From:    Chet Ramey
> Message-ID:  <5217c48e-c989-a163-5673-38995e35a...@case.edu>
>
> Warning: long message follows, give yourself time to digest it.
>
>   | OK, if you do end up building the devel branch, I'd be interested
>   | in these results.
>
> Assuming that happens, I shall certainly let you know.
>
>   | > Once, of course ... why would I ever build it again?
>   |
>   | Patches exist. There are vendors who take the original release, apply their
>   | own special-sauce patches, then apply the patches I release as they come
>   | out, as part of their own distribution release process.
>
> Of course, NetBSD pkgsrc (used on other systems as well) does that too.
> But your patches appear about every 5-6 months, so I end up doing one
> build every 5-6 months.   Keeping the object files (even the unpacked
> sources) sitting around waiting for the next patches, in order to save
> perhaps 2-3 minutes of build time isn't worth the bother.   Once built
> and installed it all gets trashed.
> [I have also contemplated doing builds in an MFS (or tmpfs)
> which would vanish on a reboot (or just umount) and I do tend
> to reboot more often than bash patches are released ... but I've
> yet to actually do that, for bash, the build time saved wouldn't
> be worth the bother - for some other apps, it might be].
>
> pkgsrc doesn't encourage attempting to retain anything in any case - it
> probably isn't a problem for bash (at least I've never seen it, not that
> I ever looked either) but other applications have a habit of deleting files
> from their distributions - and unless one starts from an empty directory,
> unpacking a tarball doesn't cause those files to be removed ... further,
> some build systems don't pay attention to what is supposed to be there,
> and manage to link all the .o files they can find.
>
> It is easier, and more reliable, to simply start clean every time.
>
> But of course that doesn't apply when you're developing and building
> several times a day (or sometimes, dozens of times an hour).   That just
> doesn't apply to me with bash.
>
>   | Usually, that's ok. In this instance, where we're discussing a feature
>   | whose implementation is substantially different between the released and
>   | development versions, it's more relevant.
>
> Sure, though I didn't know this part was changed so much in the
> devel version until you told me just recently (I do not watch what happens
> there).
>
>   | So the ultimate question is whether or not the act of reading a command
>   | substitution should reset this requirement. That's where we disagree.
>   | The grammar is, at that point, reading a different command.
>
> "command" is a loaded word in sh terminology, it is used for all kinds of
> things, but in general it is not at all unusual for here document text to
> appear while a command other than the one with the redirection operator is
> being processed (no command substitutions necessarily involved).   What the
> grammar is doing after a here doc redirection operator has been processed,
> until the next newline (token) is encountered is irrelevant - the spec
> imposes no requirements upon that at all.
>
>
>   | > Then we get to whether heredoc data is part of a valid shell script
>   | > in that sense - when there is yet to be a newline token to introduce it.
>   |
>   | What does this mean? In all cases, the here-documents are not read until
>   | after a newline token. That's not the issue.
>
> Sure, but that's not what I meant.   I treat heredoc data as much the same
> as a \newline - something that the lexer deals with, and the grammar never
> knows happened.   Heredoc data doesn't appear at all in the sh grammar,
> as nothing in the grammar cares in the slightest about them (once they're
> queued).  What I meant was that from that perspective, whether a sh script
> (or sh script fragment) is valid or not, is determined by the grammar, and
> given that here doc data does not appear there, it cannot have any impact
> upon the decision whether some particular part of the sh input is valid or
> not.   Of course, if the script ends (completely) without a newline token
> after the last redirect operator then that's an error - but of a subtly
> different kind (more like an unterminated string (mismatched quotes) or
> here doc data without its required terminating word -- all lexical
> constructs).
>
> So, if one does
>
> $( cmd <
> there's nothing invalid about that, unless EOF follows that ')' before
> a newline token appears.   And if that happens, it isn't the grammar that
> complains, but something beyond that.   The syntax "word redirect" is
> perfectly valid, and "<< word" is a 

Re: Unclosed quotes on heredoc mode

2021-11-28 Thread Robert Elz
Date:    Sat, 27 Nov 2021 13:57:57 -0500
From:    Chet Ramey
Message-ID:  <5217c48e-c989-a163-5673-38995e35a...@case.edu>

Warning: long message follows, give yourself time to digest it.

  | OK, if you do end up building the devel branch, I'd be interested
  | in these results.

Assuming that happens, I shall certainly let you know.

  | > Once, of course ... why would I ever build it again?
  |
  | Patches exist. There are vendors who take the original release, apply their
  | own special-sauce patches, then apply the patches I release as they come
  | out, as part of their own distribution release process.

Of course, NetBSD pkgsrc (used on other systems as well) does that too.
But your patches appear about every 5-6 months, so I end up doing one
build every 5-6 months.   Keeping the object files (even the unpacked
sources) sitting around waiting for the next patches, in order to save
perhaps 2-3 minutes of build time isn't worth the bother.   Once built
and installed it all gets trashed.
[I have also contemplated doing builds in an MFS (or tmpfs)
which would vanish on a reboot (or just umount) and I do tend
to reboot more often than bash patches are released ... but I've
yet to actually do that, for bash, the build time saved wouldn't
be worth the bother - for some other apps, it might be].

pkgsrc doesn't encourage attempting to retain anything in any case - it
probably isn't a problem for bash (at least I've never seen it, not that
I ever looked either) but other applications have a habit of deleting files
from their distributions - and unless one starts from an empty directory,
unpacking a tarball doesn't cause those files to be removed ... further,
some build systems don't pay attention to what is supposed to be there,
and manage to link all the .o files they can find.

It is easier, and more reliable, to simply start clean every time.

But of course that doesn't apply when you're developing and building
several times a day (or sometimes, dozens of times an hour).   That just
doesn't apply to me with bash.

  | Usually, that's ok. In this instance, where we're discussing a feature
  | whose implementation is substantially different between the released and
  | development versions, it's more relevant.

Sure, though I didn't know this part was changed so much in the
devel version until you told me just recently (I do not watch what happens
there).

  | So the ultimate question is whether or not the act of reading a command
  | substitution should reset this requirement. That's where we disagree.
  | The grammar is, at that point, reading a different command.

"command" is a loaded word in sh terminology, it is used for all kinds of
things, but in general it is not at all unusual for here document text to
appear while a command other than the one with the redirection operator is
being processed (no command substitutions necessarily involved).   What the
grammar is doing after a here doc redirection operator has been processed,
until the next newline (token) is encountered is irrelevant - the spec
imposes no requirements upon that at all.
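
A small sketch of that situation, with no command substitution involved in the parsing question itself: the here-doc body is read while the and-or list that contains the redirection is still incomplete. It assumes only a POSIX-style sh; the function name and_or_demo and the variable result are hypothetical.

```shell
and_or_demo() {
    # The here-doc operator belongs to cat, but its body is read after the
    # newline that follows "&&", while the and-or list is still open.
    cat <<EOF &&
data
EOF
    echo second
}

result=$(and_or_demo)
printf '%s\n' "$result"
```

Here "data" is the standard input of cat, and "echo second" is the second half of the same and-or list, even though the here-doc text sits between the two.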


  | > Then we get to whether heredoc data is part of a valid shell script
  | > in that sense - when there is yet to be a newline token to introduce it.
  |
  | What does this mean? In all cases, the here-documents are not read until
  | after a newline token. That's not the issue.

Sure, but that's not what I meant.   I treat heredoc data as much the same
as a \newline - something that the lexer deals with, and the grammar never
knows happened.   Heredoc data doesn't appear at all in the sh grammar,
as nothing in the grammar cares in the slightest about them (once they're
queued).  What I meant was that from that perspective, whether a sh script
(or sh script fragment) is valid or not, is determined by the grammar, and
given that here doc data does not appear there, it cannot have any impact
upon the decision whether some particular part of the sh input is valid or
not.   Of course, if the script ends (completely) without a newline token
after the last redirect operator then that's an error - but of a subtly
different kind (more like an unterminated string (mismatched quotes) or
here doc data without its required terminating word -- all lexical constructs).

So, if one does

$( cmd <
  | >| The netbsd shell appears to be the outlier here. The parser reads the
  | >| command substitution so it can parse the entire and-or list before trying
  | >| to gather any here-documents.
  | > 
  | > You cannot possibly really mean that I hope.   That is, in
  | > 
  | >   cmd1 <
  | >   data
  | >   EOF
  | >   cmd2
  | > 
  | > you do agree that "data" is stdin to cmd1, that is, the heredoc data
  | > appears splat in the middle of the and-or list.   That's certainly the
  | > way it appears to work (in bash) to me.
  |
  | There is no command substitution in this example.

I know.   But go 

Re: Unclosed quotes on heredoc mode

2021-11-27 Thread Chet Ramey

On 11/24/21 10:40 AM, Robert Elz wrote:

 Date:    Tue, 23 Nov 2021 11:09:51 -0500
 From:    Chet Ramey
 Message-ID:  <3a5f6f3a-aa73-d8ac-46f4-46467d5b3...@case.edu>

   | > I'll run our tests against the newest (released) bash
   |
   | OK. However, since, as I said, the devel branch has a completely different
   | implementation, this is not particularly useful.

OK, then I won't bother ... running the tests is easy (about 5 mins
elapsed time, after about 1 second setup) but analysing the results to
separate out the real bugs from the places where bash is just different
from our shell, and neither is right or wrong (our tests are testing to
make sure we don't change things by accident) takes longer.


OK, if you do end up building the devel branch, I'd be interested in these
results.



   | It's the build version: how many times have you built in this build tree?

Once, of course ... why would I ever build it again?


Patches exist. There are vendors who take the original release, apply their
own special-sauce patches, then apply the patches I release as they come
out, as part of their own distribution release process.



   | Whatever. You do you. Don't be surprised if many of my answers turn out to
   | be "that's already fixed in the devel branch."

First, thanks to those (several) people who indicated how I could
fetch the devel code, I might look at that sometime, but in general I
prefer to wait for the released versions (and then for those to get
included in NetBSD's pkgsrc, which usually happens quite quickly).


Usually, that's ok. In this instance, where we're discussing a feature
whose implementation is substantially different between the released and
development versions, it's more relevant.



   | Refer to my previous message about the reading-full-lines strategy.

I have no problem with reading full lines, but whenever a "full line"
includes a newline token, any pending here docs should be read. 


So the ultimate question is whether or not the act of reading a command
substitution should reset this requirement. That's where we disagree.
The grammar is, at that point, reading a different command.



   | The devel branch produces
   |
   | TRACE: pid 78934: parse_comsub: need_here_doc = 1 after yyparse()?
   | cat: abc: No such file or directory
   | cat: def: No such file or directory

That looks much better.


Like I said, it's a conscious choice that is still fluid.


   | We talked about this. The command substitution starts a new parsing context
   | to implement the "any valid shell script" part of the standard.

Then we get to whether heredoc data is part of a valid shell script
in that sense - when there is yet to be a newline token to introduce it.


What does this mean? In all cases, the here-documents are not read until
after a newline token. That's not the issue.



This is where we started this, the question of which newline is the one
after which heredoc data starts.   It isn't at all as clear as you make
it appear to be.


Apparently not.



   | The netbsd shell appears to be the outlier here. The parser reads the
   | command substitution so it can parse the entire and-or list before trying
   | to gather any here-documents.

You cannot possibly really mean that I hope.   That is, in

cmd1 <

There is no command substitution in this example.


Once again, heredoc gathering has nothing at all to do with the grammar,
and so obviously not the parser either, beyond it informing the lexer which
heredocs are pending.


So, again, the question is whether or not input data that is logically
part of the command substitution (it appears between the opening and
closing parentheses) should affect the `outer' command. That's the
question. We have different answers.





   | The fundamental point of disagreement is what to do if the lexer (after,
   | presumably, calling the parser recursively) finds that it still has here-
   | documents to read after reading the end of the command substitution.


It was. Now we have moved away from that and added the `should text in the
command substitution satisfy here-documents outside it?'

Chet

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



Re: Unclosed quotes on heredoc mode

2021-11-24 Thread Robert Elz
Date:    Tue, 23 Nov 2021 11:09:51 -0500
From:    Chet Ramey
Message-ID:  <3a5f6f3a-aa73-d8ac-46f4-46467d5b3...@case.edu>

  | > I'll run our tests against the newest (released) bash
  |
  | OK. However, since, as I said, the devel branch has a completely different
  | implementation, this is not particularly useful.

OK, then I won't bother ... running the tests is easy (about 5 mins
elapsed time, after about 1 second setup) but analysing the results to
separate out the real bugs from the places where bash is just different
from our shell, and neither is right or wrong (our tests are testing to
make sure we don't change things by accident) takes longer.

  | It's the build version: how many times have you built in this build tree?

Once, of course ... why would I ever build it again?

  | I get into the hundreds before I recycle it.

Of course, when you're doing development that tends to happen, it does
for me as well - but I don't do bash development, just use it (interactively
only, because I was csh trained, and my fingers still type !! and !$ all
the time).

  | Whatever. You do you. Don't be surprised if many of my answers turn out to
  | be "that's already fixed in the devel branch."

First, thanks to those (several) people who indicated how I could
fetch the devel code, I might look at that sometime, but in general I
prefer to wait for the released versions (and then for those to get
included in NetBSD's pkgsrc, which usually happens quite quickly).

Wasting effort (mine and yours) isn't a goal, so I can either try
(probably just once) a devel version via a tarball, or wait for the
devel version to turn into a release, and run the tests then.

  | Refer to my previous message about the reading-full-lines strategy.

I have no problem with reading full lines, but whenever a "full line"
includes a newline token, any pending here docs should be read.  As soon
as you see the << the grammar should be telling the lexer to look for
heredoc data as soon as (probably actually just prior to) returning a
newline token to the grammar (the before/after doesn't really matter,
doing it before just saves needing to remember that the newline was read
just before .. in either case the newline char has been read and consumed,
turned into a newline token, which will be returned to the grammar, and the
next grammar related input will be after the heredoc data is done, so reading
the heredoc(s) and then returning the newline token is slightly simpler).


  | The devel branch produces
  |
  | TRACE: pid 78934: parse_comsub: need_here_doc = 1 after yyparse()?
  | cat: abc: No such file or directory
  | cat: def: No such file or directory

That looks much better.

  | We talked about this. The command substitution starts a new parsing context
  | to implement the "any valid shell script" part of the standard.

Then we get to whether heredoc data is part of a valid shell script
in that sense - when there is yet to be a newline token to introduce it.

This is where we started this, the question of which newline is the one
after which heredoc data starts.   It isn't at all as clear as you make
it appear to be.

  | The netbsd shell appears to be the outlier here. The parser reads the
  | command substitution so it can parse the entire and-or list before trying
  | to gather any here-documents.

You cannot possibly really mean that I hope.   That is, in

cmd1 < And then there is of course the combination of the two of those examples:
  | > 
  | > cat <

Re: Unclosed quotes on heredoc mode

2021-11-23 Thread Lawrence Velázquez
On Tue, Nov 23, 2021, at 10:35 PM, Martijn Dekker wrote:
> Op 20-11-21 om 23:54 schreef Robert Elz:
>> What the devel one does is unknown to me, I don't think I even have
>> the means to obtain it (I have nothing at all git related, and no interest
>> in changing that state of affairs).
>
> Github allows downloading a gzipped tarball of any branch's current 
> state via https://github.com///tarball/
>
> There's a regularly updated mirror of the bash repo here:
> https://github.com/bminor/bash/
>
> So, the URL to the .tgz for the current bash devel branch is:
> https://github.com/bminor/bash/tarball/devel

The official repository also provides a snapshot at
.

-- 
vq



Re: Unclosed quotes on heredoc mode

2021-11-23 Thread David
On Wed, 24 Nov 2021 at 14:36, Martijn Dekker  wrote:

> There's a regularly updated mirror of the bash repo here:
> https://github.com/bminor/bash/

Or if you care about software freedom you might prefer:
  https://git.savannah.gnu.org/cgit/bash.git



Re: Unclosed quotes on heredoc mode

2021-11-23 Thread Martijn Dekker

Op 20-11-21 om 23:54 schreef Robert Elz:

What the devel one does is unknown to me, I don't think I even have
the means to obtain it (I have nothing at all git related, and no interest
in changing that state of affairs).


Github allows downloading a gzipped tarball of any branch's current 
state via https://github.com///tarball/


There's a regularly updated mirror of the bash repo here:
https://github.com/bminor/bash/

So, the URL to the .tgz for the current bash devel branch is:
https://github.com/bminor/bash/tarball/devel

...and, since it came up in this thread, this one is for the current 
ksh93 development code (not a mirror):

https://github.com/ksh93/ksh/tarball/master
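
The two URLs above share one shape, which can be sketched as a small helper. The function name tarball_url is an assumption made for illustration, and the pattern is inferred only from the two concrete URLs quoted in this thread; nothing here contacts the network.

```shell
# Hypothetical helper: build a GitHub tarball URL from owner, repo and branch.
# The pattern is inferred from the two URLs quoted above, not from GitHub docs.
tarball_url() {
    printf 'https://github.com/%s/%s/tarball/%s\n' "$1" "$2" "$3"
}

bash_devel_url=$(tarball_url bminor bash devel)
ksh_master_url=$(tarball_url ksh93 ksh master)
```

Fetching and unpacking would then be something along the lines of `curl -L "$bash_devel_url" | tar xzf -`, assuming curl and tar are installed; no git is needed.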

--
||  modernish -- harness the shell
||  https://github.com/modernish/modernish
||
||  KornShell lives!
||  https://github.com/ksh93/ksh



Re: Unclosed quotes on heredoc mode

2021-11-23 Thread Alex fxmbsw7 Ratchev
With stacked-up heredocs on one line, one just has to think programmatically
and serially... bash is in a gather-parser-data-until-EOF mode

{ printf %s\\n "$( 

Re: Unclosed quotes on heredoc mode

2021-11-23 Thread Chet Ramey

On 11/20/21 5:54 PM, Robert Elz wrote:

 Date:Sat, 20 Nov 2021 15:19:33 -0500



   | How about this. You show me examples where bash (devel bash) does what you
   | think is the wrong thing, and we agree it's a bug, I'll fix it.

I'll run our tests against the newest (released) bash (5.1.12(1)-release)


OK. However, since, as I said, the devel branch has a completely different
implementation, this is not particularly useful.


[what does the (1) represent??   It always seems to be (1) in versions I see.]


It's the build version: how many times have you built in this build tree?
I get into the hundreds before I recycle it.
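
As a small illustration of that answer, the build number can be pulled out of a version string with plain parameter expansion. The variable names are assumptions for the example; in bash itself the same value should also be available as ${BASH_VERSINFO[3]}.

```shell
# Sketch: extract the parenthesised build number from a bash version string.
version='5.1.12(1)-release'
build=${version#*\(}     # drop everything up to and including "("
build=${build%%\)*}      # drop ")" and everything after it
echo "build number: $build"
```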



   | The devel bash already does this.

What the devel one does is unknown to me, I don't think I even have
the means to obtain it (I have nothing at all git related, and no interest
in changing that state of affairs).


Whatever. You do you. Don't be surprised if many of my answers turn out to
be "that's already fixed in the devel branch."

It just seems like a tremendous amount of wasted effort to point out things
that have already been changed.



What I meant was this one:

cat

   | > and a here doc operator in a command substitution might not encounter
   | > a newline until after the cmdsub text has ended - the next following newline
   | > token provides the here doc text.
   |
   | I can't imagine a useful example of this that isn't an error.

That's the 2nd example above, and a very normal thing to want to do, very
short command substitutions (most of them) prefer to be complete within 1 line.


If you want the text of the here-document to apply to the command
substitution, put it inside the command substitution. Otherwise, you
violate the "any valid shell script" clause and the behavior varies there.



Note that neither in POSIX, nor anywhere else, has there ever been any
requirement on the heredoc data other than that it comes after the next
newline (which should, we agree, be newline token, not newline character).


OK.


Since heredocs are a lexical object, this processing is totally unaffected
by whatever semantics the grammar is extracting from the tokens the lexer is
returning to it, the grammar just increments the "number of heredocs needed"
counter, supplies the end words for 

Re: Unclosed quotes on heredoc mode

2021-11-20 Thread Robert Elz
Date:Sat, 20 Nov 2021 15:19:33 -0500
From:Chet Ramey 
Message-ID:  

  | Right. Purposeful.

There's a difference between done intentionally for pragmatic reasons,
and done intentionally because it is the right thing to do and people
should depend upon it remaining that way.

  | How about this. You show me examples where bash (devel bash) does what you
  | think is the wrong thing, and we agree it's a bug, I'll fix it.

I'll run our tests against the newest (released) bash (5.1.12(1)-release)
[what does the (1) represent??   It always seems to be (1) in versions I see.]

  | The devel bash already does this.

What the devel one does is unknown to me, I don't think I even have
the means to obtain it (I have nothing at all git related, and no interest
in changing that state of affairs).

  | > and a newline token in the middle of
  | > a command substitution counts for a here doc operator that occurred before
  | > it, 
  |
  | What does `counts' mean? You're not really reading the lines as shell
  | words,

"counts" means "is the one that matters"  (ie: do not ignore this one).

But, no, not this...

  | cat << EOF
  | echo $(echo this EOF is
  | not the end of
  | the command substitution
  | EOF
  | but it is the end of the
  | here-document
  | )

though that is a mildly interesting case, and I agree on how that
gets parsed (the contents of the here doc are not examined until it
is expanded when used for a redirection).   That should result in a
redirection error for cat, then (probably) "but: not found" (if the
shell didn't already exit), "here-document: not found" and a syntax
error on the ')'.  (The "not found" errors are, naturally, assuming
that commands of those names aren't found in a PATH search).

What I meant was this one:

cat

From the line numbers, I assume the first is when scanning the outer cat
command, and detecting its cmdsub arg, and the 2nd is from rescanning the
command substitution.   The first one clearly knows there is a heredoc,
it also knows it is yet to encounter a newline token (or any newline in
this example) hence the heredoc data cannot possibly be expected yet, it
must wait until after that newline - eventually it gets past >/dev/null,
finds the newline (token), and should start reading the heredoc text.
At that point it looks to see where the << redirection occurred (the first on
the line since this is the first heredoc read) and associates the data
with that redirection operator.   When the cmdsub is ready to be executed
it finds the heredoc data already read and available.

I never got to enter the lines starting "abc" ... (I could have, but I know
I would have just seen 3 command not found errors, one for each line, so I
didn't bother.)

In both of those, the first newline token following the << operator (and its
word) is the one at the end of the first line (of each).  The heredoc data
for each therefore starts on the 2nd line.

What should happen:

[jinx]{3}$ cat <  foobar
> EOF
> echo barfoo) *.c
 foobar
[jinx]{3}$ cat $( cat  abc
> def
> FILES
cat: abc: No such file or directory
cat: def: No such file or directory

For the first there are a couple of .c files in $PWD but they don't contain
"barfoo". Neither "abc" nor "def" exists in $PWD.


  | > and a here doc operator in a command substitution might not encounter
  | > a newline until after the cmdsub text has ended - the next following newline
  | > token provides the here doc text.
  |
  | I can't imagine a useful example of this that isn't an error.

That's the 2nd example above, and a very normal thing to want to do, very
short command substitutions (most of them) prefer to be complete within 1 line.

Note that neither in POSIX, nor anywhere else, has there ever been any
requirement on the heredoc data other than that it comes after the next
newline (which should, we agree, be newline token, not newline character).
Since heredocs are a lexical object, this processing is totally unaffected
by whatever semantics the grammar is extracting from the tokens the lexer is
returning to it, the grammar just increments the "number of heredocs needed"
counter, supplies the end words for each, and the lexer takes care of the rest.

And then there is of course the combination of the two of those examples:

cat  redirect is moved before the cmdsub).





Re: Unclosed quotes on heredoc mode

2021-11-20 Thread Chet Ramey

On 11/20/21 12:35 PM, Robert Elz wrote:

 Date:Sat, 20 Nov 2021 11:33:37 -0500
 From:Chet Ramey 
 Message-ID:  <4addb789-50b6-12a5-7b8a-8a082abaa...@case.edu>

   | I'm skeptical, but willing to be convinced. Bourne's shell allowed EOF to
   | terminate all sorts of things (quoted strings, command substitutions, here
   | documents) -- enough to make it purposeful.

More likely economical.   Making things fit in that sh was a real challenge.


Right. Purposeful.


That's a good starting point, provided you're willing to actually implement
that.  That's what I'd like. 


How about this. You show me examples where bash (devel bash) does what you
think is the wrong thing, and we agree it's a bug, I'll fix it.


 But for this you need to understand that
the shell has to parse and understand command substitutions, as they're read,
in order to correctly find the end,


The devel bash already does this. We've talked about it before. You need to
use bison, not byacc, and a new enough version of bison, but it works fine.


and a newline token in the middle of
a command substitution counts for a here doc operator that occurred before
it, 


What does `counts' mean? You're not really reading the lines as shell
words, so a command substitution isn't really a command substitution while
you're reading the body of a here-document. You mean something like this?

cat << EOF
echo $(echo this EOF is
not the end of
the command substitution
EOF
but it is the end of the
here-document
)



and a here doc operator in a command substitution might not encounter
a newline until after the cmdsub text has ended - the next following newline
token provides the here doc text.


I can't imagine a useful example of this that isn't an error.

Chet



Re: Unclosed quotes on heredoc mode

2021-11-20 Thread Robert Elz
Date:Sat, 20 Nov 2021 11:33:37 -0500
From:Chet Ramey 
Message-ID:  <4addb789-50b6-12a5-7b8a-8a082abaa...@case.edu>

  | I'm skeptical, but willing to be convinced. Bourne's shell allowed EOF to
  | terminate all sorts of things (quoted strings, command substitutions, here
  | documents) -- enough to make it purposeful.

More likely economical.   Making things fit in that sh was a real challenge.

  | > So just how many complaints do you get about the warning message?
  | > "ksh doesn't complain about this, why does bash?"

  | It's usually people who have misplaced or mistyped the ending delimiter.

Yes, exactly - it happens like that all the time.   That's why it should
be an error, not just a warning - no different than when someone does
cmd >/tpm/foo
they made a typo, they get an error, and fix it.   No problems.


  | "When an io_here token has been recognized by the grammar (see Shell
  | Grammar), one or more of the subsequent lines immediately following the
  | next NEWLINE token form the body of one or more here-documents and shall be
  | parsed according to the rules of Here-Document."

That's a good starting point, provided you're willing to actually implement
that.  That's what I'd like.   But for this you need to understand that
the shell has to parse and understand command substitutions, as they're read,
in order to correctly find the end, and a newline token in the middle of
a command substitution counts for a here doc operator that occurred before
it, and a here doc operator in a command substitution might not encounter
a newline until after the cmdsub text has ended - the next following newline
token provides the here doc text.

  | That implies that the shell goes off and
  | reads lines before parsing the rest of the current line as a list.

Yes, certainly - how to read the here doc seems to be agreed, just when
to read it is not.

  | >   cat <<EOF > file ; echo "abc
  | >   def
  | >   EOF
  | >   ghi" \
  | >   EOF
  | >   EOF
  | > What is the here doc, and what does echo say.
  |
  | That's a good example. The here-doc is empty (the delimiter is the third
  | EOF) and the echo prints the rest of the text, with the backslash-newline
  | disappearing.

I agree.

  | I'd say that this is somewhat deceptive, and is a decent illustration of my
  | point. The shell -- bash, at least -- always reads complete lines from the
  | input before parsing any here documents, so it's going to keep reading
  | through the second EOF to read the `complete' first line, due to the quoted
  | string and the quoted newline. The `current' token is going to be the
  | newline that follows the second EOF even before it starts figuring out that
  | it has a here-document and goes off to collect the body.

That's reasonable.   As long as you stick to reading lines, and parsing
them as they're read, and then insert here doc contents as soon as a here
doc operator is located on one of the lines read.

  | > For this.  No.   An extension.  One that comes for feee.
  |
  | I like the Freudian slip there.

Oops...   didn't spot that one!

kre




Re: Unclosed quotes on heredoc mode

2021-11-20 Thread Chet Ramey

On 11/19/21 9:18 AM, Robert Elz wrote:


illusory compat issues.  I have no idea what inspired this initially, but
my guess would be a code bug no-one noticed.


I'm skeptical, but willing to be convinced. Bourne's shell allowed EOF to
terminate all sorts of things (quoted strings, command substitutions, here
documents) -- enough to make it purposeful.



So just how many complaints do you get about the warning message?
"ksh doesn't complain about this, why does bash?"


It's usually people who have misplaced or mistyped the ending delimiter.
It took only a few seconds to find this:

https://unix.stackexchange.com/questions/657488/warning-here-document-at-line-2-delimited-by-end-of-file-wanted-eof

I don't have time right now to look for other reports that might have
tested it against other shells.



   | Which instance of `ola"'? The first or the second?

The first.

   | This cannot be a serious question unless you mean the second.

It is a very serious question, but not as to what should happen
but how the standard needs to describe it.


That's why I suggested what I did.

Some variant of the existing

"When an io_here token has been recognized by the grammar (see Shell
Grammar), one or more of the subsequent lines immediately following the
next NEWLINE token form the body of one or more here-documents and shall be
parsed according to the rules of Here-Document."

could probably work as a basis. That implies that the shell goes off and
reads lines before parsing the rest of the current line as a list.
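
A minimal sketch of what that wording means when two here-documents are pending on the same line: both bodies are read, in order, from the lines after the next newline token. The function name and the delimiters are made up for the example.

```shell
# Two here-document operators on one line; their bodies follow in the
# same order, starting after the newline that ends that line.
two_docs() {
cat <<ONE; cat <<TWO
first
ONE
second
TWO
}
out=$(two_docs)
printf '%s\n' "$out"
```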



   | The delimiter is a `word', and we both know what a shell word is.

yes, but that's irrelevant, it is merely a coincidence here that
the newline in question occurs in the delimiter.
Another example
cat <<EOF > file ; echo "abc
def
EOF
ghi" \
EOF
EOF
What is the here doc, and what does echo say.


That's a good example. The here-doc is empty (the delimiter is the third
EOF) and the echo prints the rest of the text, with the backslash-newline
disappearing.

I'd say that this is somewhat deceptive, and is a decent illustration of my
point. The shell -- bash, at least -- always reads complete lines from the
input before parsing any here documents, so it's going to keep reading
through the second EOF to read the `complete' first line, due to the quoted
string and the quoted newline. The `current' token is going to be the
newline that follows the second EOF even before it starts figuring out that
it has a here-document and goes off to collect the body.

So, the shell reads the here-document body and creates the here document
after it reads an unquoted newline token -- the first newline token after
finding the here-document delimiter.
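
The example discussed above can be tried directly. This is only a sketch: the temporary file (created with mktemp) stands in for "file", and the function and variable names are assumptions. If the reading agreed in this thread is right, the here-document body comes out empty and echo receives the multi-line string plus one more EOF argument.

```shell
tmp=$(mktemp)            # stands in for "file" in the example above

split_line() {
cat <<EOF > "$tmp" ; echo "abc
def
EOF
ghi" \
EOF
EOF
}

echo_out=$(split_line)   # what echo printed
doc_body=$(cat "$tmp")   # what cat received as the here-document
rm -f "$tmp"
```

The first two EOF lines are consumed as part of the quoted string and as an echo argument (the backslash-newline joins it to the command line), so only the last EOF line is seen while looking for the delimiter.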



The first newline after the << is the one after abc.
Do remember that here doc data collection is entirely a
lexical issue, that's why they don't appear anywhere in
the sh grammar.


Oh, I do.



   | The newline after the delimiter is both, but sure, newline token would
   | probably work better.

The example above shows the issue better.  That includes the \newline
which can only be a \ newline because the 2nd char there is a newline,
and that has to be seen at the lexical level.


Yes. Here-documents are one of those features that requires mutual feedback
between the parser and lexer.



   | So it doesn't read `lines' in the POSIX sense? Huh. Who knew?

For this.  No.   An extension.  One that comes for feee.


I like the Freudian slip there.





Re: Unclosed quotes on heredoc mode

2021-11-19 Thread Robert Elz
Date:Thu, 18 Nov 2021 15:46:10 -0500
From:Chet Ramey 
Message-ID:  <5c36d290-0e6e-2aa0-f388-20ec9369a...@case.edu>

  | Yeah, that's a bug. But it's probably baked in.

Very.  Just stopping parsing expansions while reading the here doc
delim string would be easy (well, possible anyway), but the expansion
syntax needs to be handled still, ${x- } is still one word, not two
which it would be if $ processing was simply disabled.   Further
in cmd <<$( random text 'xyz' )   the end delim most likely
includes the literal ' chars.   That would be very difficult to
deal with.  Similarly if a cmdsub included a $' string, that
should not be expanded...


  | OK, that's clearly a bug. Is this specific to the literal string `$PATH',
  | or are there more things that trigger it?

Any $ expansion strings.  Or ``.

The here doc end delim is the only place these ever appear unquoted
when they will not later be expanded.

  | The error message uses some sloppy language, but that's neither here nor there

I know, and I could easily fix that, but while:

  | -- this is a perfectly valid script:
  |
  | cat <<$PATH
  | hello
  | $PATH

it is, no-one ever writes things like that in real code (so no-one
ever sees the error message in practice).

I have been looking for a reasonable way to fix this for a while,
without sacrificing the advantages gained by doing things this way.
Nothing reasonable has occurred to me yet, and it is a very low
priority issue.

  | You're certainly free to consider it a bug,

I do.

  | and not to consider the compatibility concerns that inspired its inclusion in
  | bash in the first place,

illusory compat issues.  I have no idea what inspired this initially, but
my guess would be a code bug no-one noticed.

  | but let's not pretend that this is something that
  | died out a long time ago.

The implementations exist, the need for it does not.

  | > Further, no-one (not anyone I
  | > have ever seen) deliberately relies upon the here doc ending at EOF, not
  | > even if a here doc is in a -c command string or similar).
  |
  | You never really know, do you?

So just how many complaints do you get about the warning message?
"ksh doesn't complain about this, why does bash?"

I have seen zero of those on bug-bash.  My guess is that if anyone
does ever see it they simply fix their code, there's never a need
to do otherwise.

  | Which instance of `ola"'? The first or the second?

The first.

  | This cannot be a serious question unless you mean the second.

It is a very serious question, but not as to what should happen
but how the standard needs to describe it.

  | The delimiter is a `word', and we both know what a shell word is.

yes, but that's irrelevant, it is merely a coincidence here that
the newline in question occurs in the delimiter.
Another example
cat <<EOF > file ; echo "abc
def
EOF
ghi" \
EOF
EOF
What is the here doc, and what does echo say.

The first newline after the << is the one after abc.
Do remember that here doc data collection is entirely a
lexical issue, that's why they don't appear anywhere in
the sh grammar.

  | In other messages from both of us, we agree that
  | the delimiter is "ola\nI,\nola\nola". The here document body starts at the
  | next newline following that delimiter.

Sure, this was not the best example of the problem (with the std)

  | The newline after the delimiter is both, but sure, newline token would
  | probably work better.

The example above shows the issue better.  That includes the \newline
which can only be a \ newline because the 2nd char there is a newline,
and that has to be seen at the lexical level.

  | So it doesn't read `lines' in the POSIX sense? Huh. Who knew?

For this.  No.   An extension.  One that comes for feee.

kre



Re: Unclosed quotes on heredoc mode

2021-11-18 Thread Chet Ramey
On 11/17/21 7:01 PM, Robert Elz wrote:
> Date:Wed, 17 Nov 2021 15:47:37 -0500
> From:Chet Ramey 
> Message-ID:  <420281e7-f3c4-8054-d390-9378080c2...@case.edu>
> 
>   | Every modern shell uses `$PATH' as the here-document delimiter
> 
> Depends what you call modern shells - some ash derived shells (at least)
> don't, because they parse the $PATH into an internal form (in all words
> where that makes sense, before knowing what the word is to be used for)
> and then cannot match that properly.   While that isn't actually expanding
> the word, it still makes things fail badly.

Yeah, that's a bug. But it's probably baked in.

> But:
> 
> [D] sh-current $ cat foo <<$PATH
> sh: 80: Syntax error: Illegal eof marker for << redirection
> 
> at least we error out when the user tries, not just fail to ever
> find the end of the here doc.

OK, that's clearly a bug. Is this specific to the literal string `$PATH',
or are there more things that trigger it? The error message uses some
sloppy language, but that's neither here nor there -- this is a perfectly
valid script:

cat <<$PATH
hello
$PATH

that should echo `hello'.


>   | > First, the EOF should not work, that's a bash bug (IMO) - that should
>   | > generate an error, not just a warning.
>   |
>   | It's not. The historical shells used for the basis of the POSIX standard
> 
> I didn't say it was a standards violation, I said it was a bug.
> That the same bug exists in some other ancient shells isn't a justification.

"Some other ancient shells?" Like dash, or (the current and actively-
developed) ksh93, or the FreeBSD sh, or zsh? The ones I listed in the part
of my message you chopped? You're certainly free to consider it a bug, and
not to consider the compatibility concerns that inspired its inclusion in
bash in the first place, but let's not pretend that this is something that
died out a long time ago.

> 
> Further, no-one (not anyone I
> have ever seen) deliberately relies upon the here doc ending at EOF, not
> even if a here doc is in a -c command string or similar).

You never really know, do you?

>   | > OK, here we have another of the oddities of shell syntax.   The spec
>   | > says that a here document starts at the next newline after the << operator,
>   | > but that's not what it really means. 
>   |
>   | I think the intent there is that the here document starts at the next
>   | newline after the delimiter.
> 
> You mean at the newline after the ola" in the example given?   Really?

Which instance of `ola"'? The first or the second? This cannot be a serious
question unless you mean the second. The delimiter is a `word', and we both
know what a shell word is. In other messages from both of us, we agree that
the delimiter is "ola\nI,\nola\nola". The here document body starts at the
next newline following that delimiter. If you want to reject it because the
delimiter contains a newline, that's fine, but let's also not pretend we
don't know what the delimiter is.

> Surely it must mean newline token, not newline character, mustn't it?

The newline after the delimiter is both, but sure, newline token would
probably work better.

> (Even then, there are more, messier, issues, which I know you're aware of;
> if we could make it as simple as "after the lexically next newline token"
> it would make everything much simpler - that's what it should be.)
> 
>   | > Being able to do that (include embedded newline characters
>   | > do in some other shells).
>   |
>   | I couldn't find one where it does.
> 
> They work in (at least) the NetBSD shell, FreeBSD too I expect, since the
> two use essentially the same mechanism for recognising the end of the
> here doc -- (effectively) after a newline, read chars (from a buffer) one
> at a time, comparing them with the end delimiter, until either there is a
> match failure, or until the end of the end delimiter (after which one more
> char from the buffer is compared to \n).   (Add tab stripping as required).

So it doesn't read `lines' in the POSIX sense? Huh. Who knew?

Chet




Re: Unclosed quotes on heredoc mode

2021-11-17 Thread Robert Elz
Date:Wed, 17 Nov 2021 15:47:37 -0500
From:Chet Ramey 
Message-ID:  <420281e7-f3c4-8054-d390-9378080c2...@case.edu>

  | Every modern shell uses `$PATH' as the here-document delimiter

Depends what you call modern shells - some ash derived shells (at least)
don't, because they parse the $PATH into an internal form (in all words
where that makes sense, before knowing what the word is to be used for)
and then cannot match that properly.   While that isn't actually expanding
the word, it still makes things fail badly.

But:

[D] sh-current $ cat foo <<$PATH
sh: 80: Syntax error: Illegal eof marker for << redirection

at least we error out when the user tries, not just fail to ever
find the end of the here doc.


  | and checks for the delimiter before any part of the process that expands
  | the lines in the here-document body.

That yes, I agree, everyone does that.

  | > First, the EOF should not work, that's a bash bug (IMO) - that should
  | > generate an error, not just a warning.
  |
  | It's not. The historical shells used for the basis of the POSIX standard

I didn't say it was a standards violation, I said it was a bug.
That the same bug exists in some other ancient shells isn't a justification.

Blindly taking the whole remainder of the script as a here document, and
processing it as if that were the author's intent, just because they made
a typo somewhere, is simply irrational.   Further, no-one (not anyone I
have ever seen) deliberately relies upon the here doc ending at EOF, not
even if a here doc is in a -c command string or similar).

  | Bash at least warns you about it.

Yes, better than some, but not as good as it should be.

  | > OK, here we have another of the oddities of shell syntax.   The spec
  | > says that a here document starts at the next newline after the << operator,
  | > but that's not what it really means. 
  |
  | I think the intent there is that the here document starts at the next
  | newline after the delimiter.

You mean at the newline after the ola" in the example given?   Really?
Surely it must mean newline token, not newline character, mustn't it?
(Even then, there are more, messier, issues, which I know you're aware of;
if we could make it as simple as "after the lexically next newline token"
it would make everything much simpler - that's what it should be.)

  | > Being able to do that (include embedded newline characters
  | > do in some other shells).
  |
  | I couldn't find one where it does.

They work in (at least) the NetBSD shell, FreeBSD too I expect, since the
two use essentially the same mechanism for recognising the end of the
here doc -- (effectively) after a newline, read chars (from a buffer) one
at a time, comparing them with the end delimiter, until either there is a
match failure, or until the end of the end delimiter (after which one more
char from the buffer is compared to \n).   (Add tab stripping as required).

On no match, reset the buffer pointer back to where all this started,
and continue reading lines into the here doc.  When the end delim is
recognised, the here doc is complete after the last \n that was added to
it, and regular shell input continues after the \n from the buffer which
matched after the end delimiter.   What the chars are that match (including
more newlines, etc) is irrelevant, anything works (but no tab stripping
occurs after any intermediate newlines).
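
For contrast with the character-by-character matcher described above, here is a line-oriented sketch, closer to reading `lines' in the POSIX sense. The function collect_heredoc is hypothetical and is not code from any shell; it only shows why a delimiter containing a newline can never match when the comparison is done one line at a time.

```shell
# Hypothetical helper: copy body lines from stdin until a line equal to
# the delimiter is seen. Returns 1 if the input ends before that happens.
collect_heredoc() {
    delim=$1
    body=
    while IFS= read -r line; do
        if [ "$line" = "$delim" ]; then
            printf '%s' "$body"
            return 0
        fi
        body="$body$line
"
    done
    printf '%s' "$body"
    return 1    # delimiter never found: the input ended first
}
```

Since read hands back a single line each time, "$line" never contains a newline, so a multi-line delimiter runs to end of input and takes the failure path. No tab stripping is attempted in this sketch.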

  | > Since bash doesn't allow end delimiter words that contain newlines to
  | > work, it should probably generate an error when you try to use one, that
  | > would have made things clear.
  |
  | See above.

Again, behaving irrationally when it would be trivial to detect the error
(even if a rare one) is poor design, and should be fixed.

kre




Re: Unclosed quotes on heredoc mode

2021-11-17 Thread Chet Ramey
On 11/17/21 3:02 PM, Robert Elz wrote:

>   | bash-5.1$ cat << $PATH
> 
> 
>   | it should have terminated with the upper delimiter!
> 
> What do you consider the "upper delimiter" ?
> 
> This is one of the weirder aspects of shell syntax, and perhaps one
> of bash's oddities.

It's not. Every modern shell uses `$PATH' as the here-document delimiter
and checks for the delimiter before any part of the process that expands
the lines in the here-document body.
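
A minimal sketch of that ordering, using a plain delimiter so it does not depend on how a given shell treats `$PATH' as a word: a body line that only expands to the delimiter text does not end the here-document, because the comparison is made on the raw line. The names word and literal_check are made up for the example.

```shell
word=END

literal_check() {
cat <<END
$word
END
}

# The raw line "$word" is not the delimiter, so it is part of the body
# and is expanded afterwards; the literal END line closes the document.
out=$(literal_check)
printf '%s\n' "$out"
```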

> First, the EOF should not work, that's a bash bug (IMO) - that should
> generate an error, not just a warning.

It's not. The historical shells used for the basis of the POSIX standard
(ksh88 and the SVR4 sh) silently allow EOF to terminate a here-document;
ksh93 preserves this behavior. Some of the common shells allow this as
well (e.g., dash, zsh and the version of the FreeBSD sh from a couple of years
ago when I last built it), some do not (e.g., mksh and the NetBSD sh). Bash
at least warns you about it.
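
The difference can be probed with a here-document that never gets its closing delimiter. This is a sketch that assumes `sh' on the test machine is one of the shells that accept end of input as a terminator (bash and dash are listed above as doing so); in a shell that rejects it, the variable would simply end up empty.

```shell
# Run an unterminated here-document in a child shell. stderr is discarded
# because bash prints its "delimited by end-of-file" warning there.
unterminated=$(sh -c 'cat <<EOF
hello' 2>/dev/null) || true
printf '%s\n' "$unterminated"
```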

> 
>  Example:
>   |
>   | bash-5.1$
>   | bash-5.1$ cat << ola"
> 
> OK, here we have another of the oddities of shell syntax.   The spec
> says that a here document starts at the next newline after the << operator,
> but that's not what it really means. 

I think the intent there is that the here document starts at the next
newline after the delimiter.


> Being able to do that (include embedded newline characters
> in a "line") isn't required by the shell specification, and (it has been
> a while since I checked) I do not believe that those work in bash (they
> do in some other shells).

I couldn't find one where it does.


> Since bash doesn't allow end delimiter words that contain newlines to
> work, it should probably generate an error when you try to use one, that
> would have made things clear.

See above.





Re: Unclosed quotes on heredoc mode

2021-11-17 Thread João Almeida Santos
Ok, got it. It makes sense now!
Thank you very much for your detailed explanation guys; now that I understand 
it, I’ll try to implement that on my mini shell.
It’s a bit too soon, but merry Christmas to you all!

Kind regards,
João Almeida Santos





Re: Unclosed quotes on heredoc mode

2021-11-17 Thread Chet Ramey
On 11/17/21 10:33 AM, Robert Elz wrote:

> There are several (IMO)
> bugs in the way bash processes here documents, 

Such as?





Re: Unclosed quotes on heredoc mode

2021-11-17 Thread Chet Ramey
On 11/17/21 1:45 PM, João Almeida Santos wrote:
> No, it’s on the email...Anyway, here’s the text!
> 
> bash-5.1$ echo $PATH
> /usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin/:/usr/local/bin/:/usr/local/bin/
> 
> bash-5.1$ cat << $PATH
>> /usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin/:/usr/local/bin/:/usr/local/bin/
>> it should have terminated with the upper delimiter! but, bash does not seem 
>> to expand PATH.
>> $PATH
> /usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin/:/usr/local/bin/:/usr/local/bin/
> it should have terminated with the upper delimiter! but, bash does not seem 
> to expand PATH.

The here document delimiter does not undergo any expansions except quote
removal, so the delimiter is the literal string `$PATH'. The lines of the
here-document undergo a different set of expansions, which happen after the
check for the delimiter is performed, which means that you need to have a
line that consists solely of `$PATH' to terminate the here-document (as you
discovered). I cannot see how you're going to be able to do anything useful
with this construct; it just seems too clever by (more than) half.

This is all in the bash documentation.
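
To make that concrete, here is a small sketch. The variable name
`greeting` is made up for the example, and `read` stands in for `cat` so
the first body line lands in a variable. The delimiter is the literal
five characters `$PATH'; the body line above it is expanded because the
delimiter is unquoted.

```shell
# 'greeting' is a hypothetical variable used only for this sketch.
greeting=hello

# The word after << is not expanded: the delimiter is literally $PATH.
# Body lines are checked against the delimiter first and expanded only
# afterwards, so the bare $PATH line below ends the here-document.
IFS= read -r first_line << $PATH
$greeting world
$PATH

printf '%s\n' "$first_line"
```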


> ok, but now addressing the actual question. If I use unclosed quotes on 
> heredoc, I can't use 
> the given delimiter to end the heredoc, I end up having to use an EOF. 
> Example:
> 
> bash-5.1$
> bash-5.1$ cat << ola"
>> I,
>> ola""
>> ola"
>> ola

The delimiter is not what you think it is. The delimiter for a here-
document is a shell word (which can include quoted substrings), and after
it undergoes the appropriate quote removal, your delimiter is
"ola\nI,\nola\nola" (using C string notation).

Now, you're never going to be able to match this; it contains a newline.
When the shell constructs the here-document body, it reads individual lines
from the input source and, after removing the trailing newline, tries to
match them against the delimiter (and backslash doesn't work to quote the
newline). This will obviously never match a delimiter containing a newline.

Some shells (e.g., yash) choose to make this a syntax error. Bash does not.
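
The line-at-a-time matching described above can be sketched in shell
itself. This is only an illustration of the idea, not bash's actual
implementation; `collect_body` and its variables are hypothetical names.
Because each candidate is a single line with its newline removed, a
delimiter that contains a newline can never compare equal.

```shell
# Hypothetical helper: read lines from stdin until one equals $1.
# Prints the lines collected so far; returns 0 if the delimiter was
# found and 1 if end-of-file came first.
collect_body() {
    delim=$1
    body=
    # read -r strips the trailing newline and leaves backslashes alone
    while IFS= read -r line; do
        if [ "$line" = "$delim" ]; then
            printf '%s' "$body"
            return 0
        fi
        body="$body$line
"
    done
    printf '%s' "$body"
    return 1
}

# An ordinary delimiter is found on the second line.
captured=$(printf 'a\nEND\nb\n' | collect_body END)
found_status=$?

# A delimiter containing newlines never matches any single line.
nl_delim=$(printf 'ola\nI,\nola\nola')
printf 'ola\nI,\nola\nola\n' | collect_body "$nl_delim" >/dev/null \
    && newline_status=0 || newline_status=1
```

Here `found_status` ends up 0 and `newline_status` ends up 1, which is
the "never going to match" case from the original example.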

> In the above example, I don't understand how to provide the wanted delimiter!

You simply cannot, not the way you specify it. If you really want to have
the double quote as part of the here-document delimiter, write it as

cat << ola\"

I can't imagine this being useful, either.




Re: Unclosed quotes on heredoc mode

2021-11-17 Thread Robert Elz
Date:Wed, 17 Nov 2021 18:45:05 +
From:João Almeida Santos

Message-ID:  


  | No, it's on the email...

It wasn't, but some lists filter attachments (remove them) - this might be one.

  | bash-5.1$ echo $PATH
  | 
/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin/:/usr/local/bin/:/usr/local/bin/

That's fine, but having /usr/local/bin there 3 times isn't useful for
anything but wasting time.

  | bash-5.1$ cat << $PATH


  | it should have terminated with the upper delimiter!

What do you consider the "upper delimiter" ?

This is one of the weirder aspects of shell syntax, and perhaps one
of bash's oddities.

  | but, bash does not seem to expand PATH.

Which one are you concerned with?   It really should expand neither
reference to $PATH in that example - the word after << is just a string,
it isn't used anywhere except locating the end of the here doc data.
However, that word can sometimes be expanded (by some shells), whereas
all that should happen to it is quote removal.   Avoiding '$' in that
word is a very good idea.

The same is true of the '$PATH' that ends the here document, that's also
just a string - which is also never expanded (no-one should be expanding
that one, ever).

The only point of this string (where you're using $PATH) is as a marker
to show where the here document data ends, it does nothing else at all.
In some instances (where the data might contain just about anything) a
complicated end word (so it is unlikely to be a line in the here document
data) is needed, in other cases something simple like EOF or DONE or my
favourite: !  is all that is needed.

So
cat <

  | > I,
  | > ola""

That first " in that last string closes the opening " in the end delimiter
word.   The second " in that line starts a new quoted string, which is
still part of the same word (no separators have occurred)

  | bash-5.1$ cat << ola"
  | > I,
  | > ola""
  | > ola"

Now we have a closing " for the opening (2nd) one on the previous
line, and the newline that follows that ends that word, and also becomes
the newline after which the here document data starts.  So now the end
delimiter word is complete, it is, as a C string:

"ola\nI,\nola\nola"

The internal quotes have been removed, they do not form part of it,
the ones added here are just for clarity, and don't form part of the word.
The \n sequences represent actual newline characters, not the 2 character
sequence '\' 'n'.

That string is what you would need to use, on a line, to end the
here document.   Being able to do that (include embedded newline characters
in a "line") isn't required by the shell specification, and (it has been
a while since I checked) I do not believe that those work in bash (they
do in some other shells).

That means that (in standard shell required syntax, and probably in
bash) there is nothing you can possibly enter which will terminate that
here document correctly.   So an end of file, or interrupt (^C probably)
is your only choice.


  | In the above example, I don't understand how to provide the wanted
  | delimiter!

You cannot in bash, I believe, it is simply impossible.

  | I try to close the quote, in case it is needing it, but even with both
  | quotes, just one or none, it doesn't close...not even with a '\n' char

In shell, newlines do not end quoted strings, they just become a character
in the string.   The word that comes after the << cannot (in standard syntax)
contain a newline, so you cannot use an unpaired quote character in it
(unless it is itself quoted - and if you do that, nothing in the here doc
will be able to be expanded).

Since bash doesn't allow end delimiter words that contain newlines to
work, it should probably generate an error when you try to use one, that
would have made things clear.

If you wanted a quote character in the end delimiter word for some
reason, you could do:

cat << ola\"
hello
ola"

and that should say "hello" on standard output, as the end delimiter there
is the 4 char string:
'o' 'l' 'a' '"'
but no expansions in the "hello" part can happen, because of the quoting
used in the end delimiter word.
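
As a rough sketch of that last point (the variable `name` is invented
for the example, and `read` is used instead of `cat` so the body line
can be captured in a variable): the backslash makes the delimiter the
4 char string ola", and because part of the word is quoted, `$name` in
the body is left alone.

```shell
# 'name' is a hypothetical variable for this sketch.
name=world

# After quote removal the delimiter is the four characters: o l a "
# Quoting any part of the delimiter word disables expansion in the body.
IFS= read -r line << ola\"
hello $name
ola"

printf '%s\n' "$line"
```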

kre




Re: Unclosed quotes on heredoc mode

2021-11-17 Thread Greg Wooledge
On Wed, Nov 17, 2021 at 06:45:05PM +, João Almeida Santos wrote:
> bash-5.1$ cat << $PATH

That's not how a here-document is intended to be used.  A here-document
lets you drop a blob of text directly into your script and use that as
standard input for some command, without needing to store the text in a
separate file.

Here's an example of how it's often used:

usage() {
cat << 'EOF'
usage: myprogram [-abcxyz] [-f inputfile]

Description of options:
  -a   all the things
  -b   make it bad
  ...
EOF
}


Using the contents of a variable as the sentinel to mark the end of
the here-document is not a good idea, and using $PATH specifically is
a VERY bad idea.

It kind of looks like you actually wanted a here-string, not a here-document,

IFS=: read -ra paths <<< "$PATH"

Was that what you were trying to do?  To use the content of PATH as your
input?  That's a here-string and uses the <<< operator, not <<.



Re: Unclosed quotes on heredoc mode

2021-11-17 Thread João Almeida Santos
No, it’s on the email...Anyway, here’s the text!

bash-5.1$ echo $PATH
/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin/:/usr/local/bin/:/usr/local/bin/

bash-5.1$ cat << $PATH
> /usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin/:/usr/local/bin/:/usr/local/bin/
> it should have terminated with the upper delimiter! but, bash does not seem 
> to expand PATH.
> $PATH
/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin/:/usr/local/bin/:/usr/local/bin/
it should have terminated with the upper delimiter! but, bash does not seem to 
expand PATH.
bash-5.1$

ok, but now addressing the actual question. If I use unclosed quotes on 
heredoc, I can't use 
the given delimiter to end the heredoc, I end up having to use an EOF. Example:

bash-5.1$
bash-5.1$ cat << ola"
> I,
> ola""
> ola"
> ola
>
bash: warning: here-document at line 31 delimited by end-of-file (wanted 'ola
')
ola""
ola"
ola

In the above example, I don't understand how to provide the wanted delimiter! I
try to close the quote, in case it is needing it, but even with both quotes, 
just one or none, it doesn't close...not even with a '\n' char



Kind regards,
João Almeida Santos

> On 17 Nov 2021, at 18:34, Alex fxmbsw7 Ratchev  wrote:
> 
> u forgot to attach the picture .. ?
> 
> On Wed, Nov 17, 2021, 19:31 João Almeida Santos  > wrote:
> Thank you for your reply Robert and Lawrence!
> 
> I understand the description alone is hard to follow, so I think the image 
> below should make it clearer. Otherwise let me know!
> 
> 
> Kind regards,
> João Almeida Santos



Re: Unclosed quotes on heredoc mode

2021-11-17 Thread Greg Wooledge
On Wed, Nov 17, 2021 at 06:30:08PM +, João Almeida Santos wrote:
> Thank you for your reply Robert and Lawrence!
> 
> I understand the description alone is hard to follow, so I think the image 
> below should make it clearer. Otherwise let me know!
> 
> 
> Kind regards,
> João Almeida Santos

No attachment made it through.  Which is probably a good thing.

We don't want to see an image of text.  Simply paste the text from
your terminal into the body of your email.  Or if you have a script
which is in a file, you can attach the script.  It should be small,
no larger than necessary to demonstrate the problem you're having -- say,
10 to 20 lines tops.

If you aren't programming on a terminal (because you're running Microsoft
Windows or something), then this may be a culture shock to you, but
it'll be worth figuring these things out in the long run.  Text beats
images every time when it comes to programming questions.



Re: Unclosed quotes on heredoc mode

2021-11-17 Thread Alex fxmbsw7 Ratchev
u forgot to attach the picture .. ?

On Wed, Nov 17, 2021, 19:31 João Almeida Santos 
wrote:

> Thank you for your reply Robert and Lawrence!
>
> I understand the description alone is hard to follow, so I think the image
> below should make it clearer. Otherwise let me know!
>
>
> Kind regards,
> João Almeida Santos


Re: Unclosed quotes on heredoc mode

2021-11-17 Thread João Almeida Santos
Thank you for your reply Robert and Lawrence!

I understand the description alone is hard to follow, so I think the image 
below should make it clearer. Otherwise let me know!


Kind regards,
João Almeida Santos

Re: Unclosed quotes on heredoc mode

2021-11-17 Thread Robert Elz
Date:Wed, 17 Nov 2021 12:35:42 +
From:João Almeida Santos

Message-ID:  


  | While testing the heredoc mode, I realized that the $ is not
  | interpreted as variable expansion.

It depends how you set up the heredoc, please give an example of
what you're testing (one which is not doing what you expect, and
indicate exactly what you do expect to happen).

  | But the reason why I'm emailing you is that whenever I tried to
  | use an unclosed quote on heredoc, it doesn't seem to handle well
  | it never finishes the heredoc.

Again, an example is needed to understand what you're doing.

  | I tried \n for paragraph, verbatim inserting enter,
  | closing quotes on the next line, ... don't know what else to test.

Without seeing what you are actually doing, where the missing quote is
for example (in the heredoc data, or in the word that is after the << ?)
it is hard to suggest anything.

  | I'm on version 5.1 btw.

Show the result of "echo $BASH_VERSION" to provide this info.

  | Is this an expected behavior or a bug?

Impossible to say without more information.   There are several (IMO)
bugs in the way bash processes here documents, whether you're encountering
one of those, something new, or just not understanding the way it is
intended to work is impossible to say with the information you've given.

kre




Re: Unclosed quotes on heredoc mode

2021-11-17 Thread Lawrence Velázquez
On Wed, Nov 17, 2021, at 7:35 AM, João Almeida Santos wrote:
> I’m a programming student currently on 42 School in Lisbon, and one of 
> our projects is to create a minishell, and to mimic the behavior of 
> bash.

Nice!

> While testing the heredoc mode, I realized that the $ is not 
> interpreted as variable expansion. That’s interesting.

Are you talking about your project or bash?  In the latter (and
related shells), expansion is only suppressed in a here-document
if, when the delimiter is specified, it is at least partially quoted.
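
A minimal sketch of the three cases, assuming nothing beyond a POSIX-ish
shell. The variable names are made up for the illustration, and `read`
is used so each body line ends up in a variable that can be inspected.

```shell
# 'var' and the three result names are hypothetical, for illustration.
var=value

# Unquoted delimiter: the body is expanded.
IFS= read -r unquoted << EOF
$var
EOF

# Fully quoted delimiter: the body is taken literally.
IFS= read -r quoted << 'EOF'
$var
EOF

# Partially quoted delimiter: still counts as quoted, body is literal.
IFS= read -r partial << E"O"F
$var
EOF

printf '%s\n' "$unquoted" "$quoted" "$partial"
```

In every case the terminating line is plain EOF, since the quotes are
removed from the delimiter word before matching.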

$ cat <

> But the reason why I’m emailing you is that whenever I tried to use
> an unclosed quote on heredoc, it doesn’t seem to handle well…it never 
> finishes the heredoc.
> I tried \n for paragraph, verbatim inserting enter, closing quotes on 
> the next line, …don’t know what else to test.

This description is a bit hard to follow.  Could you provide a small
bit of code that demonstrates the issue?

-- 
vq



Unclosed quotes on heredoc mode

2021-11-17 Thread João Almeida Santos
Hello,

First of all thank you for doing great (and free) software!
I’m a programming student currently on 42 School in Lisbon, and one of our 
projects is to create a minishell, and to mimic the behavior of bash. 
While testing the heredoc mode, I realized that the $ is not interpreted as 
variable expansion. That’s interesting.
But the reason why I’m emailing you is that whenever I tried to use an
unclosed quote on heredoc, it doesn’t seem to handle well…it never finishes the 
heredoc.
I tried \n for paragraph, verbatim inserting enter, closing quotes on the next 
line, …don’t know what else to test.
I’m on version 5.1 btw.
Is this an expected behavior or a bug?

Thank you!

Kind regards,
João Almeida Santos