On 6/1/23 10:20, Chet Ramey wrote:
> On 5/29/23 12:39 PM, Rob Landley wrote:
>
>> But I'm still left with this divergence:
>>
>> $ ./sh -c 'echo abc\'
>> abc
>> $ bash -c 'echo abc\'
>> abc\
>
> The backslash doesn't escape anything, EOF delimits the token and command,
> and the backslash remains in place for echo to process (or not).
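For the record, the divergence pins down to whether the -c string ends in a bare backslash or a backslash-newline pair (checked against a stock bash; these two calls are my comparison, not from the thread above):

```shell
# Observed bash behavior: a backslash that is the very last byte of the
# -c string survives to echo, while an explicit trailing newline turns it
# into a backslash-newline pair that gets removed during tokenizing.
bash -c 'echo abc\'        # backslash retained: prints abc\
bash -c $'echo abc\\\n'    # backslash-newline removed: prints abc
```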
To me this is all part of line continuation logic. My tokenizer is returning "needs another line to continue" as part of quote processing, and backslash is basically a single character quote, which yours is doing too:

$ echo \ | wc -c
2
$ echo | wc -c
1

But escaping a _newline_ is funny in that it glues lines together instead of creating a command line argument out of the result, which means it has to be special cased, and obviously I'm special casing it wrong, but the special case has multiple nonobvious features.

I think part of it is that my tokenizer removes whitespace between tokens, and you're not doing that until later? (You're doing more passes over the data than I am; my code tries to do all the work each pass can so it's not repeating itself. I had a problem that variable expansion and redirect are the same pass in my code and different passes in yours, which leads to me being unable to produce quite the same error messages you do in a couple of places...)

In general, line continuation priority isn't always obvious to me until I've determined it experimentally:

$ cat << EOF; if true
> hello
> EOF
> then echo also; fi
hello
also

$ if cat << EOF
> hello
> EOF
> then echo true; fi
hello
true

$ if true; then cat << EOF
> hello
> EOF
> echo next
> fi
hello
next

I'm trying to have tests for everything, but there are a number of corner cases...

>> Which is annoyingly magic because:
>>
>> $ bash << 'EOF'
>> > echo abc\
>> > EOF
>> abc
>
> So think about this in two pieces: what the here-document does to generate
> the input to the shell, and what the shell does with it.

The way I'd done it is the HERE document doesn't generate input, the funky redirect _requests_ additional input, which is all basically the line continuation logic where it can't proceed to the "can we actually run this now" logic because it hasn't yet got a complete thought.
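The two-piece view of the quoted here-document case can be checked directly (my restatement of the example, assuming bash for both the outer and inner shell):

```shell
# The quoted delimiter ('EOF') stops the outer shell from touching the
# backslash-newline, so the inner bash receives "echo abc\<newline>",
# removes the backslash-newline itself, hits EOF, and echo sees "abc".
bash << 'EOF'
echo abc\
EOF

# Equivalent stdin form, without the here-document plumbing:
printf 'echo abc\\\n' | bash
```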
I keep calling parse_line() with the next line of input until it returns zero, at which point it can call run_line() on the accumulated data structure it got parsed into.

> Since the here-document delimiter is quoted, the `outer' shell doesn't do
> anything special with the backslash-newline. If it were not quoted, the
> backslash-newline would be removed, and the EOF would not delimit the
> here-document.

Indeed. I need to make sure I have a test for that in tests/sh.test...

> So the shell is supplied input on file descriptor 0 that consists of a
> single line (which ends with a newline):
>
> echo abc\

That was the intent, yes.

> which the shell reads. Since nothing is quoted, the backslash-newline gets
> removed, the shell reads EOF and delimits the token and command, and echo
> gets "abc" as its argument.

I thought that "there's a newline at the end of the line, which the \ is escaping" was relevant, but apparently that's only true for -c.

>> And also:
>>
>> $ echo 'echo abc\' > blah
>> $ cat blah
>> echo abc\
>> $ bash ./blah
>> abc
>
> Same thing, the file ends with a backslash-newline that gets removed, EOF
> delimits the token and command, echo gets "abc" and does the expected
> thing.

File input and stdin were behaving the same, but -c wasn't. Hence me going "is it the newline?" later on...

>> So... do I special case -c here or what?
>
> What's the special case? EOF (or EOS, really) always delimits tokens when
> you're using -c command. Just the same as if you had a file that didn't
> end with a newline.

Except when I have a file that doesn't end with a newline, a trailing \ on the last line is removed. That was one of the later tests.

>> Aha!
>>
>> $ bash -c $'echo abc\\'
>> abc\
>
> There's no difference between this and 'echo abc\'.

Indeed, but it's phrased that way for comparison with the next call. This one has no newline at the end of the -c input, but is otherwise identical.
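That accumulate-until-complete loop can be sketched in shell itself, (ab)using `bash -n` as a stand-in for a parse_line() that reports "needs another line" — a toy model under my own assumptions, not how toysh actually does it; the collect() helper is invented for illustration:

```shell
# Toy model of the parse_line()/run_line() loop: keep appending lines
# until the parser stops complaining, then hand off the accumulated text.
collect() {
  acc=""
  while IFS= read -r line; do
    acc+="$line"$'\n'
    # bash -n parses without executing; a nonzero exit means the
    # accumulated input is still an incomplete command.
    if bash -n -c "$acc" 2>/dev/null; then
      printf '%s' "$acc"    # complete: this is where run_line() would go
      acc=""
    fi
  done
}

# Three lines that only form a complete command once "fi" arrives:
printf 'if true\nthen echo hi\nfi\n' | collect
```

Real continuation logic also has to track here-document bodies, quote state, and so on, which `bash -n` only approximates here.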
(Given how the shell gratuitously strips trailing newlines from "$BLAH" and such, $'' is almost unique in NOT having them stripped...)

Anyway, I'd previously thought -c input wasn't special, in that you can feed multiple lines into -c and they get parsed as multiple lines:

$ bash -c $'echo one\necho two'
one
two
$ bash -c $'cat << EOF\nhello\nEOF'
hello

Which is why in my implementation I'm just feeding them all into

int do_source(char *name, FILE *ff)

with calls to fdopen() or fmemopen() when I want to feed it various types of input.

>> $ bash -c $'echo abc\\\n'
>> abc
>
> The backslash-newline gets removed. That always happens, regardless of
> where the input is coming from.

Yup, which is what led up to the next tests:

>> So...
>>
>> $ echo -n 'echo abc\' | bash
>> abc
>> $ echo -n 'echo abc\' > blah
>> $ bash ./blah
>> abc
>
> This looks inconsistent at first glance, I'll take a look.

Which is where I got confused, yes. If -c doesn't end with a newline, then the \ persists, but when stdin or file input doesn't end with a newline, the trailing backslash is still removed even when it's the last byte of the input and thus has nothing to escape.

>> Nope, that's not it either, -c is still magic even when the file input hasn't
>> got a newline.
>
> What is `magic' about it?

Input via -c is the only context in which a final \ is retained. Even when it's the last byte of input, FILE and stdin still strip the trailing backslash.

Rob

(Once again, this is _probably_ me trying to match bash's behavior too closely, but in the absence of a "bash specification"...)

_______________________________________________
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net
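P.S. On the newline-stripping aside at the top: byte counts make the difference visible. (The newline stripping happens in command substitution, $(...), while $'...' keeps every byte you wrote — a quick check of my own, counting with wc -c:)

```shell
# Command substitution strips trailing newlines; $'...' preserves them.
x=$(printf 'hi\n\n')    # both trailing newlines stripped
y=$'hi\n\n'             # both trailing newlines kept
printf '%s' "$x" | wc -c    # 2 bytes: just "hi"
printf '%s' "$y" | wc -c    # 4 bytes: "hi" plus two newlines
```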