regex string ">(...)" in [[ ]] command is recognized as process substitution
Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS: -g -O2 -flto=auto -ffat-lto-objects -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -Wall
uname output: Linux EliteBook 5.19.0-23-generic #24-Ubuntu SMP PREEMPT_DYNAMIC Fri Oct 14 15:39:57 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Machine Type: x86_64-pc-linux-gnu

Bash Version: 5.2
Patch Level: 2
Release Status: release

I don't know whether this is really a bug, but the regex string ">(...)" in a [[ ]] command is recognized as process substitution when the regex is wrapped in "( )" parentheses.

# This all works fine

  val="delete unset"
  bash$ regex='(.*) <[^>]*> (.*)'
  bash$ [[ $val =~ $regex ]] && echo yes
  yes
  bash$ [[ $val =~ (.*)\ \<[^\>]*\>\ (.*) ]] && echo yes
  yes
  bash$ [[ $val =~ ((.*) <[^>]*> (.*)) ]] && echo yes
  yes

# Remove the spaces from the regex string

  bash$ regex='(.*)<[^>]*>(.*)'
  bash$ [[ $val =~ $regex ]] && echo yes
  yes
  bash$ [[ $val =~ (.*)\<[^\>]*\>(.*) ]] && echo yes
  yes

# This is an error: the [[ ]] command recognizes ">(.*)" as process substitution.

  bash$ [[ $val =~ ((.*)<[^>]*>(.*)) ]] && echo yes     # Error !
  bash$ .*: command not found
  ^C

# If I escape \> then the error goes away

  bash$ [[ $val =~ ((.*)<[^>]*\>(.*)) ]] && echo yes    # Ok
  yes

The second problem

# This only happens in the terminal.

# 1. Intentionally cause an error by changing the escaped "\>" to ">"
  bash$ [[ $val =~ (.*)\ \<[^\>]*>\ (.*) ]] && echo yes
  bash: syntax error in conditional expression: unexpected token `>'

# 2. Fix the error with the \> escape, but the error continues
  bash$ [[ $val =~ (.*)\ \<[^\>]*\>\ (.*) ]] && echo yes
  bash: syntax error near unexpected token `$val'

# 3. On the second try, the error goes away.
  bash$ [[ $val =~ (.*)\ \<[^\>]*\>\ (.*) ]] && echo yes
  yes

The third problem

# This also happens only in the terminal, but very unexpectedly.

# 1. The command executes successfully
  bash$ [[ $val =~ (.*)\ \<[^\>]*\>\ (.*) ]] && echo yes
  yes

# 2. But the output of the ${BASH_REMATCH[1]} variable is incorrect.
  bash$ echo ${BASH_REMATCH[1]}
  ${            # or something like "${BASH_RE"
  bash$ echo ${BASH_REMATCH[1]}
  ${

# 3. If I try again, it outputs normally.
  bash$ [[ $val =~ (.*)\ \<[^\>]*\>\ (.*) ]] && echo yes
  yes
  bash$ echo ${BASH_REMATCH[1]}
  delete
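
The cases in the report that work have one thing in common: the regex reaches [[ ]] through a variable, so the shell parser never sees ">(" as a token. A minimal sketch of that workaround follows; the sample value (including the "<tag>" text) and the variable names `before` and `after` are made up for illustration and are not taken from the report.

```bash
#!/usr/bin/env bash
# Hypothetical sample value; the "<tag>" text is invented for this sketch.
val='delete <tag> unset'

# Single quotes keep ">(" away from the shell parser entirely, so the
# outer parentheses cannot be mistaken for a process substitution.
regex='((.*)<[^>]*>(.*))'

if [[ $val =~ $regex ]]; then
    # Group 1 is the whole match; groups 2 and 3 are the text on either side.
    before=${BASH_REMATCH[2]}
    after=${BASH_REMATCH[3]}
    printf 'before=[%s] after=[%s]\n' "$before" "$after"
fi
```

Keeping the pattern in a variable also avoids having to backslash-escape "<", ">" and spaces one by one.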
Re: Multiline editing breaks if the previous output doesn't end in newline
On Monday, 31 October 2022, Greg Wooledge wrote:
> There's no 100% portable way to determine where the cursor is.

Pity

> Shells like zsh that show a special symbol in these cases use a hack
> to do so. There's a good explanation in this answer:
>
> https://unix.stackexchange.com/questions/167582/why-zsh-ends-a-line-with-a-highlighted-percent-symbol#answer-302710

Thanks for the link. The hack is clever but the result is ugly. I'd much prefer the current behavior of bash.

--
Oğuz
Re: bash "extglob" needs to upgrade at least like zsh "kshglob"
On Monday, 31 October 2022, Martin D Kealey wrote:
> With the exception of the !(LIST) negation, there's a direct
> correspondence between extglob and any other regex format. Translating
> between them is trivial.
> If we use the standard POSIX BRE or ERE, then there's no additional code to
> ship; it's included as part of the OS. The hard part is what to do with
> !(LIST), which was the point of my previous post.

That'd be clunkier than what we already have. Bash targets many platforms, and it would have to target as many regex engines if it were to translate extglobs to POSIX regexes. You can't expect all of them to be compatible with each other, and they are not. So, if we wish to translate extglobs to regexes and have them work regardless of the platform, the easiest way forward is to adopt a third-party regex engine, about which I said enough in my previous email.

> The problem is that it DOESN'T work fine. In practice people encounter
> abysmally slow extglob matching.

*when matching against a huge string. That is rare in my experience, but of course it should be taken into consideration if there are multiple bug reports; I didn't say anything against that.

--
Oğuz
Re: bash "extglob" needs to upgrade at least like zsh "kshglob"
I'm top-quoting this because the entire response below seems to be predicated on a misconception, or perhaps several misconceptions.

Exactly NONE of my suggestions involves expanding the Shell language. Users would continue to write extglob exactly as they do now, and they would remain blissfully ignorant of the regex engine underneath. The "translation" is entirely hidden from them.

Extglob is amenable to compilation into an FSA, which leads to the conclusion that it's functionally a form of regular expression, even if its syntax is rather different from regexes that people are familiar with. This is why I said that writing our own regular/linear extglob implementation would be equivalent to writing a regex compiler and engine. It's also why I think the shortest route to having a working implementation would be to write a translation layer from extglob to one of the existing RE formats.

(The only option I suggested that would modify the shell language would be the option to remove the !(LIST) negation; I include that for completeness, I wouldn't actually recommend it.)

The idea that "these are only technical details that nobody cares about" is just wrong. There have been bugs filed because the performance is not just "a bit slow", but "slow by many orders of magnitude".

If discussion of implementation details is out of scope for the bash-bug mailing list, is there a bash-dev list, or similar, where they should be discussed?

With the exception of the !(LIST) negation, there's a direct correspondence between extglob and any other regex format. Translating between them is trivial. If we use the standard POSIX BRE or ERE, then there's no additional code to ship; it's included as part of the OS. The hard part is what to do with !(LIST), which was the point of my previous post.

-Martin

On Sun, 30 Oct 2022 at 23:35, Oğuz İsmail Uysal wrote:

> On 10/30/22 3:25 PM, Martin D Kealey wrote:
> > How much faster do you think it can be made?
> I don't know, irrelevant though.
> > The problem is not that individual steps are slow, but rather [...]
> These are technical details; no user cares about them.
> > The purpose of my suggestions was to /minimize/ the complexity that
> > becomes part of Bash's codebase [...]
> I meant complexity of the language, not the codebase.
> > In my opinion "make the existing extglob code faster" is a wasted effort
> > if it doesn't get us to "run in at-most quadratic time" and preferably "run
> > in regular (linear) time", and so that basically amounts to "write our own
> > regex state machine compiler and regex engine". This is a non-trivial task,
> > and would fairly obviously add *more* complex code into Bash's codebase
> > than any of my suggested alternatives.
> extglobs are already a part of the bash language. All of your suggested
> alternatives involve expanding the language in question.

No. Just No.

> That's why I disagree with all of them.
> > (Even my options of "postprocess the codebase" or "modify an existing
> > regex compiler" would leave their execution components untouched; only
> > the compilation phase would be modified, and a modified regex compiler
> > would at least stand a chance of existing as a stand-alone library
> > project.)
> If you mean bash should start shipping a huge library like pcre for
> solving an edge case, I don't think that's reasonable at all; why take
> on such a burden when you already have something that works fine in
> practice?

Using PCRE was only one option, and not my preferred one. POSIX RE is provided by the standard C library; we wouldn't have to ship anything.

The problem is that it DOESN'T work fine. In practice people encounter abysmally slow extglob matching.
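
The correspondence described in this message can be checked from the shell itself. The sketch below is only an illustration, assuming the usual hand mapping ?(L) to (L)?, *(L) to (L)*, +(L) to (L)+ and @(L) to (L); the helper names and the sample pattern are hypothetical, and this says nothing about how bash implements extglob internally.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: compare an extglob with a hand-translated POSIX ERE.
# Inside [[ ]], the right-hand side of == is matched as if extglob were enabled.
matches_glob()  { [[ $1 == $2 ]]; }    # $2 unquoted: treated as a glob pattern
matches_regex() { [[ $1 =~ $2 ]]; }    # $2 unquoted: treated as an ERE

# The ERE is anchored because a glob always has to match the whole string.
glob='+(ab|cd)?(x)'
regex='^(ab|cd)+(x)?$'

for s in abcd abcdx x abxx; do
    matches_glob  "$s" "$glob"  && g=yes || g=no
    matches_regex "$s" "$regex" && r=yes || r=no
    printf '%-6s glob=%s regex=%s\n' "$s" "$g" "$r"
done
```

The two columns should agree for every input; !(L) is deliberately absent, since it is the one operator with no direct ERE counterpart.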
Re: Multiline editing breaks if the previous output doesn't end in newline
On Mon, Oct 31, 2022 at 03:40:34AM +0200, Oğuz wrote:
> > Option B: Fix the line editor to take into account when the
> > prompt doesn't start at column 0.
>
> Yeah, or add a new prompt sequence (e.g. \N) that prints a newline only if
> the cursor is not at column 0.

There's no 100% portable way to determine where the cursor is. Shells like zsh that show a special symbol in these cases use a hack to do so. There's a good explanation in this answer:

https://unix.stackexchange.com/questions/167582/why-zsh-ends-a-line-with-a-highlighted-percent-symbol#answer-302710
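
For comparison, one non-portable way to find the cursor is to ask the terminal for a cursor position report with ESC [ 6 n and parse the ESC [ row ; col R reply. The sketch below takes that approach; it is not the zsh hack from the linked answer, it only works on terminals that answer the query, and the function and variable names (`parse_cpr`, `newline_unless_col1`, `cpr_row`, `cpr_col`) are hypothetical.

```bash
#!/usr/bin/env bash
# Parse a cursor position report of the form ESC [ <row> ; <col> R.
# Sets the (hypothetical) globals cpr_row and cpr_col.
parse_cpr() {
    local reply=${1#$'\e['}    # strip the leading ESC [
    reply=${reply%R}           # strip the trailing R
    cpr_row=${reply%%;*}
    cpr_col=${reply##*;}
}

# Ask the terminal where the cursor is and print a newline if it is not in
# the first column. Does nothing unless stdin is a terminal, and gives up
# after one second if the terminal never answers.
newline_unless_col1() {
    local reply
    [[ -t 0 ]] || return 0
    IFS= read -r -s -d R -t 1 -p $'\e[6n' reply || return 0
    parse_cpr "${reply}R"      # read consumed the R delimiter; put it back
    (( cpr_col > 1 )) && printf '\n'
    return 0
}
```

Setting `PROMPT_COMMAND=newline_unless_col1` in an interactive shell would run the check before each prompt. Type-ahead that arrives before the terminal's reply can confuse the parse, which is part of why this cannot be called portable.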
Re: Multiline editing breaks if the previous output doesn't end in newline
On Friday, 28 October 2022, Albert Vaca Cintora wrote:
> Option A: If the previous command doesn't end in a newline,
> add a newline manually. This is what most shells do.

This sounds wrong. How are you going to know whether the previous command's output ends in a newline or not?

> Option B: Fix the line editor to take into account when the
> prompt doesn't start at column 0.

Yeah, or add a new prompt sequence (e.g. \N) that prints a newline only if the cursor is not at column 0.

--
Oğuz
Re: Multiline editing breaks if the previous output doesn't end in newline
On Sun, Oct 30, 2022, 23:01 Dennis Williamson wrote:
> On Sun, Oct 30, 2022 at 4:41 PM Alex fxmbsw7 Ratchev wrote:
>> i coded a files tree to bash code via gawk reading and printing bash code
>> i did noeol no newline at end
>> logically , cause , who wants var='from file\n'
>
> Because command substitution strips trailing newlines?

no , sir

i did simple files-in-dirs to bash-code
i cant ( i did ) generate bash via bash , but it was too slow
the current uses gawk to generate project bash code in one run
with find \0 as input
for , to file , or eval , source ..

my point with the newlines i came across when processing the files 1:1 , i think it began with mapfile usage
but gawk does as well

there is to me , not knowing end newline(s) count , only a ( too ) fatal disability
it also makes data processing , a pain to me
the ansi or whatever newline ending rule is , as i try to say , nothing profitable

what i mean in my example is
when normally writing a file in vim , it ends with the big time ignored newline
say one has vars/foo with content 'bar'
my code would produce , along var stacking ,
foo='bar
'
to me , as data cruncher , one to one data preservation is must
so i changed vim settings to skip this ending newline

there is also another case if use for exactness
that is data stacking in sequential files
sometimes \n sometimes not

hopes for good , /peace

> $ echo -e 'foo\n\n\n'
> foo
>
>
>
> $ s=$(echo -e 'foo\n\n\n')
> $ declare -p s
> declare -- s="foo"
>
> No gyrations needed.
> --
> Visit serverfault.com to get your system administration questions
> answered.
Re: Multiline editing breaks if the previous output doesn't end in newline
On Sun, Oct 30, 2022 at 4:41 PM Alex fxmbsw7 Ratchev wrote:
> i coded a files tree to bash code via gawk reading and printing bash code
> i did noeol no newline at end
> logically , cause , who wants var='from file\n'

Because command substitution strips trailing newlines?

  $ echo -e 'foo\n\n\n'
  foo



  $ s=$(echo -e 'foo\n\n\n')
  $ declare -p s
  declare -- s="foo"

No gyrations needed.

--
Visit serverfault.com to get your system administration questions answered.
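
When the trailing newlines do matter, a common workaround is to append a sentinel character inside the command substitution and strip it afterwards. This is a small sketch of that idiom, not something proposed in the thread; the variable names are arbitrary.

```bash
#!/usr/bin/env bash
# Plain command substitution: all trailing newlines are removed.
stripped=$(printf 'foo\n\n\n')

# Sentinel trick: the extra "x" protects the newlines, then gets removed.
kept=$(printf 'foo\n\n\n'; printf x)
kept=${kept%x}

printf 'stripped: %d characters, kept: %d characters\n' "${#stripped}" "${#kept}"
# prints: stripped: 3 characters, kept: 6 characters
```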
Re: Multiline editing breaks if the previous output doesn't end in newline
On Sun, Oct 30, 2022, 21:21 Albert Vaca Cintora wrote:
> On Sun, Oct 30, 2022 at 7:54 AM Martin D Kealey wrote:
> > This sounds like a bug in whatever is producing the output. POSIX text
> > files have a newline terminating every line; that description includes
> > streams going through pipes and tty devices if they purport to be text.
>
> There are many reasons why one could end up with text in a terminal
> that doesn't end in a newline. A couple of them are:
> - An app is killed in the middle of writing its output (eg: because of
>   a sigterm/sigkill).
> - A file that isn't a POSIX text file is printed to the terminal.
>
> So I think this should still be handled by bash.

i coded a files tree to bash code via gawk reading and printing bash code
i did noeol no newline at end
logically , cause , who wants var='from file\n'
Re: Multiline editing breaks if the previous output doesn't end in newline
On Sun, Oct 30, 2022 at 7:54 AM Martin D Kealey wrote:
> This sounds like a bug in whatever is producing the output. POSIX text files
> have a newline terminating every line; that description includes streams
> going through pipes and tty devices if they purport to be text.

There are many reasons why one could end up with text in a terminal that doesn't end in a newline. A couple of them are:
- An app is killed in the middle of writing its output (e.g. because of a SIGTERM/SIGKILL).
- A file that isn't a POSIX text file is printed to the terminal.

So I think this should still be handled by bash.
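
The second case is easy to test for before a file is printed. The helper below is a hypothetical sketch (the name `ends_with_newline` is not from the thread); it relies on command substitution stripping a lone trailing newline, so an empty result means the last byte was a newline.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the file is non-empty and its last byte
# is a newline. A file ending in a NUL byte is misreported, because bash
# drops NUL bytes from command substitutions.
ends_with_newline() {
    [[ -s $1 && -z $(tail -c 1 -- "$1") ]]
}

tmp=$(mktemp)
printf 'complete line\n' > "$tmp"
ends_with_newline "$tmp" && echo "ends with a newline"
printf 'partial line' > "$tmp"
ends_with_newline "$tmp" || echo "missing final newline"
rm -f "$tmp"
```

This does not help with the first case, where a program is killed mid-write and the shell never sees the data at all.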
Re: bash "extglob" needs to upgrade at least like zsh "kshglob"
On 10/30/22 3:25 PM, Martin D Kealey wrote:
> How much faster do you think it can be made?

I don't know, irrelevant though.

> The problem is not that individual steps are slow, but rather that it
> takes at least a higher-order-polynomial number of steps, possibly more
> (such as exponential or factorial). Speeding up the individual steps will
> make no practical difference, while pin-hole optimisations may
> dramatically speed up some common cases, but still leave the most general
> cases catastrophically slow.

These are technical details; no user cares about them.

> The purpose of my suggestions was to /minimize/ the complexity that
> becomes part of Bash's codebase, while leaving as few pathological cases
> as possible - preferably none.

I meant complexity of the language, not the codebase.

> In my opinion "make the existing extglob code faster" is a wasted effort
> if it doesn't get us to "run in at-most quadratic time" and preferably
> "run in regular (linear) time", and so that basically amounts to "write
> our own regex state machine compiler and regex engine". This is a
> non-trivial task, and would fairly obviously add *more* complex code into
> Bash's codebase than any of my suggested alternatives.

extglobs are already a part of the bash language. All of your suggested alternatives involve expanding the language in question. That's why I disagree with all of them.

> (Even my options of "postprocess the codebase" or "modify an existing
> regex compiler" would leave their execution components untouched; only
> the compilation phase would be modified, and a modified regex compiler
> would at least stand a chance of existing as a stand-alone library
> project.)

If you mean bash should start shipping a huge library like pcre for solving an edge case, I don't think that's reasonable at all; why take on such a burden when you already have something that works fine in practice?
Re: bash "extglob" needs to upgrade at least like zsh "kshglob"
On Sun, Oct 30, 2022, 11:33 Oğuz wrote:
> On Sunday, 30 October 2022, Martin D Kealey wrote:
> > So the options would seem to be:
> > (a) prohibit inversions (you get to pick EITHER extglob or rexglob, not
> > both);
> > (b) bypass convert-to-regex when inversions are present;
> > (c) use PCRE or Vim RE, which already support negations (though not in
> > the same form); note that these do not have linear ("regular") time
> > performance in any but the trivial cases;
> > (d) compute the "inverse" regex for a given negation (this may require
> > Vim RE, see below);
> > (e) post-process the compiled regex (this would be highly dependent on
> > the specific RE implementation);
> > (f) pick an existing regex engine, and add the necessary logic to handle
> > negations and conjunctions (see below);
>
> Or
> (g) make the existing extglob code faster and avoid introducing more
> complexity into the shell.
>
> Which, in my opinion, is the easiest and most realistic option.

last three times i bugged about glob ( like reuse groupings ) it sounded 'do-it-yourself' back
only i would , have done , if i could ( .c )

> --
> Oğuz
Re: bash "extglob" needs to upgrade at least like zsh "kshglob"
On Sunday, 30 October 2022, Martin D Kealey wrote:
> So the options would seem to be:
> (a) prohibit inversions (you get to pick EITHER extglob or rexglob, not
> both);
> (b) bypass convert-to-regex when inversions are present;
> (c) use PCRE or Vim RE, which already support negations (though not in the
> same form); note that these do not have linear ("regular") time performance
> in any but the trivial cases;
> (d) compute the "inverse" regex for a given negation (this may require Vim
> RE, see below);
> (e) post-process the compiled regex (this would be highly dependent on the
> specific RE implementation);
> (f) pick an existing regex engine, and add the necessary logic to handle
> negations and conjunctions (see below);

Or
(g) make the existing extglob code faster and avoid introducing more complexity into the shell.

Which, in my opinion, is the easiest and most realistic option.

--
Oğuz
Re: bash "extglob" needs to upgrade at least like zsh "kshglob"
On Sat, 29 Oct 2022 at 22:15, Greg Wooledge wrote:
> On Sat, Oct 29, 2022 at 04:50:00PM +1100, Martin D Kealey wrote:
> > This seems like a good reason to simply translate extglobs into regexes,
> > which should run in linear time, rather than put effort into building and
> > debugging a parallel implementation.
>
> This isn't straightforward, because of the !(list) feature of extglob.
> There's no analogous construct for that in standard regexes.

That's true, and my "simply" is understating the complexity of the task, but it's not impossible.

There was a recent discussion about this very point on irc://libera.chat#Bash where it was pointed out that supporting the !(list) style of inversions in the regex compiler is simply a matter of inverting the "success" or "failure" indications attached to the states when compiling a regex, which means that it works correctly when these constructs are nested.

So the options would seem to be:
(a) prohibit inversions (you get to pick EITHER extglob or rexglob, not both);
(b) bypass convert-to-regex when inversions are present;
(c) use PCRE or Vim RE, which already support negations (though not in the same form); note that these do not have linear ("regular") time performance in any but the trivial cases;
(d) compute the "inverse" regex for a given negation (this may require Vim RE, see below);
(e) post-process the compiled regex (this would be highly dependent on the specific RE implementation);
(f) pick an existing regex engine, and add the necessary logic to handle negations and conjunctions (see below).

A particular difficulty is handling !(!(X)|!(Y)), which by de Morgan's law translates into the conjunction of X and Y, meaning a target needs to match BOTH X and Y. Only Vim's regex engine has direct support for that construct, and it exhibits non-linear timing when using it.

A naïve implementation could easily require state table space that's exponential in the breadth of the conjunctions, meaning that the user could very easily write a rexglob that cannot be compiled with available memory. So there needs to be a fallback position: either fail, telling the user we cannot honour the linear speed guarantee, or "try anyway" with a procedure that could exhibit truly horrible time complexity.

-Martin

PS: I say "we" and "us" advisedly; let's not leave everything to Chet.
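
The de Morgan case can be tried directly with the existing extglob matcher. In this sketch X is a* and Y is *z, both picked purely for illustration, and `matches_glob` is a hypothetical helper; the nested negation should accept exactly the strings that match both patterns.

```bash
#!/usr/bin/env bash
# X = a*  (starts with "a"),  Y = *z  (ends with "z"); illustrative choices.
both='!(!(a*)|!(*z))'

# Inside [[ ]], the right-hand side of == is matched as if extglob were enabled.
matches_glob() { [[ $1 == $2 ]]; }    # $2 unquoted: treated as a pattern

for s in abcz abc xyz; do
    if matches_glob "$s" "$both"; then
        echo "$s: matches X and Y"
    else
        echo "$s: no match"
    fi
done
```

Writing the conjunction as two separate tests, `[[ $s == a* && $s == *z ]]`, gives the same answer at the top level; the hard part raised above is supporting it when the construct is nested inside a larger pattern.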
Re: Multiline editing breaks if the previous output doesn't end in newline
This sounds like a bug in whatever is producing the output. POSIX text files have a newline terminating every line; that description includes streams going through pipes and tty devices if they purport to be text.

It's fairly simple to fix this by adding this to the end of your .bashrc (or the system /etc/bashrc):

  PS1='\[\e[1;7;31m\e[K<<\e[$((COLUMNS-8))C<<\r\e[m\e[2K\]'"$PS1"

or you might prefer this version:

  _MissingEOL='<<< missing EOL ' # your choice of text here
  for (( t = COLUMNS<250 ? 1000 : COLUMNS*4; ${#_MissingEOL} < t ;)) do _MissingEOL=$_MissingEOL$_MissingEOL ; done
  PS1='\[\e[1;7;31m${_MissingEOL:0:COLUMNS}\r\e[m\e[2K\]'"$PS1"

These will mark an incomplete line with a red chevron to highlight the erroneous output, and move the cursor to the correct position on the next line. If this makes copying and pasting just slightly awkward, that would prompt me to fix the broken program, or to remember to add «; echo» when running it.

(You might want to make this contingent on TERM being a known ANSI-compatible terminal type, or verifying the output of tput.)

Sadly, there is an increasing number of common new utilities that don't honour the POSIX requirements for a text stream, such as 'jq' in its "brief" mode; if you come across these, please file bug reports on them. If the maintainers push back, please point them at the POSIX standard.

-Martin
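
The parenthetical about TERM could look something like the sketch below. The list of terminal types is an assumption chosen for illustration, not an authoritative set, and `is_ansi_term` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical allow-list; extend it to match the terminals you actually use.
is_ansi_term() {
    case $1 in
        xterm*|rxvt*|screen*|tmux*|linux|vt100|vt220|ansi) return 0 ;;
        *) return 1 ;;
    esac
}

# Only prepend the missing-EOL marker when the terminal is expected to
# understand the escape sequences. ${PS1-} tolerates an unset PS1.
if is_ansi_term "${TERM:-dumb}"; then
    PS1='\[\e[1;7;31m\e[K<<\e[$((COLUMNS-8))C<<\r\e[m\e[2K\]'"${PS1-}"
fi
```

A check built on tput, as also suggested above, would avoid maintaining the list by hand, at the cost of depending on the terminfo database being accurate.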
Re: Subsequent Here Doc/String Never Evaluated in Process Substitution
On Fri, 28 Oct 2022 at 20:37, wrote:
> Thank you for the awesome shell. I noticed the following after upgrading
> from 5.1.16-3 to 5.2.2-2 on Fedora. It actually resulted in a minor amount
> of data loss.

After fixing the attached file to remove the carriage returns, I was able to reproduce the fault using Bash v5.2.0 rc2.

Running with « bash-5.2.0-r2 -x ConsecutiveHereDocStringBug », it appears that the "cat" on the line following the assignment to uS is subsumed into it, and then the redirection on the assignment is ignored. (I discovered this when I attempted to insert « printf HERE\\n >&2 » between the assignment and the cat, only to then see HERE\n: command not found.)

Indeed, it appears that the command following the "do" always absorbs the first word from the second command, despite the line break between them.

This does seem like a fairly serious issue if it's present in the 5.2.0 release.

-Martin