regex string ">(...)" in [[ ]] command recognize as process substitution

2022-10-30 Thread Hyunho Cho
##

Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS: -g -O2 -flto=auto -ffat-lto-objects -flto=auto
-ffat-lto-objects -fstack-protector-strong -Wformat
-Werror=format-security -Wall
uname output: Linux EliteBook 5.19.0-23-generic #24-Ubuntu SMP
PREEMPT_DYNAMIC Fri Oct 14 15:39:57 UTC 2022 x86_64 x86_64 x86_64
GNU/Linux
Machine Type: x86_64-pc-linux-gnu

Bash Version: 5.2
Patch Level: 2
Release Status: release

##

I don't know whether this is really a bug, but the regex string ">(...)"
inside the [[ ]] command is recognized as process substitution when the
whole regex is wrapped in "( )" parentheses.

#  this all works fine #

val="delete  unset"

bash$ regex='(.*) <[^>]*> (.*)'
bash$ [[ $val =~ $regex ]] && echo yes
yes

bash$ [[ $val =~ (.*)\ \<[^\>]*\>\ (.*) ]] && echo yes
yes

bash$ [[ $val =~ ((.*) <[^>]*> (.*)) ]] && echo yes
yes

#  remove spaces in regex string  #

bash$ regex='(.*)<[^>]*>(.*)'
bash$ [[ $val =~ $regex ]] && echo yes
yes

bash$ [[ $val =~ (.*)\<[^\>]*\>(.*) ]] && echo yes
yes

# this is an error
# [[ ]] command recognizes ">(.*)" as process substitution.
bash$ [[ $val =~ ((.*)<[^>]*>(.*)) ]] && echo yes  # Error !
bash$ .*: command not found
^C

# if i escape \> then the error goes away
bash$ [[ $val =~ ((.*)<[^>]*\>(.*)) ]] && echo yes # Ok
yes
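
A minimal sketch of the usual way to sidestep this, assuming the pattern can
live in a variable: when the regex is single-quoted in an assignment and
expanded unquoted on the right of =~, the parser never sees ">(" as a token.
The sample value of val below is hypothetical, since the value used in this
report is not fully shown.

```bash
# Hypothetical sample input (an assumption, not the report's actual value).
val='delete <tag> unset'

# Single quotes keep the shell parser from tokenizing ">(" in the pattern.
regex='((.*)<[^>]*>(.*))'

if [[ $val =~ $regex ]]; then
    # Group 1 is the whole match; groups 2 and 3 are the text around <...>.
    printf 'before=[%s] after=[%s]\n' "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
fi
```

This does not explain the parsing behaviour reported here; it only avoids
writing the pattern inline.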

#  The second problem  #

This only happens in the terminal.

# 1. intentionally cause an error by changing the escaped "\>" to ">"
bash$ [[ $val =~ (.*)\ \<[^\>]*>\ (.*) ]] && echo yes
bash: syntax error in conditional expression: unexpected token `>'

# 2. restore the \> escape to fix the error, but an error is still reported
bash$ [[ $val =~ (.*)\ \<[^\>]*\>\ (.*) ]] && echo yes
bash: syntax error near unexpected token `$val'

# 3. On the second try, the error goes away.
bash$ [[ $val =~ (.*)\ \<[^\>]*\>\ (.*) ]] && echo yes
yes

#  The third problem  #

This also happens only in the terminal, and it shows up quite unexpectedly.

# 1. command executed successfully
bash$ [[ $val =~ (.*)\ \<[^\>]*\>\ (.*) ]] && echo yes
yes

# 2. but the output of the ${BASH_REMATCH[1]} variable is incorrect.
bash$ echo ${BASH_REMATCH[1]}
${ # or something like "${BASH_RE"

bash$ echo ${BASH_REMATCH[1]}
${

# 3. if i try again, it outputs normally.
bash$ [[ $val =~ (.*)\ \<[^\>]*\>\ (.*) ]] && echo yes
yes

bash$ echo ${BASH_REMATCH[1]}
delete



Re: Multiline editing breaks if the previous output doesn't end in newline

2022-10-30 Thread Oğuz
On Monday, 31 October 2022, Greg Wooledge wrote:
>
> There's no 100% portable way to determine where the cursor is.


Pity


> Shells like zsh that show a special symbol in these cases use a hack
> to do so.  There's a good explanation in this answer:
>
> https://unix.stackexchange.com/questions/167582/why-zsh-ends-a-line-with-a-highlighted-percent-symbol#answer-302710
>

Thanks for the link. The hack is clever but the result is ugly. I'd much
prefer the current behavior of bash.


-- 
Oğuz


Re: bash "extglob" needs to upgrade at least like zsh "kshglob"

2022-10-30 Thread Oğuz
On Monday, 31 October 2022, Martin D Kealey wrote:
>
> With the exception of the !(LIST) negation, there's a direct
> correspondence between extglob and any other regex format. Translating
> between them is trivial.
>
> If we use the standard POSIX BRE or ERE, then there's no additional code to
> ship; it's included as part of the OS. The hard part is what to do with
> !(LIST), which was the point of my previous post.
>

That'd be clunkier than what we already have. Bash targets many platforms
and it'd have to target as many regex engines if it were to translate
extglobs to posix regexes. You can't expect all of them to be compatible
with each other, and they are not. So, if we wish to translate extglobs to
regexes and have them work regardless of the platform, the easiest way
forward is to adopt a third party regex engine; about which I said enough
in my previous email.

> The problem is that it DOESN'T work fine. In practice people encounter
> abysmally slow extglob matching.
>

*when matching against a huge string. Which is rare in my experience, but
of course should be taken into consideration if there are multiple bug
reports; I didn't say anything against that.


-- 
Oğuz


Re: bash "extglob" needs to upgrade at least like zsh "kshglob"

2022-10-30 Thread Martin D Kealey
I'm top-quoting this because the entire response below seems to be
predicated on a misconception, or perhaps several misconceptions.

Exactly NONE of my suggestions involves expanding the Shell language.
Users would continue to write extglob exactly as they do now, and they
would remain blissfully ignorant of the regex engine underneath. The
"translation" is entirely hidden from them.

Extglob is amenable to compilation into a FSA, which leads to the
conclusion that it's functionally a form of regular expression, even if its
syntax is rather different from regexes that people are familiar with. This
is why I said that writing our own regular/linear extglob implementation
would be equivalent to writing a regex compiler and engine. It's also why I
think the shortest route to having a working implementation would be to
write a translation layer from extglob to one of the existing RE formats.

(The only option I suggested that would modify the shell language would be
the option to remove the !(LIST) negation; I include that for completeness,
I wouldn't actually recommend it.)

The idea that "these are only technical details that nobody cares about" is
just wrong.
There have been bugs filed because the performance is not just "a bit
slow", but "slow by many orders of magnitude".

If discussion of implementation details is out of scope for the bash-bug
mailing list, is there a bash-dev list, or similar, where they should be
discussed?

With the exception of the !(LIST) negation, there's a direct correspondence
between extglob and any other regex format. Translating between them is
trivial.
If we use the standard POSIX BRE or ERE, then there's no additional code to
ship; it's included as part of the OS. The hard part is what to do with
!(LIST), which was the point of my previous post.
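
To illustrate the correspondence described above, here is a hedged sketch
pairing a few extglob operators with hand-written POSIX ERE equivalents. The
helper names glob_matches and ere_matches are made up for this example, and
the pairs are written by hand, not produced by any translator.

```bash
shopt -s extglob

# Hypothetical helpers. The pattern argument is expanded unquoted on purpose,
# so it is interpreted as a pattern and not as a literal string.
glob_matches() { [[ $1 == $2 ]]; }
ere_matches()  { [[ $1 =~ $2 ]]; }

# extglob      hand-written ERE
# ?(ab)c       ^(ab)?c$
# *(ab)c       ^(ab)*c$
# +(ab)c       ^(ab)+c$
# @(ab|cd)e    ^(ab|cd)e$
for s in c abc ababc cde; do
    if glob_matches "$s" '+(ab)c'; then echo "extglob +(ab)c matches $s"; fi
    if ere_matches "$s" '^(ab)+c$'; then echo "ERE ^(ab)+c\$ matches $s"; fi
done
```

The !(LIST) negation is deliberately absent from the table, since it is the
one operator without a direct ERE counterpart.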

-Martin

On Sun, 30 Oct 2022 at 23:35, Oğuz İsmail Uysal 
wrote:

> On 10/30/22 3:25 PM, Martin D Kealey wrote:
> > How much faster do you think it can be made?
> I don't know, irrelevant though.
> > The problem is not that individual steps are slow, but rather [...]
> These are technical details; no user cares about them.
> > The purpose of my suggestions was to /minimize/ the complexity that
> > becomes part of Bash's codebase [...]
> I meant complexity of the language, not the codebase.
> > In my opinion "make the existing extglob code faster" is a wasted effort
> > if it doesn't get us to "run in at-most quadratic time" and preferably "run
> > in regular (linear) time", and so that basically amounts to "write our own
> > regex state machine compiler and regex engine". This is a non-trivial task,
> > and would fairly obviously add
> > *more* complex code into Bash's codebase than any of my suggested
> > alternatives.
> extglobs are already a part of the bash language. All of your suggested
> alternatives involve expanding the language in question.

No. Just No.

> That's why I
> disagree with all of them.

> > (Even my options of "postprocess the codebase" or "modify an existing
> > regex compiler" would leave their execution components untouched; only
> > the compilation phase would be modified, and a modified regex compiler
> > would at least stand a chance of existing as a stand-alone library
> > project.)
> If you mean bash should start shipping a huge library like pcre for
> solving an edge case, I don't think that's reasonable at all; why take
> on such a burden when you already have something that works fine in
> practice?
>

Using PCRE was only one option, and not my preferred one.
POSIX RE is provided by the standard C library; we wouldn't have to ship
anything.

The problem is that it DOESN'T work fine. In practice people encounter
abysmally slow extglob matching.


Re: Multiline editing breaks if the previous output doesn't end in newline

2022-10-30 Thread Greg Wooledge
On Mon, Oct 31, 2022 at 03:40:34AM +0200, Oğuz wrote:
> > Option B: Fix the line editor to take into account when the
> > prompt doesn't start at column 0.
> >
> >
> Yeah, or add a new prompt sequence (e.g. \N) that prints a newline only if
> the cursor is not at column 0.

There's no 100% portable way to determine where the cursor is.

Shells like zsh that show a special symbol in these cases use a hack
to do so.  There's a good explanation in this answer:

https://unix.stackexchange.com/questions/167582/why-zsh-ends-a-line-with-a-highlighted-percent-symbol#answer-302710
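
A rough sketch of that kind of hack in bash terms, under the assumption that
the terminal defers wrapping at the right margin: print a visible marker, pad
with COLUMNS-1 spaces, then return to column 0 and clear the line. If the
cursor started at column 0, everything fits on one line and is erased again;
otherwise the padding wraps, and the marker stays behind on the incomplete
line. The function name eol_mark is made up for this example.

```bash
# Hypothetical helper: reverse-video '%', COLUMNS-1 spaces, carriage return,
# then "erase to end of line". Falls back to 80 columns if COLUMNS is unset.
eol_mark() {
    local cols=${COLUMNS:-80}
    printf '\e[7m%%\e[m%*s\r\e[K' "$((cols - 1))" ''
}

# Run it before each prompt. Note that this replaces any existing
# PROMPT_COMMAND value.
PROMPT_COMMAND=eol_mark
```

As noted above, this is not 100% portable; it relies on how the terminal
handles the last column.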



Re: Multiline editing breaks if the previous output doesn't end in newline

2022-10-30 Thread Oğuz
On Friday, 28 October 2022, Albert Vaca Cintora wrote:
>
> Option A: If the previous command doesn't end in a newline,
> add a newline manually. This is what most shells do.


This sounds wrong. How are you going to know if the previous command ends
in a newline or not then?


> Option B: Fix the line editor to take into account when the
> prompt doesn't start at column 0.
>
>
Yeah, or add a new prompt sequence (e.g. \N) that prints a newline only if
the cursor is not at column 0.
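
For illustration, a hedged sketch of what such a check looks like when done
from a script today, assuming a terminal that answers the ANSI "device status
report" query (ESC [ 6 n) with ESC [ row ; col R. Both function names are
made up for this example, and the query part only works on a real tty.

```bash
# Hypothetical helper: extract the column from a cursor position report of
# the form ESC [ row ; col R (the trailing R may already have been removed).
cpr_column() {
    local reply=$1
    reply=${reply#*\[}     # drop everything up to and including '['
    reply=${reply%R*}      # drop a trailing 'R' if present
    printf '%s\n' "${reply#*;}"
}

# Hypothetical helper: ask the terminal where the cursor is. Assumes /dev/tty
# is usable and that the terminal replies within one second.
cursor_column() {
    local reply
    IFS= read -rs -d R -t 1 -p $'\e[6n' reply < /dev/tty || return 1
    cpr_column "$reply"
}
```

A prompt hook could then print a newline when cursor_column reports anything
other than 1. Terminals that do not answer the query simply hit the timeout.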


-- 
Oğuz


Re: Multiline editing breaks if the previous output doesn't end in newline

2022-10-30 Thread Alex fxmbsw7 Ratchev
On Sun, Oct 30, 2022, 23:01 Dennis Williamson 
wrote:

>
>
> On Sun, Oct 30, 2022 at 4:41 PM Alex fxmbsw7 Ratchev 
> wrote:
>
>>
>>
>> i coded a files tree to bash code via gawk reading and printing bash code
>> i did noeol no newline at end
>> logically , cause , who wants var='from file\n'
>>
>> >
>>
>
> Because command substitution strips trailing newlines?
>

no , sir
i did simple files-in-dirs to bash-code
i cant ( i did ) generate bash via bash , but it was too slow
the current uses gawk to generate project bash code in one run with find \0
as input
for , to file , or eval , source ..

my point with the newlines i came across when processing the files 1:1 , i
think it began with mapfile usage but gawk does as well

there is to me , not knowing end newline(s) count , only a ( too ) fatal
disability
it also makes data processing , a pain

to me the ansi or whatever newline ending rule is , as i try to say ,
nothing profitable

what i mean in my example is
when normally writing a file in vim , it ends with the big time ignored
newline
say one has vars/foo with content 'bar'
my code would produce , along var stacking , foo='bar
'
to me , as data cruncher , one to one data preservation is a must
so i changed vim settings to skip this ending newline

there is also another case if use for exactness
that is data stacking in sequential files
sometimes \n sometimes not

hopes for good , /peace

> $ echo -e 'foo\n\n\n'
> foo
>
>
>
> $ s=$(echo -e 'foo\n\n\n')
> $ declare -p s
> declare -- s="foo"
>
> No gyrations needed.
> --
> Visit serverfault.com to get your system administration questions
> answered.
>


Re: Multiline editing breaks if the previous output doesn't end in newline

2022-10-30 Thread Dennis Williamson
On Sun, Oct 30, 2022 at 4:41 PM Alex fxmbsw7 Ratchev 
wrote:

>
>
> i coded a files tree to bash code via gawk reading and printing bash code
> i did noeol no newline at end
> logically , cause , who wants var='from file\n'
>
> >
>

Because command substitution strips trailing newlines?

$ echo -e 'foo\n\n\n'
foo



$ s=$(echo -e 'foo\n\n\n')
$ declare -p s
declare -- s="foo"

No gyrations needed.
-- 
Visit serverfault.com to get your system administration questions answered.
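
For the opposite case, where the trailing newlines must survive, a common
idiom is to append a sentinel character inside the substitution and strip it
afterwards. This is a minimal sketch; the sentinel "x" is an arbitrary choice.

```bash
# Command substitution strips trailing newlines, so end the output with a
# sentinel character that is not a newline ...
s=$(printf 'foo\n\n\n'; printf x)

# ... and remove the sentinel again, leaving the newlines intact.
s=${s%x}

declare -p s
```

Here s ends up holding "foo" followed by all three newlines.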


Re: Multiline editing breaks if the previous output doesn't end in newline

2022-10-30 Thread Alex fxmbsw7 Ratchev
On Sun, Oct 30, 2022, 21:21 Albert Vaca Cintora 
wrote:

> On Sun, Oct 30, 2022 at 7:54 AM Martin D Kealey 
> wrote:
> >
> > This sounds like a bug in whatever is producing the output. POSIX text
> files have a newline terminating every line; that description includes
> streams going through pipes and tty devices if they purport to be text.
> >
>
> There are many reasons why one could end up with text in a terminal
> that doesn't end in a newline. A couple of them are:
> - An app is killed in the middle of writing its output (eg: because of
> a sigterm/sigkill).
> - A file that isn't a POSIX text file is printed to the terminal.
>
> So I think this should still be handled by bash.
>

i coded a files tree to bash code via gawk reading and printing bash code
i did noeol no newline at end
logically , cause , who wants var='from file\n'

>


Re: Multiline editing breaks if the previous output doesn't end in newline

2022-10-30 Thread Albert Vaca Cintora
On Sun, Oct 30, 2022 at 7:54 AM Martin D Kealey  wrote:
>
> This sounds like a bug in whatever is producing the output. POSIX text files 
> have a newline terminating every line; that description includes streams 
> going through pipes and tty devices if they purport to be text.
>

There are many reasons why one could end up with text in a terminal
that doesn't end in a newline. A couple of them are:
- An app is killed in the middle of writing its output (eg: because of
a sigterm/sigkill).
- A file that isn't a POSIX text file is printed to the terminal.

So I think this should still be handled by bash.



Re: bash "extglob" needs to upgrade at least like zsh "kshglob"

2022-10-30 Thread Oğuz İsmail Uysal

On 10/30/22 3:25 PM, Martin D Kealey wrote:

> How much faster do you think it can be made?

I don't know, irrelevant though.

> The problem is not that individual steps are slow, but rather that it
> takes at least a higher-order-polynomial number of steps, possibly
> more (such as exponential or factorial).
> Speeding up the individual steps will make no practical difference,
> while pin-hole optimisations may dramatically speed up some common
> cases, but still leave the most general cases catastrophically slow.

These are technical details; no user cares about them.

> The purpose of my suggestions was to /minimize/ the complexity that
> becomes part of Bash's codebase, while leaving as few pathological
> cases as possible - preferably none.

I meant complexity of the language, not the codebase.

> In my opinion "make the existing extglob code faster" is a wasted
> effort if it doesn't get us to "run in at-most quadratic time" and
> preferably "run in regular (linear) time", and so that basically
> amounts to "write our own regex state machine compiler and regex
> engine". This is a non-trivial task, and would fairly obviously add
> *more* complex code into Bash's codebase than any of my suggested
> alternatives.

extglobs are already a part of the bash language. All of your suggested
alternatives involve expanding the language in question. That's why I
disagree with all of them.

> (Even my options of "postprocess the codebase" or "modify an existing
> regex compiler" would leave their execution components untouched; only
> the compilation phase would be modified, and a modified regex compiler
> would at least stand a chance of existing as a stand-alone library
> project.)

If you mean bash should start shipping a huge library like pcre for
solving an edge case, I don't think that's reasonable at all; why take
on such a burden when you already have something that works fine in
practice?




Re: bash "extglob" needs to upgrade at least like zsh "kshglob"

2022-10-30 Thread Alex fxmbsw7 Ratchev
On Sun, Oct 30, 2022, 11:33 Oğuz  wrote:

> On Sunday, 30 October 2022, Martin D Kealey wrote:
> >
> > So the options would seem to be:
> > (a) prohibit inversions (you get to pick EITHER extglob or rexglob, not
> > both);
> > (b) bypass convert-to-regex when inversions are present;
> > (c) use PCRE or Vim RE, which already support negations (though not in the
> > same form); note that these do not have linear ("regular") time performance
> > in any but the trivial cases;
> > (d) compute the "inverse" regex for a given negation (this may require Vim
> > RE, see below);
> > (e) post-process the compiled regex (this would be highly dependent on the
> > specific RE implementation);
> > (f) pick an existing regex engine, and add the necessary logic to handle
> > negations and conjunctions (see below);
> >
>
> Or
> (g) make the existing extglob code faster and avoid introducing more
> complexity into the shell.
>
> Which, in my opinion, is the easiest and most realistic option.
>

last three times i bugged about glob ( like reuse groupings ) it sounded
'do-it-yourself' back only
i would , have done , if i could ( .c )

> --
> Oğuz
>


Re: bash "extglob" needs to upgrade at least like zsh "kshglob"

2022-10-30 Thread Oğuz
On Sunday, 30 October 2022, Martin D Kealey wrote:
>
> So the options would seem to be:
> (a) prohibit inversions (you get to pick EITHER extglob or rexglob, not
> both);
> (b) bypass convert-to-regex when inversions are present;
> (c) use PCRE or Vim RE, which already support negations (though not in the
> same form); note that these do not have linear ("regular") time performance
> in any but the trivial cases;
> (d) compute the "inverse" regex for a given negation (this may require Vim
> RE, see below);
> (e) post-process the compiled regex (this would be highly dependent on the
> specific RE implementation);
> (f) pick an existing regex engine, and add the necessary logic to handle
> negations and conjunctions (see below);
>

Or
(g) make the existing extglob code faster and avoid introducing more
complexity into the shell.

Which, in my opinion, is the easiest and most realistic option.


-- 
Oğuz


Re: bash "extglob" needs to upgrade at least like zsh "kshglob"

2022-10-30 Thread Martin D Kealey
On Sat, 29 Oct 2022 at 22:15, Greg Wooledge  wrote:

> On Sat, Oct 29, 2022 at 04:50:00PM +1100, Martin D Kealey wrote:
> > This seems like a good reason to simply translate extglobs into regexes,
> > which should run in linear time, rather than put effort into building and
> > debugging a parallel implementation.
>
> This isn't straightforward, because of the !(list) feature of extglob.
> There's no analogous construct for that in standard regexes.
>

That's true, and my "simply" understates the complexity of the task,
but it's not impossible.

There was a recent discussion about this very point on
irc://libera.chat#Bash where it was pointed out that supporting the !(list)
style of inversions in the regex compiler is simply a matter of inverting
the "success" or "failure" indications attached to the states when
compiling a regex, which means that it works correctly when these
constructs are nested.
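
As a toy illustration of that idea (a sketch under my own assumptions, not
code from any regex engine), the following hand-built DFA recognises ^ab*$
over the alphabet {a,b}. Because every state has a transition for every
symbol, including an explicit dead state, flipping the accept/reject result
yields exactly the complement language. All names here are hypothetical.

```bash
# Transition table for ^ab*$ ; state 2 is the dead (trap) state.
declare -A delta=(
    [0,a]=1 [0,b]=2
    [1,a]=2 [1,b]=1
    [2,a]=2 [2,b]=2
)
declare -A accepting=( [1]=1 )

# run_dfa STRING [INVERT]: succeed if the DFA accepts STRING, or, when INVERT
# is 1, if the complemented DFA accepts it.
run_dfa() {
    local s=$1 invert=${2:-0} state=0 i ch ok
    for (( i = 0; i < ${#s}; i++ )); do
        ch=${s:i:1}
        state=${delta[$state,$ch]:-2}   # symbols outside {a,b} go to the dead state
    done
    ok=${accepting[$state]:-0}
    if (( invert )); then ok=$(( !ok )); fi
    (( ok ))
}

run_dfa abbb   && echo "abbb matches a*(b)"
run_dfa ba 1   && echo "ba matches the negation"
```

The inversion only works on a complete, deterministic automaton; doing the
same on an NFA would not give the complement.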

So the options would seem to be:
(a) prohibit inversions (you get to pick EITHER extglob or rexglob, not
both);
(b) bypass convert-to-regex when inversions are present;
(c) use PCRE or Vim RE, which already support negations (though not in the
same form); note that these do not have linear ("regular") time performance
in any but the trivial cases;
(d) compute the "inverse" regex for a given negation (this may require Vim
RE, see below);
(e) post-process the compiled regex (this would be highly dependent on the
specific RE implementation);
(f) pick an existing regex engine, and add the necessary logic to handle
negations and conjunctions (see below);

A particular difficulty is handling !(!(X)|!(Y)) which by de Morgan's law
translates into @(X&Y) - meaning a target needs to match BOTH X and Y. Only
Vim's Regex engine has direct support for that construct, and it exhibits
non-linear timing when using it.
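
A small sketch of that construct using bash's existing extglob matcher, with
X and Y chosen arbitrarily as foo* and *bar. The function name is made up for
this example; the pattern is kept in a variable so that it is only
interpreted at match time.

```bash
shopt -s extglob

# !(!(X)|!(Y)) with X = foo* and Y = *bar: matches only strings that match
# both X and Y.
both_pat='!(!(foo*)|!(*bar))'

# Hypothetical helper: succeed if $1 both starts with "foo" and ends in "bar".
matches_both() { [[ $1 == $both_pat ]]; }

for s in foobar foo-x-bar foobaz xbar; do
    if matches_both "$s"; then echo "$s: matches both"; else echo "$s: no"; fi
done
```

This only shows the semantics; it says nothing about how expensive the match
is for long inputs.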

A naïve implementation could easily require state table space that's
exponential in the breadth of the conjunctions, meaning that the user could
very easily write a rexglob that cannot be compiled with available memory.
So there needs to be a fallback position: either fail, telling the user we
cannot honour the linear speed guarantee, or "try anyway" with a procedure
that could exhibit truly horrible time complexity.

-Martin

PS: I say "we" and "us" advisedly; let's not leave everything to Chet.


Re: Multiline editing breaks if the previous output doesn't end in newline

2022-10-30 Thread Martin D Kealey
This sounds like a bug in whatever is producing the output. POSIX text
files have a newline terminating every line; that description includes
streams going through pipes and tty devices if they purport to be text.

It's fairly simple to fix this by adding this to the end of your .bashrc
(or the system /etc/bashrc):

 PS1='\[\e[1;7;31m\e[K<<\e[$((COLUMNS-8))C<<\r\e[m\e[2K\]'"$PS1"

or you might prefer this version:

 _MissingEOL='<<< missing EOL ' # your choice of text here
 for (( t = COLUMNS<250 ? 1000 : COLUMNS*4; ${#_MissingEOL} < t ;)) do
   _MissingEOL=$_MissingEOL$_MissingEOL
 done
 PS1='\[\e[1;7;31m${_MissingEOL:0:COLUMNS}\r\e[m\e[2K\]'"$PS1"

These will mark an incomplete line with a red chevron to highlight the
erroneous output, and move the cursor to the correct position on the next
line. If this makes copying and pasting just slightly awkward, that would
prompt me to fix the broken program, or to remember to add «; echo» when
running it.

(You might want to make this contingent on TERM being a known
ANSI-compatible terminal type, or verifying the output of tput.)
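
A minimal sketch of such a guard, assuming a hand-picked list of TERM
patterns; the list and the function name is_ansi_term are illustrative only
and not taken from any standard.

```bash
# Hypothetical guard: succeed only for terminal types assumed to understand
# the ANSI sequences used in the prompt. Checks $1, or $TERM if no argument.
is_ansi_term() {
    case ${1-$TERM} in
        xterm*|screen*|tmux*|rxvt*|linux|vt100|vt220) return 0 ;;
        *) return 1 ;;
    esac
}

if is_ansi_term; then
    PS1='\[\e[1;7;31m\e[K<<\e[$((COLUMNS-8))C<<\r\e[m\e[2K\]'"$PS1"
fi
```

On any other terminal type PS1 is left untouched.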

Sadly, there is an increasing number of common new utilities that don't
honour the POSIX requirements for a text stream, such as 'jq' in its
"brief" mode; if you come across these, please file bug reports on them. If
the maintainers push back, please point them at the POSIX standard.

-Martin


Re: Subsequent Here Doc/String Never Evaluated in Process Substitution

2022-10-30 Thread Martin D Kealey
On Fri, 28 Oct 2022 at 20:37,  wrote:

> Thank you for the awesome shell.  I noticed the following after upgrading
> from 5.1.16-3 to 5.2.2-2 on Fedora.  It actually resulted in a minor
> amount
> of data loss.


After fixing the attached file to remove the carriage returns, I was able
to reproduce the fault using Bash v5.2.0 rc2.

Running with « bash-5.2.0-r2 -x ConsecutiveHereDocStringBug », it appears
that the "cat" on the line following the assignment to uS is subsumed into
it, and then the redirection on the assignment is ignored.

(I discovered this when I attempted to insert « printf HERE\\n >&2 »
between the assignment and the cat, only to then see HERE\n: command not
found.)

Indeed, it appears that the command following the "do" always absorbs the
first word from the second command, despite the line break between them.

This does seem like a fairly serious issue if it's present in the 5.2.0
release.

-Martin