subject:"\[Toybox\] pathological case in sed s\/\/\/g"

Re: [Toybox] pathological case in sed s///g

2019-05-08 Thread Rob Landley

I emailed Michael Kerrisk to update the man page to mention it...

Rob

On 5/8/19 7:52 PM, enh wrote:
> (i checked and macOS does have both REG_STARTEND and REG_PEND.)
> 
> From: Rob Landley 
> Date: Mon, May 6, 2019 at 10:42 AM
> To: enh
> Cc: toybox, Rich Felker
> 
>> Huh... I'll assume REG_STARTEND works in bionic since you're pointing me at 
>> it.
>> It's not in the regex man page but it _is_ in the glibc headers...
>>
>>   https://github.com/bminor/glibc/commit/6fefb4e0b16
>>
>> Looks like it went into glibc in 2004, which is way past 7 years. I should 
>> poke
>> Michael Kerrisk to update the man page.
>>
>> And musl explicitly refused to do it, but Rich makes bad calls all the time:
>>
>>   https://www.openwall.com/lists/musl/2013/01/15/26
>>
>> I still haven't got an #ifdef __MUSL__ in portability because Rich insists 
>> his
>> libc is the only perfect piece of software ever written, but I can probe for 
>> it
>> at compile time and #define 0 to stub out. (Unlike needing a manual config
>> symbol to work around his nommu insanity where he went out of his WAY to 
>> break
>> compile time probes, or needing to wrap syscalls myself in
>> https://github.com/landley/toybox/commit/833fb23fe8b4  because he decided he
>> didn't like a set of syscalls and _removed_ support musl had working fine at 
>> one
>> point...)
>>
>> If this _does_ match NUL bytes it lets me remove regexec0() entirely from 
>> libc,
>> which would be very nice. And since it started life as a BSD extension 
>> (they're
>> the only man page _documenting_ it) I get support on FreeBSD and probably 
>> MacOS too.
>>
>> Cool, thanks,
>>
>> Rob
>>
>> On 5/6/19 10:56 AM, enh wrote:
>>> i think you might be able to use REG_STARTEND to avoid the implicit
>>> strlen: 
>>> https://www.freebsd.org/cgi/man.cgi?query=regex=3=freebsd-release-ports
>>>
>>> REG_STARTEND
>>>The string is considered to start at string +
>>>pmatch[0].rm_so and to end before the byte located at
>>>string + pmatch[0].rm_eo, regardless of the value of
>>>nmatch.  See below for the definition of pmatch and nmatch.
>>>This is an extension, compatible with but not specified by
>>>IEEE Std 1003.2 (``POSIX.2''), and should be used with cau-
>>>tion in software intended to be portable to other systems.
>>>
>>>Without REG_NOTBOL, the position rm_so is considered the
>>>beginning of a line, such that `^' matches before it, and
>>>the beginning of a word if there is a word character at
>>>this position, such that `[[:<:]]' and `\<' match before
>>>it.
>>>
>>>With REG_NOTBOL, the character at position rm_so is treated
>>>as the continuation of a line, and if rm_so is greater than
>>>0, the preceding character is taken into consideration.  If
>>>the preceding character is a newline and the regular
>>>expression was compiled with REG_NEWLINE, `^' matches
>>>before the string; if the preceding character is not a word
>>>character but the string starts with a word character,
>>>`[[:<:]]' and `\<' match before the string.
>>>
>>> From: Rob Landley 
>>> Date: Sun, May 5, 2019 at 9:41 AM
>>> To: enh
>>> Cc: toybox
>>>
 On 5/3/19 2:42 PM, enh wrote:
> On Fri, May 3, 2019 at 11:59 AM Rob Landley  wrote:
>>
>> On 5/3/19 1:56 PM, Rob Landley wrote:
>>> On 5/3/19 1:05 PM, enh wrote:
>>> But yeah, the new pessimal case after the change I'm making now would 
>>> be a
>>> megabyte of xyxyxyxy with 's/xy/x/g' _THAT_ would need the one output 
>>> buffer
>>> thing...
>>
>> And it can be avoided by an in-place copy that remembers "last gap" and 
>> doesn't
>> copy trailing data until the _next_ hit (or we confirm there's no hit) 
>> so we
>> know how much of the skipped data is "confirmed" and only copy it once 
>> we know
>> we're keeping it.
>
> yeah, that's what i meant by only ensuring that the first n characters
> are right.
>
>> So every kept byte is only copied once, and it can still be done in 
>> place for
>> the shrinking case. (You'll have to realloc when you expand, but that's 
>> probably
>> unavoidable...)

 Which I just did (not even trying in-place, everything gets copied once to 
 a new
 string), and it's still way too slow.

 Time to dig out https://jvns.ca/perf-zine.pdf and see where it's spending 
 its
 time... Sigh. I forgot quite how much I hate perf, but:

   68.60%  sed  sed[.] regexec0
   29.93%  sed  [kernel.kallsyms]  [k] nmi
1.47%  sed  libc-2.24.so   [.] strlen
0.00%  sed  [kernel.kallsyms]  [k] __acct_update_integrals

 It's regexec0(). H... ah, my NUL wrapper is essentially calling 
 strlen() on
 the string before doing the regex each time (to figure out where the null 
 to
 skip to next is), that should go _after_ it tries to match this segment...

 Ok,

Re: [Toybox] pathological case in sed s///g

2019-05-08 Thread enh via Toybox

(i checked and macOS does have both REG_STARTEND and REG_PEND.)

From: Rob Landley 
Date: Mon, May 6, 2019 at 10:42 AM
To: enh
Cc: toybox, Rich Felker

> Huh... I'll assume REG_STARTEND works in bionic since you're pointing me at 
> it.
> It's not in the regex man page but it _is_ in the glibc headers...
>
>   https://github.com/bminor/glibc/commit/6fefb4e0b16
>
> Looks like it went into glibc in 2004, which is way past 7 years. I should 
> poke
> Michael Kerrisk to update the man page.
>
> And musl explicitly refused to do it, but Rich makes bad calls all the time:
>
>   https://www.openwall.com/lists/musl/2013/01/15/26
>
> I still haven't got an #ifdef __MUSL__ in portability because Rich insists his
> libc is the only perfect piece of software ever written, but I can probe for 
> it
> at compile time and #define 0 to stub out. (Unlike needing a manual config
> symbol to work around his nommu insanity where he went out of his WAY to break
> compile time probes, or needing to wrap syscalls myself in
> https://github.com/landley/toybox/commit/833fb23fe8b4  because he decided he
> didn't like a set of syscalls and _removed_ support musl had working fine at 
> one
> point...)
>
> If this _does_ match NUL bytes it lets me remove regexec0() entirely from 
> libc,
> which would be very nice. And since it started life as a BSD extension 
> (they're
> the only man page _documenting_ it) I get support on FreeBSD and probably 
> MacOS too.
>
> Cool, thanks,
>
> Rob
>
> On 5/6/19 10:56 AM, enh wrote:
> > i think you might be able to use REG_STARTEND to avoid the implicit
> > strlen: 
> > https://www.freebsd.org/cgi/man.cgi?query=regex=3=freebsd-release-ports
> >
> > REG_STARTEND
> >The string is considered to start at string +
> >pmatch[0].rm_so and to end before the byte located at
> >string + pmatch[0].rm_eo, regardless of the value of
> >nmatch.  See below for the definition of pmatch and nmatch.
> >This is an extension, compatible with but not specified by
> >IEEE Std 1003.2 (``POSIX.2''), and should be used with cau-
> >tion in software intended to be portable to other systems.
> >
> >Without REG_NOTBOL, the position rm_so is considered the
> >beginning of a line, such that `^' matches before it, and
> >the beginning of a word if there is a word character at
> >this position, such that `[[:<:]]' and `\<' match before
> >it.
> >
> >With REG_NOTBOL, the character at position rm_so is treated
> >as the continuation of a line, and if rm_so is greater than
> >0, the preceding character is taken into consideration.  If
> >the preceding character is a newline and the regular
> >expression was compiled with REG_NEWLINE, `^' matches
> >before the string; if the preceding character is not a word
> >character but the string starts with a word character,
> >`[[:<:]]' and `\<' match before the string.
> >
> > From: Rob Landley 
> > Date: Sun, May 5, 2019 at 9:41 AM
> > To: enh
> > Cc: toybox
> >
> >> On 5/3/19 2:42 PM, enh wrote:
> >>> On Fri, May 3, 2019 at 11:59 AM Rob Landley  wrote:
> 
>  On 5/3/19 1:56 PM, Rob Landley wrote:
> > On 5/3/19 1:05 PM, enh wrote:
> > But yeah, the new pessimal case after the change I'm making now would 
> > be a
> > megabyte of xyxyxyxy with 's/xy/x/g' _THAT_ would need the one output 
> > buffer
> > thing...
> 
>  And it can be avoided by an in-place copy that remembers "last gap" and 
>  doesn't
>  copy trailing data until the _next_ hit (or we confirm there's no hit) 
>  so we
>  know how much of the skipped data is "confirmed" and only copy it once 
>  we know
>  we're keeping it.
> >>>
> >>> yeah, that's what i meant by only ensuring that the first n characters
> >>> are right.
> >>>
>  So every kept byte is only copied once, and it can still be done in 
>  place for
>  the shrinking case. (You'll have to realloc when you expand, but that's 
>  probably
>  unavoidable...)
> >>
> >> Which I just did (not even trying in-place, everything gets copied once to 
> >> a new
> >> string), and it's still way too slow.
> >>
> >> Time to dig out https://jvns.ca/perf-zine.pdf and see where it's spending 
> >> its
> >> time... Sigh. I forgot quite how much I hate perf, but:
> >>
> >>   68.60%  sed  sed[.] regexec0
> >>   29.93%  sed  [kernel.kallsyms]  [k] nmi
> >>1.47%  sed  libc-2.24.so   [.] strlen
> >>0.00%  sed  [kernel.kallsyms]  [k] __acct_update_integrals
> >>
> >> It's regexec0(). H... ah, my NUL wrapper is essentially calling 
> >> strlen() on
> >> the string before doing the regex each time (to figure out where the null 
> >> to
> >> skip to next is), that should go _after_ it tries to match this segment...
> >>
> >> Ok, and now that I've fixed that it's _still_ taking forever and strlen() 
> >> is
> >> dominating, but regexec0() is basically calling regexec() as the first 
> >>

Re: [Toybox] pathological case in sed s///g

2019-05-07 Thread Rich Felker

On Mon, May 06, 2019 at 10:57:23PM -0700, enh wrote:
> > I know not everything is a macro that can be tested; both myself and
> > folks from other implementations are interested in developing an
> > agreed-upon way to report availability of other extensions via macros,
> > so that configure-style compile/link tests are not needed. This would
> > also facilitate reporting of some things that can't even properly be
> > tested because they would need runtime tests. Unfortunately efforts in
> > this area are stalled from lack of interest. If you or others would
> > like to raise a fuss about making it happen, I'd welcome it.
> 
> wouldn't it be better to have more stuff along the lines of clang's
> __has_builtin/__has_feature/__has_include? they're already very useful
> for getting rid of this kind of stuff, and a __has_function would
> cover most of what's missing.

It probably doesn't, and it's hard to figure out who would be
responsible for providing them. Are you suggesting the compiler would
do it automatically based on presence of a declaration? There's
already an evil hack to check for presence of a declaration and branch
on the result using _Generic, but of course that's after the
preprocessor level. OTOH the preprocessor doesn't and can't have
access to the presence or absence of declarations since everything is
just tokens at that stage.

I think it would be a lot simpler, more flexible, and compatible with
existing compilers (probably a requirement on our side) to simply have
the library define macros for the individual extensions or extension
groups it provides -- basically the same thing as the existing POSIX
macros in unistd.h, but for extensions outside of POSIX. One major
advantage here is that we can also report extensions that are not in
the form of symbols or flags, but some other type of interface -- for
example, support for %m in the printf family.

As a side benefit, this would probably require us to look at,
enumerate, and better document the extensions we support. :-)

Rich
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net

Re: [Toybox] pathological case in sed s///g

2019-05-07 Thread Rich Felker

On Tue, May 07, 2019 at 05:27:24PM +0300, makep...@firemail.cc wrote:
> 
> > wouldn't it be better to have more stuff along the lines of clang's
> > __has_builtin/__has_feature/__has_include? they're already very useful
> > for getting rid of this kind of stuff, and a __has_function would
> > cover most of what's missing.
> 
> Feature claims of clang and musl are giving us a conflict when
> building cmake btw, could anyone look at
> https://github.com/gentoo/musl/issues/141 please? I don't know
> whether to forward this to clang or musl or cmake or some other
> list, thus glad you brought up the topic.

Just replied on the tracker:

  "musl uses a BSD style header layout, where the system (libc) header
  location must come before the compiler's header location in the
  search path; it does not use and is not compatible with the
  compiler-provided versions of the standard headers, which should
  never get included."

Thanks for the heads-up.

Rich
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net

Re: [Toybox] pathological case in sed s///g

2019-05-07 Thread makepost



> wouldn't it be better to have more stuff along the lines of clang's
> __has_builtin/__has_feature/__has_include? they're already very useful
> for getting rid of this kind of stuff, and a __has_function would
> cover most of what's missing.

Feature claims of clang and musl are giving us a conflict when building cmake 
btw, could anyone look at https://github.com/gentoo/musl/issues/141 please? I 
don't know whether to forward this to clang or musl or cmake or some other 
list, thus glad you brought up the topic.
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net

Re: [Toybox] pathological case in sed s///g

2019-05-06 Thread enh via Toybox

From: Rich Felker 
Date: Mon, May 6, 2019 at 7:40 PM
To: Rob Landley
Cc: toybox

> On Mon, May 06, 2019 at 07:05:46PM -0500, Rob Landley wrote:
> > On 5/6/19 12:48 PM, Rich Felker wrote:
> > > On Mon, May 06, 2019 at 12:42:44PM -0500, Rob Landley wrote:
> > >> Huh... I'll assume REG_STARTEND works in bionic since you're pointing me 
> > >> at it.
> > >> It's not in the regex man page but it _is_ in the glibc headers...
> > >>
> > >>   https://github.com/bminor/glibc/commit/6fefb4e0b16
> > >>
> > >> Looks like it went into glibc in 2004, which is way past 7 years. I 
> > >> should poke
> > >> Michael Kerrisk to update the man page.
> > >>
> > >> And musl explicitly refused to do it, but Rich makes bad calls all the 
> > >> time:
> > >>
> > >>   https://www.openwall.com/lists/musl/2013/01/15/26
> > >>
> > >> I still haven't got an #ifdef __MUSL__ in portability because Rich 
> > >> insists his
> > >> libc is the only perfect piece of software ever written, but I can probe 
> > >> for it
> > >
> > > You can #ifdef REG_STARTEND and use it conditionally depending on
> > > whether the functionality is offered. There is no reason to hard-code
> > > an assumption that musl doesn't or does have this functionality;
> >
> > I did (https://github.com/landley/toybox/commit/48162c4ee3fb) but it's 
> > still a
> > musl-specific workaround. Glibc, uClibc, and bionic all adopted this about 
> > 15
> > years ago, and it came from freebsd in the first place which presumably 
> > covers
> > macosx (I can't currently check, but somebody would poke me before too 
> > long).
>
> Regardless of your opinion on whether or why musl does or doesn't have
> this extension, testing for it with #ifdef REG_STARTEND rather than
> "testing for musl" is the right approach, because if/when it's added
> to musl, you automatically get support for it. If you hard-coded an
> assumption that musl doesn't have it, then users are stuck with that
> even if it changes.
>
> > Musl is the only thing left that _doesn't_ have this functonality, and 
> > that's
> > because
>   ~~~
>
> I don't think it's particularly "because" of this. Rather, the pros
> and cons of adding this extension have been under discussion for a
> while, with nobody pushing for further action on the topic.
>
> > you dismissed NUL matching as not a "meaningful use of sed", and called
> > it "hideous hacks" and "nobody will notice the difference with it missing". 
> > (I
> > implemented a regexec0() wrapper to provide NUL matching 3 years before this
> > performance issue came up.)
>
> Use on non-text data is outside the scope of the standard sed utility,
> and I'm a bit hostile to insistence that the standard utilities be
> useful with non-text because of what a disaster Austin Group issue 663
> turned into. However this is really not a consideration in whether
> REG_STARTEND is included going forward.
>
> > > it's
> > > been proposed and there's even a patch somewhere. (It's not costly to
> > > support relative to the current bad regex implementation, but the
> > > concern is that it might impose a nontrivial cost on a future good
> > > implementation once we're locked into having it.)
> >
> > We have different project philosophies. I wait for actual users to show up
> > before implementing a lot of things, but when they do I don't tend to think 
> > I
> > know better than they do about what their use cases should be.
>
> If I accepted arbitrary requests for stuff in musl, it would just be a
> clone of glibc. As a community we have developed pretty clear and
> consistent criteria for what does or doesn't get included. I'm not
> aware of any indication that REG_STARTEND shouldn't be included except
> possible future runtime cost; it doesn't have any major complexity
> cost or incompatibility between different historical implementations
> AFAIK.
>
> > $ echo -e 'hello\0blah' | ./sed 's/blah/boing/' | hd
> >   68 65 6c 6c 6f 00 62 6f  69 6e 67 0a  |hello.boing.|
> > 000c
> >
> > The freebsd man page from last time
> > (https://www.freebsd.org/cgi/man.cgi?query=regex=3=freebsd-release-ports)
> > says that REG_PEND is explicitly for specifying the length of a string
> > containing NULs (without moving the starting point like REG_STARTEND can), 
> > so
> > the plumbing in between already has to support embedded NULs. The man page 
> > could
> > be clearer, hopefully Michael Kerrisk's will be.
> >
> > > whether it treats the start/end as *additional* constraints,
> >
> > Not in the systems I've been able to test. (I need to get my freebsd test 
> > image
> > set back up on the new machine...)
> >
> > (And hey, congrats on musl not doing the strlen() before matching without 
> > this.
> > You don't have the N^2 performance issue glibc and bionic did, and thus 
> > don't
> > need that reason this patch went in. But I didn't keep the previous 
> > "implement
> > match after NULL" code that worked without it just for musl because I don't 
> > care
> > enough about

Re: [Toybox] pathological case in sed s///g

2019-05-06 Thread Rich Felker

On Mon, May 06, 2019 at 07:05:46PM -0500, Rob Landley wrote:
> On 5/6/19 12:48 PM, Rich Felker wrote:
> > On Mon, May 06, 2019 at 12:42:44PM -0500, Rob Landley wrote:
> >> Huh... I'll assume REG_STARTEND works in bionic since you're pointing me 
> >> at it.
> >> It's not in the regex man page but it _is_ in the glibc headers...
> >>
> >>   https://github.com/bminor/glibc/commit/6fefb4e0b16
> >>
> >> Looks like it went into glibc in 2004, which is way past 7 years. I should 
> >> poke
> >> Michael Kerrisk to update the man page.
> >>
> >> And musl explicitly refused to do it, but Rich makes bad calls all the 
> >> time:
> >>
> >>   https://www.openwall.com/lists/musl/2013/01/15/26
> >>
> >> I still haven't got an #ifdef __MUSL__ in portability because Rich insists 
> >> his
> >> libc is the only perfect piece of software ever written, but I can probe 
> >> for it
> > 
> > You can #ifdef REG_STARTEND and use it conditionally depending on
> > whether the functionality is offered. There is no reason to hard-code
> > an assumption that musl doesn't or does have this functionality;
> 
> I did (https://github.com/landley/toybox/commit/48162c4ee3fb) but it's still a
> musl-specific workaround. Glibc, uClibc, and bionic all adopted this about 15
> years ago, and it came from freebsd in the first place which presumably covers
> macosx (I can't currently check, but somebody would poke me before too long).

Regardless of your opinion on whether or why musl does or doesn't have
this extension, testing for it with #ifdef REG_STARTEND rather than
"testing for musl" is the right approach, because if/when it's added
to musl, you automatically get support for it. If you hard-coded an
assumption that musl doesn't have it, then users are stuck with that
even if it changes.

> Musl is the only thing left that _doesn't_ have this functonality, and that's
> because
  ~~~

I don't think it's particularly "because" of this. Rather, the pros
and cons of adding this extension have been under discussion for a
while, with nobody pushing for further action on the topic.

> you dismissed NUL matching as not a "meaningful use of sed", and called
> it "hideous hacks" and "nobody will notice the difference with it missing". (I
> implemented a regexec0() wrapper to provide NUL matching 3 years before this
> performance issue came up.)

Use on non-text data is outside the scope of the standard sed utility,
and I'm a bit hostile to insistence that the standard utilities be
useful with non-text because of what a disaster Austin Group issue 663
turned into. However this is really not a consideration in whether
REG_STARTEND is included going forward.

> > it's
> > been proposed and there's even a patch somewhere. (It's not costly to
> > support relative to the current bad regex implementation, but the
> > concern is that it might impose a nontrivial cost on a future good
> > implementation once we're locked into having it.)
> 
> We have different project philosophies. I wait for actual users to show up
> before implementing a lot of things, but when they do I don't tend to think I
> know better than they do about what their use cases should be.

If I accepted arbitrary requests for stuff in musl, it would just be a
clone of glibc. As a community we have developed pretty clear and
consistent criteria for what does or doesn't get included. I'm not
aware of any indication that REG_STARTEND shouldn't be included except
possible future runtime cost; it doesn't have any major complexity
cost or incompatibility between different historical implementations
AFAIK.

> $ echo -e 'hello\0blah' | ./sed 's/blah/boing/' | hd
>   68 65 6c 6c 6f 00 62 6f  69 6e 67 0a  |hello.boing.|
> 000c
> 
> The freebsd man page from last time
> (https://www.freebsd.org/cgi/man.cgi?query=regex=3=freebsd-release-ports)
> says that REG_PEND is explicitly for specifying the length of a string
> containing NULs (without moving the starting point like REG_STARTEND can), so
> the plumbing in between already has to support embedded NULs. The man page 
> could
> be clearer, hopefully Michael Kerrisk's will be.
> 
> > whether it treats the start/end as *additional* constraints,
> 
> Not in the systems I've been able to test. (I need to get my freebsd test 
> image
> set back up on the new machine...)
> 
> (And hey, congrats on musl not doing the strlen() before matching without 
> this.
> You don't have the N^2 performance issue glibc and bionic did, and thus don't
> need that reason this patch went in. But I didn't keep the previous "implement
> match after NULL" code that worked without it just for musl because I don't 
> care
> enough about working around musl's unique and intentional limitations anymore.
> It can stay broken for all I care. I'm not quite to the point of removing
> existing fixes like the sched_get_priority() syscall wrapping and the nommu
> config symbol, but only because bionic doesn't support nommu yet, and I'm not
> _quite_

Re: [Toybox] pathological case in sed s///g

2019-05-06 Thread Rob Landley

On 5/6/19 12:48 PM, Rich Felker wrote:
> On Mon, May 06, 2019 at 12:42:44PM -0500, Rob Landley wrote:
>> Huh... I'll assume REG_STARTEND works in bionic since you're pointing me at 
>> it.
>> It's not in the regex man page but it _is_ in the glibc headers...
>>
>>   https://github.com/bminor/glibc/commit/6fefb4e0b16
>>
>> Looks like it went into glibc in 2004, which is way past 7 years. I should 
>> poke
>> Michael Kerrisk to update the man page.
>>
>> And musl explicitly refused to do it, but Rich makes bad calls all the time:
>>
>>   https://www.openwall.com/lists/musl/2013/01/15/26
>>
>> I still haven't got an #ifdef __MUSL__ in portability because Rich insists 
>> his
>> libc is the only perfect piece of software ever written, but I can probe for 
>> it
> 
> You can #ifdef REG_STARTEND and use it conditionally depending on
> whether the functionality is offered. There is no reason to hard-code
> an assumption that musl doesn't or does have this functionality;

I did (https://github.com/landley/toybox/commit/48162c4ee3fb) but it's still a
musl-specific workaround. Glibc, uClibc, and bionic all adopted this about 15
years ago, and it came from freebsd in the first place which presumably covers
macosx (I can't currently check, but somebody would poke me before too long).

Musl is the only thing left that _doesn't_ have this functonality, and that's
because you dismissed NUL matching as not a "meaningful use of sed", and called
it "hideous hacks" and "nobody will notice the difference with it missing". (I
implemented a regexec0() wrapper to provide NUL matching 3 years before this
performance issue came up.)

> it's
> been proposed and there's even a patch somewhere. (It's not costly to
> support relative to the current bad regex implementation, but the
> concern is that it might impose a nontrivial cost on a future good
> implementation once we're locked into having it.)

We have different project philosophies. I wait for actual users to show up
before implementing a lot of things, but when they do I don't tend to think I
know better than they do about what their use cases should be.

I'll push back a bit or negotiate to make sure they're serious and this is
important to them and that I understand what they actually need. Sometimes I ask
a second user to speak up to make sure the first isn't an outlier. and I'll
often offer alternatives before doing what they asked to see if an easier thing
for me to do will meet their needs (and then wait for somebody else to show up
needing the full thing)...

But I implement what the users need my software to _do_ because that's what this
software is _for_. I do NOT tell my users that they're wrong because I disagree
with what they're trying to run. I grumble about it a lot while doing it, but I
don't pretend I know what my users should be doing better than they do. My
bright line boundary is "this command is out of scope, it should not be in
toybox at all, use a different implementation". Because if they say "this
command you have needs to do this" and mine can't do it, using a different
implementation is what they'll do anyway. When it comes to the privilege of
being the first command in the $PATH with that name, there can be only one.

(There is no _way_ I wouldn't have ripped out tar --to-command during the
cleanup otherwise. But it's part of somebody's use case, existing scripts, oh
well. Lots of extra work to get it to work with nommu, but I got there
eventually...)

I've told you a bunch of times that you're wrong about nommu support. The
founder of uclinux has told you a bunch of times you're wrong about nommu
support. But you insist that you know better than your users, don't even offer a
--nommu-probeable build time option, and that everyone else must rewrite their
software to be worthy to use your library.

I cc'd you on this issue because it seemed impolite to talk about it behind your
back, but I stopped trying to argue with you about this sort of thing after
https://github.com/landley/toybox/commit/833fb23fe8b4 because there's no point.

>> If this _does_ match NUL bytes it lets me remove regexec0() entirely from 
>> libc,
>> which would be very nice. And since it started life as a BSD extension 
>> (they're
>> the only man page _documenting_ it) I get support on FreeBSD and probably 
>> MacOS too.
> 
> I'm not sure if the proposed patch supports matching NUL or not (i.e.

It's not proposed, it's committed, and it worked when I tested it with glibc in
devuan's toolchain:

$ echo -e "hello\0there" | toybox sed 's/there/blah/'
helloblah

And with bionic in the Android NDK:

$ echo -e 'hello\0blah' | ./sed 's/blah/boing/' | hd
  68 65 6c 6c 6f 00 62 6f  69 6e 67 0a  |hello.boing.|
000c

The freebsd man page from last time
(https://www.freebsd.org/cgi/man.cgi?query=regex=3=freebsd-release-ports)
says that REG_PEND is explicitly for specifying the length of a string
containing NULs (without moving the starting point like REG_STARTEND

Re: [Toybox] pathological case in sed s///g

2019-05-06 Thread Rich Felker

On Mon, May 06, 2019 at 12:42:44PM -0500, Rob Landley wrote:
> Huh... I'll assume REG_STARTEND works in bionic since you're pointing me at 
> it.
> It's not in the regex man page but it _is_ in the glibc headers...
> 
>   https://github.com/bminor/glibc/commit/6fefb4e0b16
> 
> Looks like it went into glibc in 2004, which is way past 7 years. I should 
> poke
> Michael Kerrisk to update the man page.
> 
> And musl explicitly refused to do it, but Rich makes bad calls all the time:
> 
>   https://www.openwall.com/lists/musl/2013/01/15/26
> 
> I still haven't got an #ifdef __MUSL__ in portability because Rich insists his
> libc is the only perfect piece of software ever written, but I can probe for 
> it

You can #ifdef REG_STARTEND and use it conditionally depending on
whether the functionality is offered. There is no reason to hard-code
an assumption that musl doesn't or does have this functionality; it's
been proposed and there's even a patch somewhere. (It's not costly to
support relative to the current bad regex implementation, but the
concern is that it might impose a nontrivial cost on a future good
implementation once we're locked into having it.)

> If this _does_ match NUL bytes it lets me remove regexec0() entirely from 
> libc,
> which would be very nice. And since it started life as a BSD extension 
> (they're
> the only man page _documenting_ it) I get support on FreeBSD and probably 
> MacOS too.

I'm not sure if the proposed patch supports matching NUL or not (i.e.
whether it treats the start/end as *additional* constraints, to work
with a substring of a necessarily null-terminated string, or whether
they replace the null-termination criterion. If we do adopt it we
should ensure we do whatever other implementations do here, especially
if that's important to use cases you or others want.

Rich
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net

Re: [Toybox] pathological case in sed s///g

2019-05-06 Thread Rob Landley

Huh... I'll assume REG_STARTEND works in bionic since you're pointing me at it.
It's not in the regex man page but it _is_ in the glibc headers...

  https://github.com/bminor/glibc/commit/6fefb4e0b16

Looks like it went into glibc in 2004, which is way past 7 years. I should poke
Michael Kerrisk to update the man page.

And musl explicitly refused to do it, but Rich makes bad calls all the time:

  https://www.openwall.com/lists/musl/2013/01/15/26

I still haven't got an #ifdef __MUSL__ in portability because Rich insists his
libc is the only perfect piece of software ever written, but I can probe for it
at compile time and #define 0 to stub out. (Unlike needing a manual config
symbol to work around his nommu insanity where he went out of his WAY to break
compile time probes, or needing to wrap syscalls myself in
https://github.com/landley/toybox/commit/833fb23fe8b4  because he decided he
didn't like a set of syscalls and _removed_ support musl had working fine at one
point...)

If this _does_ match NUL bytes it lets me remove regexec0() entirely from libc,
which would be very nice. And since it started life as a BSD extension (they're
the only man page _documenting_ it) I get support on FreeBSD and probably MacOS 
too.

Cool, thanks,

Rob

On 5/6/19 10:56 AM, enh wrote:
> i think you might be able to use REG_STARTEND to avoid the implicit
> strlen: 
> https://www.freebsd.org/cgi/man.cgi?query=regex=3=freebsd-release-ports
> 
> REG_STARTEND
>The string is considered to start at string +
>pmatch[0].rm_so and to end before the byte located at
>string + pmatch[0].rm_eo, regardless of the value of
>nmatch.  See below for the definition of pmatch and nmatch.
>This is an extension, compatible with but not specified by
>IEEE Std 1003.2 (``POSIX.2''), and should be used with cau-
>tion in software intended to be portable to other systems.
>
>Without REG_NOTBOL, the position rm_so is considered the
>beginning of a line, such that `^' matches before it, and
>the beginning of a word if there is a word character at
>this position, such that `[[:<:]]' and `\<' match before
>it.
> 
>With REG_NOTBOL, the character at position rm_so is treated
>as the continuation of a line, and if rm_so is greater than
>0, the preceding character is taken into consideration.  If
>the preceding character is a newline and the regular
>expression was compiled with REG_NEWLINE, `^' matches
>before the string; if the preceding character is not a word
>character but the string starts with a word character,
>`[[:<:]]' and `\<' match before the string.
> 
> From: Rob Landley 
> Date: Sun, May 5, 2019 at 9:41 AM
> To: enh
> Cc: toybox
> 
>> On 5/3/19 2:42 PM, enh wrote:
>>> On Fri, May 3, 2019 at 11:59 AM Rob Landley  wrote:

 On 5/3/19 1:56 PM, Rob Landley wrote:
> On 5/3/19 1:05 PM, enh wrote:
> But yeah, the new pessimal case after the change I'm making now would be a
> megabyte of xyxyxyxy with 's/xy/x/g' _THAT_ would need the one output 
> buffer
> thing...

 And it can be avoided by an in-place copy that remembers "last gap" and 
 doesn't
 copy trailing data until the _next_ hit (or we confirm there's no hit) so 
 we
 know how much of the skipped data is "confirmed" and only copy it once we 
 know
 we're keeping it.
>>>
>>> yeah, that's what i meant by only ensuring that the first n characters
>>> are right.
>>>
 So every kept byte is only copied once, and it can still be done in place 
 for
 the shrinking case. (You'll have to realloc when you expand, but that's 
 probably
 unavoidable...)
>>
>> Which I just did (not even trying in-place, everything gets copied once to a 
>> new
>> string), and it's still way too slow.
>>
>> Time to dig out https://jvns.ca/perf-zine.pdf and see where it's spending its
>> time... Sigh. I forgot quite how much I hate perf, but:
>>
>>   68.60%  sed  sed[.] regexec0
>>   29.93%  sed  [kernel.kallsyms]  [k] nmi
>>1.47%  sed  libc-2.24.so   [.] strlen
>>0.00%  sed  [kernel.kallsyms]  [k] __acct_update_integrals
>>
>> It's regexec0(). H... ah, my NUL wrapper is essentially calling strlen() 
>> on
>> the string before doing the regex each time (to figure out where the null to
>> skip to next is), that should go _after_ it tries to match this segment...
>>
>> Ok, and now that I've fixed that it's _still_ taking forever and strlen() is
>> dominating, but regexec0() is basically calling regexec() as the first thing 
>> and
>> returning immediately when it hits? The only thing that could be calling
>> strlen() is libc's regexec()? Hmmm...
>>
>> @@ -473,11 +473,15 @@ static void sed_line(char **pline, long plen)
>>  mlen, off, newlen;
>>
>>// Loop finding match in remaining line (up to remaining len)
>> -  while (!regexec0(reg, rline, len-(rline-line), 10, match, mflags)) {
>> +//  while

Re: [Toybox] pathological case in sed s///g

2019-05-06 Thread enh via Toybox

i think you might be able to use REG_STARTEND to avoid the implicit
strlen: 
https://www.freebsd.org/cgi/man.cgi?query=regex=3=freebsd-release-ports

REG_STARTEND
   The string is considered to start at string +
   pmatch[0].rm_so and to end before the byte located at
   string + pmatch[0].rm_eo, regardless of the value of
   nmatch.  See below for the definition of pmatch and nmatch.
   This is an extension, compatible with but not specified by
   IEEE Std 1003.2 (``POSIX.2''), and should be used with cau-
   tion in software intended to be portable to other systems.

   Without REG_NOTBOL, the position rm_so is considered the
   beginning of a line, such that `^' matches before it, and
   the beginning of a word if there is a word character at
   this position, such that `[[:<:]]' and `\<' match before
   it.

   With REG_NOTBOL, the character at position rm_so is treated
   as the continuation of a line, and if rm_so is greater than
   0, the preceding character is taken into consideration.  If
   the preceding character is a newline and the regular
   expression was compiled with REG_NEWLINE, `^' matches
   before the string; if the preceding character is not a word
   character but the string starts with a word character,
   `[[:<:]]' and `\<' match before the string.

From: Rob Landley 
Date: Sun, May 5, 2019 at 9:41 AM
To: enh
Cc: toybox

> On 5/3/19 2:42 PM, enh wrote:
> > On Fri, May 3, 2019 at 11:59 AM Rob Landley  wrote:
> >>
> >> On 5/3/19 1:56 PM, Rob Landley wrote:
> >>> On 5/3/19 1:05 PM, enh wrote:
> >>> But yeah, the new pessimal case after the change I'm making now would be a
> >>> megabyte of xyxyxyxy with 's/xy/x/g' _THAT_ would need the one output 
> >>> buffer
> >>> thing...
> >>
> >> And it can be avoided by an in-place copy that remembers "last gap" and 
> >> doesn't
> >> copy trailing data until the _next_ hit (or we confirm there's no hit) so 
> >> we
> >> know how much of the skipped data is "confirmed" and only copy it once we 
> >> know
> >> we're keeping it.
> >
> > yeah, that's what i meant by only ensuring that the first n characters
> > are right.
> >
> >> So every kept byte is only copied once, and it can still be done in place 
> >> for
> >> the shrinking case. (You'll have to realloc when you expand, but that's 
> >> probably
> >> unavoidable...)
>
> Which I just did (not even trying in-place, everything gets copied once to a 
> new
> string), and it's still way too slow.
>
> Time to dig out https://jvns.ca/perf-zine.pdf and see where it's spending its
> time... Sigh. I forgot quite how much I hate perf, but:
>
>   68.60%  sed  sed[.] regexec0
>   29.93%  sed  [kernel.kallsyms]  [k] nmi
>1.47%  sed  libc-2.24.so   [.] strlen
>0.00%  sed  [kernel.kallsyms]  [k] __acct_update_integrals
>
> It's regexec0(). H... ah, my NUL wrapper is essentially calling strlen() 
> on
> the string before doing the regex each time (to figure out where the null to
> skip to next is), that should go _after_ it tries to match this segment...
>
> Ok, and now that I've fixed that it's _still_ taking forever and strlen() is
> dominating, but regexec0() is basically calling regexec() as the first thing 
> and
> returning immediately when it hits? The only thing that could be calling
> strlen() is libc's regexec()? Hmmm...
>
> @@ -473,11 +473,15 @@ static void sed_line(char **pline, long plen)
>  mlen, off, newlen;
>
>// Loop finding match in remaining line (up to remaining len)
> -  while (!regexec0(reg, rline, len-(rline-line), 10, match, mflags)) {
> +//  while (!regexec0(reg, rline, len-(rline-line), 10, match, mflags)) {
> +while (!strncmp(rline, "x", 1)) {
> +  match[0].rm_eo = 1;
> +  match[1].rm_so = 0;
> +
>
> And _that_ completes a megabyte input line basically instantly. It's regexec()
> calling strlen() each time pulling (n^2)/2 bytes (half a terabyte?) through 
> the
> l1 cache that's taking up all the time now.
>
> Grrr. So I either have to write my own regex plumbing (which I can do, I've 
> done
> it before years ago, but I really didn't _want_ to here because LIBC HAS ONE),
> or I need to make my wrapper special case literal matches, which means they
> don't have... let's see... (For the honor of greyskull "man 7
> regex"...)
>
>   ^|.[*+?(\{$
>
> Sigh. If I _was_ going to write my own regex plumbing I could make it take a
> length and properly match NUL bytes. On the one hand that's really a post-1.0
> todo item. On the other hand we're talking maybe 200 lines of code here, _and_
> it could probably speed up grep (I could go back to searching for multiple
> regexes in parallel... What _is_ involved in implementing lex, anyway...)
>
> Rob
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net

Re: [Toybox] pathological case in sed s///g

2019-05-05 Thread Rob Landley

On 5/3/19 2:42 PM, enh wrote:
> On Fri, May 3, 2019 at 11:59 AM Rob Landley  wrote:
>>
>> On 5/3/19 1:56 PM, Rob Landley wrote:
>>> On 5/3/19 1:05 PM, enh wrote:
>>> But yeah, the new pessimal case after the change I'm making now would be a
>>> megabyte of xyxyxyxy with 's/xy/x/g' _THAT_ would need the one output buffer
>>> thing...
>>
>> And it can be avoided by an in-place copy that remembers "last gap" and 
>> doesn't
>> copy trailing data until the _next_ hit (or we confirm there's no hit) so we
>> know how much of the skipped data is "confirmed" and only copy it once we 
>> know
>> we're keeping it.
> 
> yeah, that's what i meant by only ensuring that the first n characters
> are right.
> 
>> So every kept byte is only copied once, and it can still be done in place for
>> the shrinking case. (You'll have to realloc when you expand, but that's 
>> probably
>> unavoidable...)

Which I just did (not even trying in-place, everything gets copied once to a new
string), and it's still way too slow.

Time to dig out https://jvns.ca/perf-zine.pdf and see where it's spending its
time... Sigh. I forgot quite how much I hate perf, but:

  68.60%  sed  sed[.] regexec0
  29.93%  sed  [kernel.kallsyms]  [k] nmi
   1.47%  sed  libc-2.24.so   [.] strlen
   0.00%  sed  [kernel.kallsyms]  [k] __acct_update_integrals

It's regexec0(). H... ah, my NUL wrapper is essentially calling strlen() on
the string before doing the regex each time (to figure out where the null to
skip to next is), that should go _after_ it tries to match this segment...

Ok, and now that I've fixed that it's _still_ taking forever and strlen() is
dominating, but regexec0() is basically calling regexec() as the first thing and
returning immediately when it hits? The only thing that could be calling
strlen() is libc's regexec()? Hmmm...

@@ -473,11 +473,15 @@ static void sed_line(char **pline, long plen)
 mlen, off, newlen;

   // Loop finding match in remaining line (up to remaining len)
-  while (!regexec0(reg, rline, len-(rline-line), 10, match, mflags)) {
+//  while (!regexec0(reg, rline, len-(rline-line), 10, match, mflags)) {
+while (!strncmp(rline, "x", 1)) {
+  match[0].rm_eo = 1;
+  match[1].rm_so = 0;
+

And _that_ completes a megabyte input line basically instantly. It's regexec()
calling strlen() each time pulling (n^2)/2 bytes (half a terabyte?) through the
l1 cache that's taking up all the time now.

Grrr. So I either have to write my own regex plumbing (which I can do, I've done
it before years ago, but I really didn't _want_ to here because LIBC HAS ONE),
or I need to make my wrapper special case literal matches, which means they
don't have... let's see... (For the honor of greyskull "man 7
regex"...)

  ^|.[*+?(\{$

Sigh. If I _was_ going to write my own regex plumbing I could make it take a
length and properly match NUL bytes. On the one hand that's really a post-1.0
todo item. On the other hand we're talking maybe 200 lines of code here, _and_
it could probably speed up grep (I could go back to searching for multiple
regexes in parallel... What _is_ involved in implementing lex, anyway...)

Rob
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net

Re: [Toybox] pathological case in sed s///g

2019-05-03 Thread enh via Toybox

On Fri, May 3, 2019 at 11:59 AM Rob Landley  wrote:
>
> On 5/3/19 1:56 PM, Rob Landley wrote:
> > On 5/3/19 1:05 PM, enh wrote:
> > But yeah, the new pessimal case after the change I'm making now would be a
> > megabyte of xyxyxyxy with 's/xy/x/g' _THAT_ would need the one output buffer
> > thing...
>
> And it can be avoided by an in-place copy that remembers "last gap" and 
> doesn't
> copy trailing data until the _next_ hit (or we confirm there's no hit) so we
> know how much of the skipped data is "confirmed" and only copy it once we know
> we're keeping it.

yeah, that's what i meant by only ensuring that the first n characters
are right.

> So every kept byte is only copied once, and it can still be done in place for
> the shrinking case. (You'll have to realloc when you expand, but that's 
> probably
> unavoidable...)
>
> Rob
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net

Re: [Toybox] pathological case in sed s///g

2019-05-03 Thread Rob Landley

On 5/3/19 1:05 PM, enh wrote:
> BSD seems to avoid all the copying? i think they have a "source" and
> "destination" and only write to the latter (solving your issue), but
> only move forwards (so i don't think they try to maintain it as "what
> the whole result would look like if we only did this many
> replacements", rather "here's what the first n characters of the
> result will look like").

Oh I know it's solvable. Working on it.

The s/x/y/g case A) isn't changing the length of the string, B) has no backrefs.
So it doesn't need the copying, yes.

Mine only moves forwards too: line and len are the start of the line and the
whole line's length, but rline and rlen are the remaining line and remaining
length. I just responded to the "don't overwrite backref data while still using
it" problem by copying the whole line, when I didn't need to and it opens this
pathological case. Hence last email on what I _should_ do instead.

(Hmmm... the "one input buffer one output buffer" thing optimizes for the
"replace with shorter string" case where you avoid repeated memmove() on the
tail of the string. But it slightly pessimizes the cases where you _don't_ and
realloc() a lot. It's also a bigger change to the code, lemme see how much this
does first...)

But yeah, the new pessimal case after the change I'm making now would be a
megabyte of xyxyxyxy with 's/xy/x/g' _THAT_ would need the one output buffer
thing...

Rob
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net

Re: [Toybox] pathological case in sed s///g

2019-05-03 Thread Rob Landley

On 5/3/19 1:56 PM, Rob Landley wrote:
> On 5/3/19 1:05 PM, enh wrote:
> But yeah, the new pessimal case after the change I'm making now would be a
> megabyte of xyxyxyxy with 's/xy/x/g' _THAT_ would need the one output buffer
> thing...

And it can be avoided by an in-place copy that remembers "last gap" and doesn't
copy trailing data until the _next_ hit (or we confirm there's no hit) so we
know how much of the skipped data is "confirmed" and only copy it once we know
we're keeping it.

So every kept byte is only copied once, and it can still be done in place for
the shrinking case. (You'll have to realloc when you expand, but that's probably
unavoidable...)

Rob
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net

Re: [Toybox] pathological case in sed s///g

2019-05-03 Thread enh via Toybox

BSD seems to avoid all the copying? i think they have a "source" and
"destination" and only write to the latter (solving your issue), but
only move forwards (so i don't think they try to maintain it as "what
the whole result would look like if we only did this many
replacements", rather "here's what the first n characters of the
result will look like").

On Fri, May 3, 2019 at 11:02 AM Rob Landley  wrote:
>
> On 5/3/19 12:40 PM, Rob Landley wrote:
> > On 5/2/19 9:46 PM, enh via Toybox wrote:
> >> i've known about this for a couple of days and haven't had time to
> >> look at it properly yet, so i should mention it here...
> >>
> >> if you have a file with a 1MiB line of 'x'es and you sed 's/x/y/g',
> >> BSD or GNU sed finishes immediately, but toybox takes forever.
> >
> > Because it allocates a replacement string and copies the 
> > before/replace/after
> > parts into it for each replacement. I should switch the xmalloc() on line 
> > 515 to
> > xrealloc() and fiddle with the surrounding code not to do unnecessary 
> > work...
>
> And luckily I left myself a comment about why I _didn't_ do that:
>
>   // Allocate new size, copy start/end around match. (Can't extend in
>   // place because backrefs may refer to text after it's overwritten.)
>
> I can realloc() if there were no backrefs, or I can make a temp copy of _just_
> the removed text (since all the backrefs have to be within match[0])...
>
> Hmmm... it's easy to record whether there were backrefs since the code right
> above there measures the new length including backrefs, so I can have it set a
> flag. The embedded newline support does NOT include matching a NUL (because 
> that
> would involve rewriting the regex library) so there can't be a NUL _within_ 
> the
> match so I can just conditionally xstrndup()...
>
> It's a bit fiddly, but not too bad so far...
>
> Rob
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net

Re: [Toybox] pathological case in sed s///g

2019-05-03 Thread Rob Landley

On 5/3/19 12:40 PM, Rob Landley wrote:
> On 5/2/19 9:46 PM, enh via Toybox wrote:
>> i've known about this for a couple of days and haven't had time to
>> look at it properly yet, so i should mention it here...
>>
>> if you have a file with a 1MiB line of 'x'es and you sed 's/x/y/g',
>> BSD or GNU sed finishes immediately, but toybox takes forever.
> 
> Because it allocates a replacement string and copies the before/replace/after
> parts into it for each replacement. I should switch the xmalloc() on line 515 
> to
> xrealloc() and fiddle with the surrounding code not to do unnecessary work...

And luckily I left myself a comment about why I _didn't_ do that:

  // Allocate new size, copy start/end around match. (Can't extend in
  // place because backrefs may refer to text after it's overwritten.)

I can realloc() if there were no backrefs, or I can make a temp copy of _just_
the removed text (since all the backrefs have to be within match[0])...

Hmmm... it's easy to record whether there were backrefs since the code right
above there measures the new length including backrefs, so I can have it set a
flag. The embedded newline support does NOT include matching a NUL (because that
would involve rewriting the regex library) so there can't be a NUL _within_ the
match so I can just conditionally xstrndup()...

It's a bit fiddly, but not too bad so far...

Rob
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net

Re: [Toybox] pathological case in sed s///g

2019-05-03 Thread Rob Landley

On 5/2/19 9:46 PM, enh via Toybox wrote:
> i've known about this for a couple of days and haven't had time to
> look at it properly yet, so i should mention it here...
> 
> if you have a file with a 1MiB line of 'x'es and you sed 's/x/y/g',
> BSD or GNU sed finishes immediately, but toybox takes forever.

Because it allocates a replacement string and copies the before/replace/after
parts into it for each replacement. I should switch the xmalloc() on line 515 to
xrealloc() and fiddle with the surrounding code not to do unnecessary work...

Rob
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net

[Toybox] pathological case in sed s///g

2019-05-02 Thread enh via Toybox

i've known about this for a couple of days and haven't had time to
look at it properly yet, so i should mention it here...

if you have a file with a 1MiB line of 'x'es and you sed 's/x/y/g',
BSD or GNU sed finishes immediately, but toybox takes forever.
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net

Re: [Toybox] pathological case in sed s///g

Re: [Toybox] pathological case in sed s///g

Re: [Toybox] pathological case in sed s///g

Re: [Toybox] pathological case in sed s///g

Re: [Toybox] pathological case in sed s///g

Re: [Toybox] pathological case in sed s///g

Re: [Toybox] pathological case in sed s///g

Re: [Toybox] pathological case in sed s///g

Re: [Toybox] pathological case in sed s///g

Re: [Toybox] pathological case in sed s///g

Re: [Toybox] pathological case in sed s///g

Re: [Toybox] pathological case in sed s///g

Re: [Toybox] pathological case in sed s///g

Re: [Toybox] pathological case in sed s///g

Re: [Toybox] pathological case in sed s///g

Re: [Toybox] pathological case in sed s///g

Re: [Toybox] pathological case in sed s///g

Re: [Toybox] pathological case in sed s///g

[Toybox] pathological case in sed s///g

19 matches

Site Navigation

Mail list logo

Footer information