On Sun, Nov 18, 2018 at 02:14:07AM +0100, Ingo Schwarze wrote:
>
> currently, when you call apropos(1) in the default mode without
> explicitly specifying '=' for substring search or '~' for regular
> expression search, page names and one-line descriptions are
> searched case-insensitively for the substring specified.
>
> It appears that traditionally, FreeBSD apropos used to treat
> the argument as a regular expression in this mode, and so does
> the apropos contained in the man-db package which is common on
> Linux; see: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=223556
>
> Yuri Pankov suggests an "#ifdef __FreeBSD__" stunt in portable
> mandoc, but i think switching to regular expressions by default
> would be beneficial for OpenBSD as well: more powerful, and closer
> to what other systems do.
>
> It is quite rare that one wants to search for words including regular
> expression special characters.  After the change, it will still be
> possible by either escaping them, as in
>
>   $ apropos 'c\+\+'
>   $ apropos '\|x\|'    # yields trunc(3)
>   $ apropos '\$\['     # yields arybase(3p)
>
> or by explicitly requesting substring search with the already
> existing and already documented '=' operator, as in
>
>   $ apropos =c++
>   $ apropos '=|x|'
>   $ apropos =$[
>
> Any concerns about committing the patch below?
>
> Note that i am *not* proposing to change the behaviour with respect
> to case sensitivity.  Default behaviour will remain case insensitive,
> substring search will remain always case insensitive.  The
> explicit '~' operator will remain case-sensitive, unless the
> already existing and documented option -i is specified.
Unsure about the full implications of breaking backwards compatibility
for the interpretation of special characters, but for typical usage
and the vast majority of manpage titles I think this makes the default
behavior more powerful without laying a minefield of "gotchas" for
the user.  This is nice.

I mean, I guess there's c++(1)/g++(1).  Currently "apropos c++" just
finds what you're looking for; after this change it complains about
RE syntax instead, like this:

	$ apropos c++
	apropos: regcomp /c++/: repetition-operator operand invalid
	apropos: ignoring trailing ...

but that's the best example of annoying breakage I've got.

ok cheloha@, with one pseudo-nit in-line.

P.S. I had never looked into it before, but this is the behavior
specified for man(1)'s '-k' option since at least SUSv2.  That is,
arguments to "man -k" should, according to the spec, be interpreted
as case-insensitive extended regular expressions and not merely as
string literals.  So, as 'man -k' is just apropos(1), this change
would make man(1) more compliant with POSIX.1-2008, which man.1
already claims compliance with despite its current (apparently
non-compliant?) behavior.  Unclear if this is accidental or what.

> Index: apropos.1
> ===================================================================
> RCS file: /cvs/src/usr.bin/mandoc/apropos.1,v
> retrieving revision 1.39
> diff -u -p -r1.39 apropos.1
> --- apropos.1	23 Feb 2018 18:53:49 -0000	1.39
> +++ apropos.1	18 Nov 2018 00:33:47 -0000
> @@ -51,8 +51,7 @@ searches for
>  .Xr makewhatis 8
>  databases in the default paths stipulated by
>  .Xr man 1
> -and uses case-insensitive substring matching
> -.Pq the Cm = No operator
> +and uses case-insensitive regular expression matching

You could specify that these are extended, i.e. not basic, regular
expressions.  I always appreciate it when that's spelled out, but my
guess is that most people assume EREs when it isn't specified.  Up
to you.

>  over manual names and descriptions
>  .Pq the Li \&Nm No and Li \&Nd No macro keys .
>  Multiple terms imply pairwise
> Index: mansearch.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/mandoc/mansearch.c,v
> retrieving revision 1.60
> diff -u -p -r1.60 mansearch.c
> --- mansearch.c	22 Aug 2017 17:50:02 -0000	1.60
> +++ mansearch.c	18 Nov 2018 00:33:47 -0000
> @@ -764,8 +764,9 @@ exprterm(const struct mansearch *search,
>  		cs = 0;
>  	} else if ((val = strpbrk(argv[*argi], "=~")) == NULL) {
>  		e->bits = TYPE_Nm | TYPE_Nd;
> -		e->match.type = DBM_SUB;
> -		e->match.str = argv[*argi];
> +		e->match.type = DBM_REGEX;
> +		val = argv[*argi];
> +		cs = 0;
>  	} else {
>  		if (val == argv[*argi])
>  			e->bits = TYPE_Nm | TYPE_Nd;
>