On Sun, Nov 18, 2018 at 02:14:07AM +0100, Ingo Schwarze wrote:
> 
> currently, when you call apropos(1) in the default mode without
> explicitly specifying '=' for substring search or '~' for regular
> expression search, page names and one-line descriptions are
> searched case-insensitively for the substring specified.
> 
> It appears that traditionally, FreeBSD apropos used to treat
> the argument as a regular expression in this mode, and so does
> the apropos contained in the man-db package which is common on
> Linux; see:  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=223556
> 
> Yuri Pankov suggests an "#ifdef __FreeBSD__" stunt in portable
> mandoc, but i think switching to regular expressions by default
> would be beneficial for OpenBSD as well: more powerful, and closer
> to what other systems do.
> 
> It is quite rare that one wants to search for words including regular
> expression special characters.  After the change, it will still be
> possible by either escaping them, as in
> 
>   $ apropos 'c\+\+'
>   $ apropos '\|x\|'     # yields trunc(3)
>   $ apropos '\$\['      # yields arybase(3p)
> 
> or by explicitly requesting substring search with the already
> existing and already documented '=' operator, as in
> 
>   $ apropos =c++
>   $ apropos '=|x|'
>   $ apropos =$[
> 
> Any concerns about committing the patch below?
> 
> Note that i am *not* proposing to change the behaviour with respect
> to case sensitivity.  Default behaviour will remain case insensitive,
> substring search will remain always case insensitive.  The
> explicit '~' operator will remain case-sensitive, unless the
> already existing and documented option -i is specified.

Unsure about the full implications of breaking backwards compatibility
for the interpretation of special characters, but for the typical usage
and the vast majority of manpage titles I think this makes the default
behavior more powerful without laying a minefield of "gotchas" for the
user.  This is nice.

I mean, I guess there's c++(1)/g++(1).  Currently "apropos c++" just
finds what you're looking for instead of complaining about RE syntax
like this:

$ apropos c++
apropos: regcomp /c++/: repetition-operator operand invalid
apropos: ignoring trailing 

... but that's the best annoying breakage I've got.
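
To make the difference concrete, here is a minimal sketch that uses
grep(1) as a stand-in for the two matching modes.  It is not how
apropos is implemented, and the list of names is made up for
illustration.  "grep -F -i" treats the argument as a literal substring,
like the '=' operator, while "grep -E -i" treats it as a
case-insensitive extended regular expression, so the plus signs have
to be escaped to get the same result.

```shell
# Hypothetical list of page names, for illustration only.
names='c++
g++
c++filt
cc'

# Literal, case-insensitive substring search (like the '=' operator).
substring=$(printf '%s\n' "$names" | grep -F -i 'c++')

# Case-insensitive ERE search; the plus signs are escaped so that
# they match literally instead of acting as repetition operators.
regex=$(printf '%s\n' "$names" | grep -E -i 'c\+\+')

printf 'substring:\n%s\nregex:\n%s\n' "$substring" "$regex"
```

Both searches should select the same two names here, which is the
point: escaping gets you back to the literal behavior.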

ok cheloha@, with one pseudo-nit in-line.

P.S. I had never looked into it before, but this is the behavior
specified for man(1)'s '-k' option since at least SUSv2.  That is,
arguments to "man -k" should, according to the spec, be interpreted
as case-insensitive extended regular expressions and not merely
string literals.

So, as 'man -k' is just apropos(1), this change would make man(1)
more compliant with POSIX.1-2008.  man.1 already claims conformance
to that standard despite the current (apparently non-compliant?)
behavior.

Unclear if this is accidental or what.

> Index: apropos.1
> ===================================================================
> RCS file: /cvs/src/usr.bin/mandoc/apropos.1,v
> retrieving revision 1.39
> diff -u -p -r1.39 apropos.1
> --- apropos.1 23 Feb 2018 18:53:49 -0000      1.39
> +++ apropos.1 18 Nov 2018 00:33:47 -0000
> @@ -51,8 +51,7 @@ searches for
>  .Xr makewhatis 8
>  databases in the default paths stipulated by
>  .Xr man 1
> -and uses case-insensitive substring matching
> -.Pq the Cm = No operator
> +and uses case-insensitive regular expression matching

You could specify that these are extended, i.e. not basic, regular
expressions.  I always appreciate when it's spelled out, but my 
guess is that most people assume EREs when it isn't specified.

Up to you.
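
For anyone unsure why the distinction matters, a small sketch, again
with grep(1) standing in for the matcher and a made-up list of names:
in a basic RE the '|' is an ordinary character, while in an extended
RE it means alternation.

```shell
# Hypothetical list of page names, for illustration only.
pages='mkdir
rmdir
chdir'

# Basic RE: '|' is an ordinary character, so no name matches.
# The "|| true" keeps grep's non-zero exit status from aborting
# a script that runs under "set -e".
bre=$(printf '%s\n' "$pages" | grep -i 'mkdir|rmdir' || true)

# Extended RE: '|' means alternation, so two names match.
ere=$(printf '%s\n' "$pages" | grep -E -i 'mkdir|rmdir')

printf 'BRE:\n%s\nERE:\n%s\n' "$bre" "$ere"
```
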

>  over manual names and descriptions
>  .Pq the Li \&Nm No and Li \&Nd No macro keys .
>  Multiple terms imply pairwise
> Index: mansearch.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/mandoc/mansearch.c,v
> retrieving revision 1.60
> diff -u -p -r1.60 mansearch.c
> --- mansearch.c       22 Aug 2017 17:50:02 -0000      1.60
> +++ mansearch.c       18 Nov 2018 00:33:47 -0000
> @@ -764,8 +764,9 @@ exprterm(const struct mansearch *search,
>               cs = 0;
>       } else if ((val = strpbrk(argv[*argi], "=~")) == NULL) {
>               e->bits = TYPE_Nm | TYPE_Nd;
> -             e->match.type = DBM_SUB;
> -             e->match.str = argv[*argi];
> +             e->match.type = DBM_REGEX;
> +             val = argv[*argi];
> +             cs = 0;
>       } else {
>               if (val == argv[*argi])
>                       e->bits = TYPE_Nm | TYPE_Nd;
> 
