Re: escape man(1) arguments from glob(3)

2011-11-20 Thread Ingo Schwarze
Hi Joerg,

Joerg Sonnenberger wrote on Sun, Nov 20, 2011 at 06:00:16AM +0100:
 On Sun, Nov 20, 2011 at 02:27:39AM +0100, Ingo Schwarze wrote:

 Or we need the patch below.
 It looks a bit messy, but i think it is safe.

 Invert the logic. Check with strcspn first, if there is a character that
 needs quoting. This is not the default case after all. If you do,
 allocate escpage, quote to it and reassign page. Otherwise, just set
 escpage to NULL.

That seemed better at first, but actually, it isn't.

In case there are no characters to escape, my code is O(N*M), where N
is the length of the name and M is the number of metacharacters.
The strcspn function is O(N*M) as well.  Besides, M is constant
and N is small in all relevant cases, so the difference is a constant
times a small effort at worst.  Even if there is a speed difference
due to economizing N function calls to strchr and one function call
to malloc, that is not going to matter in view of the rest of the code.

On the other hand, your suggestion is considerably more complicated.
I would have to add the initial check with strcspn, but i would
still need all of my code, now inside an if.  And then either,
at each use of escpage, check it against NULL and use page instead,
or assign page (instead of NULL) to escpage and then check for
(escpage != page) before freeing escpage at function exit.

So your suggestion would just complicate the code, and i'm still
asking for OKs for my original patch (appending it again for easy
reference).

Yours,
  Ingo

P.S.
What would help to get to O(N) is a lookup table like it's used
in libc/gen/ctype_.c, but as this code is just executed once per
manual to be displayed and not in an inner loop, there is no need
for micro-optimization at the expense of readability.


Index: man.c
===
RCS file: /cvs/src/usr.bin/man/man.c,v
retrieving revision 1.43
diff -u -p -r1.43 man.c
--- man.c   7 Jul 2011 04:24:35 -   1.43
+++ man.c   20 Nov 2011 01:02:49 -
@@ -435,11 +435,26 @@ manual(char *page, TAG *tag, glob_t *pg)
ENTRY *ep, *e_sufp, *e_tag;
TAG *missp, *sufp;
int anyfound, cnt, found, globres;
-   char *p, buf[MAXPATHLEN];
+   char *p, *escpage, buf[MAXPATHLEN];
 
anyfound = 0;
buf[0] = '*';
 
+   /* Escape the page name for glob(3); consider man [. */
+   if ((escpage = malloc(2 * strlen(page) + 1)) == NULL) {
+   warn(NULL);
+   cleanup(0);
+   exit(1);
+   }
+   p = page;
+   cnt = 0;
+   while (*p != '\0') {
+   if (strchr(*?[\\]{}, *p))
+   escpage[cnt++] = '\\';
+   escpage[cnt++] = *p++;
+   }
+   escpage[cnt] = '\0';
+
/* Expand the search path. */
if (f_all != f_where) {
e_tag = tag == NULL ? NULL : TAILQ_FIRST(tag-list);
@@ -473,7 +488,7 @@ manual(char *page, TAG *tag, glob_t *pg)
/* For each element in the list... */
e_tag = tag == NULL ? NULL : TAILQ_FIRST(tag-list);
for (; e_tag != NULL; e_tag = TAILQ_NEXT(e_tag, q)) {
-   (void)snprintf(buf, sizeof(buf), %s/%s.*, e_tag-s, page);
+   snprintf(buf, sizeof(buf), %s/%s.*, e_tag-s, escpage);
switch (glob(buf, GLOB_APPEND | GLOB_BRACE | GLOB_NOSORT,
NULL, pg)) {
case (0):
@@ -507,7 +522,7 @@ manual(char *page, TAG *tag, glob_t *pg)
 * We just test for .0 first, it's fast and probably
 * going to hit.
 */
-   (void)snprintf(buf, sizeof(buf), */%s.0, page);
+   snprintf(buf, sizeof(buf), */%s.0, escpage);
if (!fnmatch(buf, pg-gl_pathv[cnt], 0))
goto next;
 
@@ -515,8 +530,8 @@ manual(char *page, TAG *tag, glob_t *pg)
NULL : TAILQ_FIRST(sufp-list);
for (found = 0;
e_sufp != NULL; e_sufp = TAILQ_NEXT(e_sufp, q)) {
-   (void)snprintf(buf,
-sizeof(buf), */%s%s, page, e_sufp-s);
+   snprintf(buf, sizeof(buf),
+   */%s%s, escpage, e_sufp-s);
if (!fnmatch(buf, pg-gl_pathv[cnt], 0)) {
found = 1;
break;
@@ -535,8 +550,8 @@ manual(char *page, TAG *tag, glob_t *pg)
if (*p == '\0')
continue;
*p = '\0';
-   (void)snprintf(buf,
-sizeof(buf), */%s%s, page, e_sufp-s);
+   snprintf(buf, sizeof(buf),
+   */%s%s, escpage, 

Re: escape man(1) arguments from glob(3)

2011-11-20 Thread Joerg Sonnenberger
On Sun, Nov 20, 2011 at 12:57:40PM +0100, Ingo Schwarze wrote:
 Hi Joerg,
 
 Joerg Sonnenberger wrote on Sun, Nov 20, 2011 at 06:00:16AM +0100:
  On Sun, Nov 20, 2011 at 02:27:39AM +0100, Ingo Schwarze wrote:
 
  Or we need the patch below.
  It looks a bit messy, but i think it is safe.
 
  Invert the logic. Check with strcspn first, if there is a character that
  needs quoting. This is not the default case after all. If you do,
  allocate escpage, quote to it and reassign page. Otherwise, just set
  escpage to NULL.
 
 That seemed better at first, but actually, it isn't.
 
 In case there are no characters to escape, my code is O(N*M), where N
 is the length of the name and M is the number of metacharacters.
 The strcspn function is O(N*M) as well.  Besides, M is constant
 and N is small in all relevant cases, so the difference is a constant
 times a small effort at worst.  Even if there is a speed difference
 due to economizing N function calls to strchr and one function call
 to malloc, that is not going to matter in view of the rest of the code.

Fix your inefficent strcspn implementation in that case. It's a classic
O(N+M) algorithm. My main point is that you require changes all over the
place, including copying the string to handle a rare case. With the
inverted logic, you do a single pass over the input string for almost
all arguments are done.

Joerg



escape man(1) arguments from glob(3)

2011-11-19 Thread Ingo Schwarze
Hi,

i'm moving this thread here from misc@ because i'm proposing a patch
to fix the issue.

Jason McIntyre wrote on Sat, Nov 19, 2011 at 08:45:33AM +:
 On Fri, Nov 18, 2011 at 11:04:12PM -0700, Barry Grumbine wrote:

 man [ used to give me the test(1) manpage, doesn't anymore.
 Is that something that needs fixin'?

 something weird, right enough. the man page is still installed
 (/usr/share/man/man1/[.1), but man(1) does not pick it up.
 
 ingo, could this be anything to do with the code that determines whether
 pages exist pre-formatted or not, and try to display the more recent of
 the two?

Gah, right, i had this in my TODO file since c2k11, but completely
forgot about it:

 - man(1) does not find man1/[.1 because */[.[1-9n] fails to match
   however, it did find cat1/[.0 because */[.0 matches that
   think about a fix...

What happens is that the name command line argument to man(1)
is used for glob(3) and fnmatch(3) in man.c, internally.

So currently,

  $ man -a \*   # one backslash escapes from the shell
  [ ... quite a lot of pages ... oops ... ]
  $ man \\* # one eaten by the shell, the other by glob(3)
  man: no entry for * in the manual.
  $ man [   # no need to escape from the shell
  man: no entry for [ in the manual.
  $ man \\[
  TEST(1)  [...]

Either we have to document that the name argument to man(1)
allows shell globbing, while the keyword argument to man -k
does not.  Do we really want that?  I guess not, in particular
since the exact meaning of special characters may depend on the
suffixes defined in man.conf(5), as you can see above, which
is a serious offence against the principle of least surprise.

Or we need the patch below.
It looks a bit messy, but i think it is safe.

Thoughts, OKs?
  Ingo


Index: man.c
===
RCS file: /cvs/src/usr.bin/man/man.c,v
retrieving revision 1.43
diff -u -p -r1.43 man.c
--- man.c   7 Jul 2011 04:24:35 -   1.43
+++ man.c   20 Nov 2011 01:02:49 -
@@ -435,11 +435,26 @@ manual(char *page, TAG *tag, glob_t *pg)
ENTRY *ep, *e_sufp, *e_tag;
TAG *missp, *sufp;
int anyfound, cnt, found, globres;
-   char *p, buf[MAXPATHLEN];
+   char *p, *escpage, buf[MAXPATHLEN];
 
anyfound = 0;
buf[0] = '*';
 
+   /* Escape the page name for glob(3); consider man [. */
+   if ((escpage = malloc(2 * strlen(page) + 1)) == NULL) {
+   warn(NULL);
+   cleanup(0);
+   exit(1);
+   }
+   p = page;
+   cnt = 0;
+   while (*p != '\0') {
+   if (strchr(*?[\\]{}, *p))
+   escpage[cnt++] = '\\';
+   escpage[cnt++] = *p++;
+   }
+   escpage[cnt] = '\0';
+
/* Expand the search path. */
if (f_all != f_where) {
e_tag = tag == NULL ? NULL : TAILQ_FIRST(tag-list);
@@ -473,7 +488,7 @@ manual(char *page, TAG *tag, glob_t *pg)
/* For each element in the list... */
e_tag = tag == NULL ? NULL : TAILQ_FIRST(tag-list);
for (; e_tag != NULL; e_tag = TAILQ_NEXT(e_tag, q)) {
-   (void)snprintf(buf, sizeof(buf), %s/%s.*, e_tag-s, page);
+   snprintf(buf, sizeof(buf), %s/%s.*, e_tag-s, escpage);
switch (glob(buf, GLOB_APPEND | GLOB_BRACE | GLOB_NOSORT,
NULL, pg)) {
case (0):
@@ -507,7 +522,7 @@ manual(char *page, TAG *tag, glob_t *pg)
 * We just test for .0 first, it's fast and probably
 * going to hit.
 */
-   (void)snprintf(buf, sizeof(buf), */%s.0, page);
+   snprintf(buf, sizeof(buf), */%s.0, escpage);
if (!fnmatch(buf, pg-gl_pathv[cnt], 0))
goto next;
 
@@ -515,8 +530,8 @@ manual(char *page, TAG *tag, glob_t *pg)
NULL : TAILQ_FIRST(sufp-list);
for (found = 0;
e_sufp != NULL; e_sufp = TAILQ_NEXT(e_sufp, q)) {
-   (void)snprintf(buf,
-sizeof(buf), */%s%s, page, e_sufp-s);
+   snprintf(buf, sizeof(buf),
+   */%s%s, escpage, e_sufp-s);
if (!fnmatch(buf, pg-gl_pathv[cnt], 0)) {
found = 1;
break;
@@ -535,8 +550,8 @@ manual(char *page, TAG *tag, glob_t *pg)
if (*p == '\0')
continue;
*p = '\0';
-   (void)snprintf(buf,
-sizeof(buf), */%s%s, page, e_sufp-s);
+   snprintf(buf, sizeof(buf),
+   */%s%s, escpage, e_sufp-s);

Re: escape man(1) arguments from glob(3)

2011-11-19 Thread Joerg Sonnenberger
On Sun, Nov 20, 2011 at 02:27:39AM +0100, Ingo Schwarze wrote:
 Or we need the patch below.
 It looks a bit messy, but i think it is safe.
 
 Thoughts, OKs?

Invert the logic. Check with strcspn first, if there is a character that
needs quoting. This is not the default case after all. If you do,
allocate escpage, quote to it and reassign page. Otherwise, just set
escpage to NULL.

Joerg