Re: escape man(1) arguments from glob(3)

2011-11-20 Thread Ingo Schwarze
Hi Joerg,

Joerg Sonnenberger wrote on Sun, Nov 20, 2011 at 06:00:16AM +0100:
 On Sun, Nov 20, 2011 at 02:27:39AM +0100, Ingo Schwarze wrote:

 Or we need the patch below.
 It looks a bit messy, but i think it is safe.

 Invert the logic. Check with strcspn first, if there is a character that
 needs quoting. This is not the default case after all. If you do,
 allocate escpage, quote to it and reassign page. Otherwise, just set
 escpage to NULL.

That seemed better at first, but actually, it isn't.

In case there are no characters to escape, my code is O(N*M), where N
is the length of the name and M is the number of metacharacters.
The strcspn function is O(N*M) as well.  Besides, M is constant
and N is small in all relevant cases, so the difference is a constant
times a small effort at worst.  Even if there is a speed difference
due to economizing N function calls to strchr and one function call
to malloc, that is not going to matter in view of the rest of the code.

On the other hand, your suggestion is considerably more complicated.
I would have to add the initial check with strcspn, but i would
still need all of my code, now inside an if.  And then either,
at each use of escpage, check it against NULL and use page instead,
or assign page (instead of NULL) to escpage and then check for
(escpage != page) before freeing escpage at function exit.

So your suggestion would just complicate the code, and i'm still
asking for OKs for my original patch (appending it again for easy
reference).

Yours,
  Ingo

P.S.
What would help to get to O(N) is a lookup table like it's used
in libc/gen/ctype_.c, but as this code is just executed once per
manual to be displayed and not in an inner loop, there is no need
for micro-optimization at the expense of readability.


Index: man.c
===
RCS file: /cvs/src/usr.bin/man/man.c,v
retrieving revision 1.43
diff -u -p -r1.43 man.c
--- man.c   7 Jul 2011 04:24:35 -   1.43
+++ man.c   20 Nov 2011 01:02:49 -
@@ -435,11 +435,26 @@ manual(char *page, TAG *tag, glob_t *pg)
ENTRY *ep, *e_sufp, *e_tag;
TAG *missp, *sufp;
int anyfound, cnt, found, globres;
-   char *p, buf[MAXPATHLEN];
+   char *p, *escpage, buf[MAXPATHLEN];
 
anyfound = 0;
buf[0] = '*';
 
+   /* Escape the page name for glob(3); consider man [. */
+   if ((escpage = malloc(2 * strlen(page) + 1)) == NULL) {
+   warn(NULL);
+   cleanup(0);
+   exit(1);
+   }
+   p = page;
+   cnt = 0;
+   while (*p != '\0') {
+   if (strchr(*?[\\]{}, *p))
+   escpage[cnt++] = '\\';
+   escpage[cnt++] = *p++;
+   }
+   escpage[cnt] = '\0';
+
/* Expand the search path. */
if (f_all != f_where) {
e_tag = tag == NULL ? NULL : TAILQ_FIRST(tag-list);
@@ -473,7 +488,7 @@ manual(char *page, TAG *tag, glob_t *pg)
/* For each element in the list... */
e_tag = tag == NULL ? NULL : TAILQ_FIRST(tag-list);
for (; e_tag != NULL; e_tag = TAILQ_NEXT(e_tag, q)) {
-   (void)snprintf(buf, sizeof(buf), %s/%s.*, e_tag-s, page);
+   snprintf(buf, sizeof(buf), %s/%s.*, e_tag-s, escpage);
switch (glob(buf, GLOB_APPEND | GLOB_BRACE | GLOB_NOSORT,
NULL, pg)) {
case (0):
@@ -507,7 +522,7 @@ manual(char *page, TAG *tag, glob_t *pg)
 * We just test for .0 first, it's fast and probably
 * going to hit.
 */
-   (void)snprintf(buf, sizeof(buf), */%s.0, page);
+   snprintf(buf, sizeof(buf), */%s.0, escpage);
if (!fnmatch(buf, pg-gl_pathv[cnt], 0))
goto next;
 
@@ -515,8 +530,8 @@ manual(char *page, TAG *tag, glob_t *pg)
NULL : TAILQ_FIRST(sufp-list);
for (found = 0;
e_sufp != NULL; e_sufp = TAILQ_NEXT(e_sufp, q)) {
-   (void)snprintf(buf,
-sizeof(buf), */%s%s, page, e_sufp-s);
+   snprintf(buf, sizeof(buf),
+   */%s%s, escpage, e_sufp-s);
if (!fnmatch(buf, pg-gl_pathv[cnt], 0)) {
found = 1;
break;
@@ -535,8 +550,8 @@ manual(char *page, TAG *tag, glob_t *pg)
if (*p == '\0')
continue;
*p = '\0';
-   (void)snprintf(buf,
-sizeof(buf), */%s%s, page, e_sufp-s);
+   snprintf(buf, sizeof(buf),
+   */%s%s, escpage, 

Re: escape man(1) arguments from glob(3)

2011-11-20 Thread Joerg Sonnenberger
On Sun, Nov 20, 2011 at 12:57:40PM +0100, Ingo Schwarze wrote:
 Hi Joerg,
 
 Joerg Sonnenberger wrote on Sun, Nov 20, 2011 at 06:00:16AM +0100:
  On Sun, Nov 20, 2011 at 02:27:39AM +0100, Ingo Schwarze wrote:
 
  Or we need the patch below.
  It looks a bit messy, but i think it is safe.
 
  Invert the logic. Check with strcspn first, if there is a character that
  needs quoting. This is not the default case after all. If you do,
  allocate escpage, quote to it and reassign page. Otherwise, just set
  escpage to NULL.
 
 That seemed better at first, but actually, it isn't.
 
 In case there are no characters to escape, my code is O(N*M), where N
 is the length of the name and M is the number of metacharacters.
 The strcspn function is O(N*M) as well.  Besides, M is constant
 and N is small in all relevant cases, so the difference is a constant
 times a small effort at worst.  Even if there is a speed difference
 due to economizing N function calls to strchr and one function call
 to malloc, that is not going to matter in view of the rest of the code.

Fix your inefficent strcspn implementation in that case. It's a classic
O(N+M) algorithm. My main point is that you require changes all over the
place, including copying the string to handle a rare case. With the
inverted logic, you do a single pass over the input string for almost
all arguments are done.

Joerg