Re: escape man(1) arguments from glob(3)
Hi Joerg, Joerg Sonnenberger wrote on Sun, Nov 20, 2011 at 06:00:16AM +0100: On Sun, Nov 20, 2011 at 02:27:39AM +0100, Ingo Schwarze wrote: Or we need the patch below. It looks a bit messy, but i think it is safe. Invert the logic. Check with strcspn first, if there is a character that needs quoting. This is not the default case after all. If you do, allocate escpage, quote to it and reassign page. Otherwise, just set escpage to NULL. That seemed better at first, but actually, it isn't. In case there are no characters to escape, my code is O(N*M), where N is the length of the name and M is the number of metacharacters. The strcspn function is O(N*M) as well. Besides, M is constant and N is small in all relevant cases, so the difference is a constant times a small effort at worst. Even if there is a speed difference due to economizing N function calls to strchr and one function call to malloc, that is not going to matter in view of the rest of the code. On the other hand, your suggestion is considerably more complicated. I would have to add the initial check with strcspn, but i would still need all of my code, now inside an if. And then either, at each use of escpage, check it against NULL and use page instead, or assign page (instead of NULL) to escpage and then check for (escpage != page) before freeing escpage at function exit. So your suggestion would just complicate the code, and i'm still asking for OKs for my original patch (appending it again for easy reference). Yours, Ingo P.S. What would help to get to O(N) is a lookup table like it's used in libc/gen/ctype_.c, but as this code is just executed once per manual to be displayed and not in an inner loop, there is no need for micro-optimization at the expense of readability. Index: man.c === RCS file: /cvs/src/usr.bin/man/man.c,v retrieving revision 1.43 diff -u -p -r1.43 man.c --- man.c 7 Jul 2011 04:24:35 - 1.43 +++ man.c 20 Nov 2011 01:02:49 - @@ -435,11 +435,26 @@ manual(char *page, TAG *tag, glob_t *pg) ENTRY *ep, *e_sufp, *e_tag; TAG *missp, *sufp; int anyfound, cnt, found, globres; - char *p, buf[MAXPATHLEN]; + char *p, *escpage, buf[MAXPATHLEN]; anyfound = 0; buf[0] = '*'; + /* Escape the page name for glob(3); consider man [. */ + if ((escpage = malloc(2 * strlen(page) + 1)) == NULL) { + warn(NULL); + cleanup(0); + exit(1); + } + p = page; + cnt = 0; + while (*p != '\0') { + if (strchr(*?[\\]{}, *p)) + escpage[cnt++] = '\\'; + escpage[cnt++] = *p++; + } + escpage[cnt] = '\0'; + /* Expand the search path. */ if (f_all != f_where) { e_tag = tag == NULL ? NULL : TAILQ_FIRST(tag-list); @@ -473,7 +488,7 @@ manual(char *page, TAG *tag, glob_t *pg) /* For each element in the list... */ e_tag = tag == NULL ? NULL : TAILQ_FIRST(tag-list); for (; e_tag != NULL; e_tag = TAILQ_NEXT(e_tag, q)) { - (void)snprintf(buf, sizeof(buf), %s/%s.*, e_tag-s, page); + snprintf(buf, sizeof(buf), %s/%s.*, e_tag-s, escpage); switch (glob(buf, GLOB_APPEND | GLOB_BRACE | GLOB_NOSORT, NULL, pg)) { case (0): @@ -507,7 +522,7 @@ manual(char *page, TAG *tag, glob_t *pg) * We just test for .0 first, it's fast and probably * going to hit. */ - (void)snprintf(buf, sizeof(buf), */%s.0, page); + snprintf(buf, sizeof(buf), */%s.0, escpage); if (!fnmatch(buf, pg-gl_pathv[cnt], 0)) goto next; @@ -515,8 +530,8 @@ manual(char *page, TAG *tag, glob_t *pg) NULL : TAILQ_FIRST(sufp-list); for (found = 0; e_sufp != NULL; e_sufp = TAILQ_NEXT(e_sufp, q)) { - (void)snprintf(buf, -sizeof(buf), */%s%s, page, e_sufp-s); + snprintf(buf, sizeof(buf), + */%s%s, escpage, e_sufp-s); if (!fnmatch(buf, pg-gl_pathv[cnt], 0)) { found = 1; break; @@ -535,8 +550,8 @@ manual(char *page, TAG *tag, glob_t *pg) if (*p == '\0') continue; *p = '\0'; - (void)snprintf(buf, -sizeof(buf), */%s%s, page, e_sufp-s); + snprintf(buf, sizeof(buf), + */%s%s, escpage,
Re: escape man(1) arguments from glob(3)
On Sun, Nov 20, 2011 at 12:57:40PM +0100, Ingo Schwarze wrote: Hi Joerg, Joerg Sonnenberger wrote on Sun, Nov 20, 2011 at 06:00:16AM +0100: On Sun, Nov 20, 2011 at 02:27:39AM +0100, Ingo Schwarze wrote: Or we need the patch below. It looks a bit messy, but i think it is safe. Invert the logic. Check with strcspn first, if there is a character that needs quoting. This is not the default case after all. If you do, allocate escpage, quote to it and reassign page. Otherwise, just set escpage to NULL. That seemed better at first, but actually, it isn't. In case there are no characters to escape, my code is O(N*M), where N is the length of the name and M is the number of metacharacters. The strcspn function is O(N*M) as well. Besides, M is constant and N is small in all relevant cases, so the difference is a constant times a small effort at worst. Even if there is a speed difference due to economizing N function calls to strchr and one function call to malloc, that is not going to matter in view of the rest of the code. Fix your inefficent strcspn implementation in that case. It's a classic O(N+M) algorithm. My main point is that you require changes all over the place, including copying the string to handle a rare case. With the inverted logic, you do a single pass over the input string for almost all arguments are done. Joerg
escape man(1) arguments from glob(3)
Hi, i'm moving this thread here from misc@ because i'm proposing a patch to fix the issue. Jason McIntyre wrote on Sat, Nov 19, 2011 at 08:45:33AM +: On Fri, Nov 18, 2011 at 11:04:12PM -0700, Barry Grumbine wrote: man [ used to give me the test(1) manpage, doesn't anymore. Is that something that needs fixin'? something weird, right enough. the man page is still installed (/usr/share/man/man1/[.1), but man(1) does not pick it up. ingo, could this be anything to do with the code that determines whether pages exist pre-formatted or not, and try to display the more recent of the two? Gah, right, i had this in my TODO file since c2k11, but completely forgot about it: - man(1) does not find man1/[.1 because */[.[1-9n] fails to match however, it did find cat1/[.0 because */[.0 matches that think about a fix... What happens is that the name command line argument to man(1) is used for glob(3) and fnmatch(3) in man.c, internally. So currently, $ man -a \* # one backslash escapes from the shell [ ... quite a lot of pages ... oops ... ] $ man \\* # one eaten by the shell, the other by glob(3) man: no entry for * in the manual. $ man [ # no need to escape from the shell man: no entry for [ in the manual. $ man \\[ TEST(1) [...] Either we have to document that the name argument to man(1) allows shell globbing, while the keyword argument to man -k does not. Do we really want that? I guess not, in particular since the exact meaning of special characters may depend on the suffixes defined in man.conf(5), as you can see above, which is a serious offence against the principle of least surprise. Or we need the patch below. It looks a bit messy, but i think it is safe. Thoughts, OKs? Ingo Index: man.c === RCS file: /cvs/src/usr.bin/man/man.c,v retrieving revision 1.43 diff -u -p -r1.43 man.c --- man.c 7 Jul 2011 04:24:35 - 1.43 +++ man.c 20 Nov 2011 01:02:49 - @@ -435,11 +435,26 @@ manual(char *page, TAG *tag, glob_t *pg) ENTRY *ep, *e_sufp, *e_tag; TAG *missp, *sufp; int anyfound, cnt, found, globres; - char *p, buf[MAXPATHLEN]; + char *p, *escpage, buf[MAXPATHLEN]; anyfound = 0; buf[0] = '*'; + /* Escape the page name for glob(3); consider man [. */ + if ((escpage = malloc(2 * strlen(page) + 1)) == NULL) { + warn(NULL); + cleanup(0); + exit(1); + } + p = page; + cnt = 0; + while (*p != '\0') { + if (strchr(*?[\\]{}, *p)) + escpage[cnt++] = '\\'; + escpage[cnt++] = *p++; + } + escpage[cnt] = '\0'; + /* Expand the search path. */ if (f_all != f_where) { e_tag = tag == NULL ? NULL : TAILQ_FIRST(tag-list); @@ -473,7 +488,7 @@ manual(char *page, TAG *tag, glob_t *pg) /* For each element in the list... */ e_tag = tag == NULL ? NULL : TAILQ_FIRST(tag-list); for (; e_tag != NULL; e_tag = TAILQ_NEXT(e_tag, q)) { - (void)snprintf(buf, sizeof(buf), %s/%s.*, e_tag-s, page); + snprintf(buf, sizeof(buf), %s/%s.*, e_tag-s, escpage); switch (glob(buf, GLOB_APPEND | GLOB_BRACE | GLOB_NOSORT, NULL, pg)) { case (0): @@ -507,7 +522,7 @@ manual(char *page, TAG *tag, glob_t *pg) * We just test for .0 first, it's fast and probably * going to hit. */ - (void)snprintf(buf, sizeof(buf), */%s.0, page); + snprintf(buf, sizeof(buf), */%s.0, escpage); if (!fnmatch(buf, pg-gl_pathv[cnt], 0)) goto next; @@ -515,8 +530,8 @@ manual(char *page, TAG *tag, glob_t *pg) NULL : TAILQ_FIRST(sufp-list); for (found = 0; e_sufp != NULL; e_sufp = TAILQ_NEXT(e_sufp, q)) { - (void)snprintf(buf, -sizeof(buf), */%s%s, page, e_sufp-s); + snprintf(buf, sizeof(buf), + */%s%s, escpage, e_sufp-s); if (!fnmatch(buf, pg-gl_pathv[cnt], 0)) { found = 1; break; @@ -535,8 +550,8 @@ manual(char *page, TAG *tag, glob_t *pg) if (*p == '\0') continue; *p = '\0'; - (void)snprintf(buf, -sizeof(buf), */%s%s, page, e_sufp-s); + snprintf(buf, sizeof(buf), + */%s%s, escpage, e_sufp-s);
Re: escape man(1) arguments from glob(3)
On Sun, Nov 20, 2011 at 02:27:39AM +0100, Ingo Schwarze wrote: Or we need the patch below. It looks a bit messy, but i think it is safe. Thoughts, OKs? Invert the logic. Check with strcspn first, if there is a character that needs quoting. This is not the default case after all. If you do, allocate escpage, quote to it and reassign page. Otherwise, just set escpage to NULL. Joerg