Re: [hackers] [PATCH v3][sbase] libutil/unescape.c: simplify and add \E
On Mon, 06 Feb 2017 15:50:04 -0800 evan.ga...@gmail.com (Evan Gates) wrote: > Mattias Andréewrote: > > > On Mon, 06 Feb 2017 15:05:32 -0800 > > evan.ga...@gmail.com (Evan Gates) wrote: > > > > > Mattias Andrée wrote: > > > > > > > + } else if (escapes[*r & 255]) { > > > > + *w++ = escapes[*r++ & > > > > 255]; > > > > > > Why do you & 255 here? I think a cast to unsigned char > > > would accomplish what you are trying to do and be more > > > correct (as char can default to either signed or > > > unsigned). Although I may misunderstand what is going > > > on here. > > > > Yes. I used &255 because it's does clutter as much. > > OK. I'm going to change to a cast to be on the safe side > due to the "implementation-defined and undefined aspects" > mentioned in C99 6.5p4: > > Some operators (the unary operator ~, and the binary > operators <<, >>, &, ^, and |, collectively described as > bitwise operators) are required to have operands that > have integer type. These operators yield values that > depend on the internal representations of integers, and > have implementation-defined and undefined aspects for > signed types. Sure. > > > > I think this is clearer, even though it adds a > > > conditional: > > > > > > q = q * 16 + isdigit(*r) ? *r - '0' : tolower(*r) - > > > 'a' + 10; > > > > I think > > > > if (isdigit(*r)) > > q = q * 16 + (*r - '0'); > > else > > q = q * 16 + (tolower(*r) - 'a') + 10; > > > > is clearer, or at least brackets around the ternary. > > You're right, the line was getting a bit long and > convoluted with the ternary, bad habit of mine. > > Another thought just came to mind, isoctal is a reserved > name. 7.26p1: > > The following names are grouped under individual headers > for convenience. All external names described below are > reserved no matter what headers are included by the > program. > > 7.26.2p1: > > Function names that begin with either is or to, and a > lowercase letter may be added to the declarations in the > header. > > I don't know if it's worth being anal about that, > especially as we have already limited ourselves to C99 do > we need to worry about future directions? For now I've > changed isoctal() to is_odigit() but I'm open to > discussion. Sure, I didn't think about that, however, I think we can be pretty confident that if isoctal is added, it will do that same thing as that macro, is will not make a difference. However, I think isodigit probably what such a function will be called since there already is isxdigit, so why not change it to is_odigit or ODIGIT. > > Updated patch for consideration: > > - 8< - 8< - 8< - > From 30fd43d7f3b8716054eb9867c835aadc423f652c Mon Sep 17 > 00:00:00 2001 From: =?UTF-8?q?Mattias=20Andr=C3=A9e?= > Date: Sun, 5 Feb 2017 00:44:35 +0100 > Subject: [PATCH] libutil/unescape.c: simplify and add \E > MIME-Version: 1.0 > Content-Type: text/plain; charset=UTF-8 > Content-Transfer-Encoding: 8bit > > Signed-off-by: Mattias Andrée > --- > libutil/unescape.c | 101 > ++--- 1 > file changed, 42 insertions(+), 59 deletions(-) > > diff --git a/libutil/unescape.c b/libutil/unescape.c > index d1503e6..7523db3 100644 > --- a/libutil/unescape.c > +++ b/libutil/unescape.c > @@ -1,74 +1,57 @@ > /* See LICENSE file for copyright and license details. */ > +#include > #include > > #include "../util.h" > > +#define is_odigit(c) ('0' <= c && c <= '7') > + > size_t > unescape(char *s) > { > - size_t len, i, off, m, factor, q; > - > - len = strlen(s); > + static const char escapes[256] = { > + ['"'] = '"', > + ['\''] = '\'', > + ['\\'] = '\\', > + ['a'] = '\a', > + ['b'] = '\b', > + ['E'] = 033, > + ['e'] = 033, > + ['f'] = '\f', > + ['n'] = '\n', > + ['r'] = '\r', > + ['t'] = '\t', > + ['v'] = '\v' > + }; > + size_t m, q; > + char *r, *w; > > - for (i = 0; i < len; i++) { > - if (s[i] != '\\') > + for (r = w = s; *r;) { > + if (*r != '\\') { > + *w++ = *r++; > continue; > - off = 0; > - > - switch (s[i + 1]) { > - case '\\': s[i] = '\\'; off++; break; > - case '\'': s[i] = '\'', off++; break; > - case '"': s[i] = '"', off++; break; > - case 'a': s[i] = '\a'; off++; break; > - case 'b': s[i] = '\b'; off++; break; > - case 'e': s[i] = 033; off++; break; > - case 'f': s[i] = '\f'; off++; break; > - case 'n': s[i] = '\n'; off++; break; > - case 'r': s[i] = '\r'; off++; break; > - case 't': s[i] = '\t'; off++; break; >
Re: Re: [hackers] [PATCH v3][sbase] libutil/unescape.c: simplify and add \E
Mattias Andréewrote: > On Mon, 06 Feb 2017 15:05:32 -0800 > evan.ga...@gmail.com (Evan Gates) wrote: > > > Mattias Andrée wrote: > > > > > + } else if (escapes[*r & 255]) { > > > + *w++ = escapes[*r++ & 255]; > > > > Why do you & 255 here? I think a cast to unsigned char > > would accomplish what you are trying to do and be more > > correct (as char can default to either signed or > > unsigned). Although I may misunderstand what is going on > > here. > > Yes. I used &255 because it's does clutter as much. OK. I'm going to change to a cast to be on the safe side due to the "implementation-defined and undefined aspects" mentioned in C99 6.5p4: Some operators (the unary operator ~, and the binary operators <<, >>, &, ^, and |, collectively described as bitwise operators) are required to have operands that have integer type. These operators yield values that depend on the internal representations of integers, and have implementation-defined and undefined aspects for signed types. > > I think this is clearer, even though it adds a > > conditional: > > > > q = q * 16 + isdigit(*r) ? *r - '0' : tolower(*r) - 'a' + 10; > > I think > > if (isdigit(*r)) > q = q * 16 + (*r - '0'); > else > q = q * 16 + (tolower(*r) - 'a') + 10; > > is clearer, or at least brackets around the ternary. You're right, the line was getting a bit long and convoluted with the ternary, bad habit of mine. Another thought just came to mind, isoctal is a reserved name. 7.26p1: The following names are grouped under individual headers for convenience. All external names described below are reserved no matter what headers are included by the program. 7.26.2p1: Function names that begin with either is or to, and a lowercase letter may be added to the declarations in the header. I don't know if it's worth being anal about that, especially as we have already limited ourselves to C99 do we need to worry about future directions? For now I've changed isoctal() to is_odigit() but I'm open to discussion. Updated patch for consideration: - 8< - 8< - 8< - From 30fd43d7f3b8716054eb9867c835aadc423f652c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Mattias=20Andr=C3=A9e?= Date: Sun, 5 Feb 2017 00:44:35 +0100 Subject: [PATCH] libutil/unescape.c: simplify and add \E MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Mattias Andrée --- libutil/unescape.c | 101 ++--- 1 file changed, 42 insertions(+), 59 deletions(-) diff --git a/libutil/unescape.c b/libutil/unescape.c index d1503e6..7523db3 100644 --- a/libutil/unescape.c +++ b/libutil/unescape.c @@ -1,74 +1,57 @@ /* See LICENSE file for copyright and license details. */ +#include #include #include "../util.h" +#define is_odigit(c) ('0' <= c && c <= '7') + size_t unescape(char *s) { - size_t len, i, off, m, factor, q; - - len = strlen(s); + static const char escapes[256] = { + ['"'] = '"', + ['\''] = '\'', + ['\\'] = '\\', + ['a'] = '\a', + ['b'] = '\b', + ['E'] = 033, + ['e'] = 033, + ['f'] = '\f', + ['n'] = '\n', + ['r'] = '\r', + ['t'] = '\t', + ['v'] = '\v' + }; + size_t m, q; + char *r, *w; - for (i = 0; i < len; i++) { - if (s[i] != '\\') + for (r = w = s; *r;) { + if (*r != '\\') { + *w++ = *r++; continue; - off = 0; - - switch (s[i + 1]) { - case '\\': s[i] = '\\'; off++; break; - case '\'': s[i] = '\'', off++; break; - case '"': s[i] = '"', off++; break; - case 'a': s[i] = '\a'; off++; break; - case 'b': s[i] = '\b'; off++; break; - case 'e': s[i] = 033; off++; break; - case 'f': s[i] = '\f'; off++; break; - case 'n': s[i] = '\n'; off++; break; - case 'r': s[i] = '\r'; off++; break; - case 't': s[i] = '\t'; off++; break; - case 'v': s[i] = '\v'; off++; break; - case 'x': - /* "\xH[H]" hexadecimal escape */ - for (m = i + 2; m < i + 1 + 3 && m < len; m++) - if ((s[m] < '0' && s[m] > '9') && - (s[m] < 'A' && s[m] > 'F') && - (s[m] < 'a' && s[m] > 'f')) - break; - if (m == i + 2) - eprintf("invalid escape sequence '\\%c'\n", s[i + 1]); - off += m - i - 1; -
Re: [hackers] [PATCH v3][sbase] libutil/unescape.c: simplify and add \E
On Mon, 06 Feb 2017 15:05:32 -0800 evan.ga...@gmail.com (Evan Gates) wrote: > Mattias Andréewrote: > > > + } else if (escapes[*r & 255]) { > > + *w++ = escapes[*r++ & 255]; > > Why do you & 255 here? I think a cast to unsigned char > would accomplish what you are trying to do and be more > correct (as char can default to either signed or > unsigned). Although I may misunderstand what is going on > here. Yes. I used &255 because it's does clutter as much. > > > + q = q * 8 + (*r & 7); > > I think this is clearer: > > q = q * 8 + (*r - '0'); Go for it! > > > + *w++ = q > 255 ? 255 : q; > > Probably use MIN(q, 255) instead of the explicit ternary. Yeah, I forgot that we have MIN macro, so I didn't change that part. > > > + for (q = 0, m = 2; m && > > isxdigit(*r); m--, r++) > > + q = q * 16 + (*r & 15) > > + 9 * !!isalpha(*r); > > I think this is clearer, even though it adds a > conditional: > > q = q * 16 + isdigit(*r) ? *r - '0' : tolower(*r) - 'a' + > 10; I think if (isdigit(*r)) q = q * 16 + (*r - '0'); else q = q * 16 + (tolower(*r) - 'a') + 10; is clearer, or at least brackets around the ternary. > > Great work, much much simpler implementation. Once I > understand the &255 and we agree on the best solution for > these few lines I'll go ahead and push this. pgp_aIrTP5Ed3.pgp Description: OpenPGP digital signature
Re: [hackers] [PATCH v3][sbase] libutil/unescape.c: simplify and add \E
Mattias Andréewrote: > + } else if (escapes[*r & 255]) { > + *w++ = escapes[*r++ & 255]; Why do you & 255 here? I think a cast to unsigned char would accomplish what you are trying to do and be more correct (as char can default to either signed or unsigned). Although I may misunderstand what is going on here. > + q = q * 8 + (*r & 7); I think this is clearer: q = q * 8 + (*r - '0'); > + *w++ = q > 255 ? 255 : q; Probably use MIN(q, 255) instead of the explicit ternary. > + for (q = 0, m = 2; m && isxdigit(*r); m--, r++) > + q = q * 16 + (*r & 15) + 9 * !!isalpha(*r); I think this is clearer, even though it adds a conditional: q = q * 16 + isdigit(*r) ? *r - '0' : tolower(*r) - 'a' + 10; Great work, much much simpler implementation. Once I understand the &255 and we agree on the best solution for these few lines I'll go ahead and push this.