And of course I still had a potential buffer overflow in there...
On Sat, 2020-06-13 at 09:16 +0200, Martijn van Duren wrote:
> Minor change: I forgot to forward the display_hint flag to
> smi_displayhint_os. Now -OQ and -Oq work as well.
>
> On Thu, 2020-06-11 at 21:17 +0200, Martijn van Duren wrote:
> > Hello Ingo,
> >
> > Thanks for looking into this.
> >
> > On Sun, 2020-05-31 at 16:32 +0200, Ingo Schwarze wrote:
> > > Hi Martijn,
> > >
> > > Martijn van Duren wrote on Tue, May 19, 2020 at 10:25:37PM +0200:
> > >
> > > > So according to RFC2579 an octetstring can contain UTF-8 characters if
> > > > so described in the DISPLAY-HINT. One of the main consumers of this is
> > > > SnmpAdminString, which is used quite a lot.
> > > >
> > > > Now that we trimmed a little fat from snmp's oid, I wanted to fill it
> > > > up again and implemented the bare minimum DISPLAY-HINT parsing that is
> > > > required for SnmpAdminString. Other parts of the syntax can be atter
> > > > at a later state if desirable.
> > > >
> > > > Now I decided to implement this a little different from net-snmp,
> > > > because they throw incomplete sequences directly to the terminal,
> > > > which I recon is not a sane approach.
> > >
> > > No, definitely not. Never throw invalid or incomplete UTF-8 at the
> > > terminal unless the user unambiguously requests just that (like
> > > with `printf "a\x80z\n"` or `cat /usr/bin/cc` or something like that).
> > >
> > > > I choose to take the approach taken by the likes of wall(1) and replace
> > > > invalid and incomplete sequences with a '?'.
> > >
> > > If the specification says that the user is only allowed to put valid
> > > UTF-8 into octetstrings (and not arbitrary bytes), replacing invalid
> > > bytes at output time is both acceptable and feels like the only
> > > sensible option, unless something like vis(3) is considered more
> > > appropriate.
> >
> > RFC2579 section 3.1 bullet (3): or `t' for UTF-8
> > So it should be valid UTF-8.
> > > However, the reason why wall(1) uses '?' as the replacement character
> > > is because wall(1) must not assume that the output device can handle
> > > UTF-8 and has to work safely if all the output device can handle is
> > > ASCII. So, wall(1) has to live with the inconvenience that when you
> > > see a question mark, you don't know whether it really is a question
> > > mark or whether it originally used to be garbage input.
> >
> > Fair enough.
> > > If i understand correctly, here we are talking about output that is
> > > UTF-8 anyway. So instead of (ab)using the question mark, you might
> > > consider using the U+FFFD REPLACEMENT CHARACTER. The intended
> > > purpose of that one is to stand in for invalid and inconplete
> > > byte sequences, and also for characters that cannot be displayed.
> >
> > I like that one, used in diff below.
> > > > This because DISPLAY-HINT already states that too long sequences
> > > > strings should be cut short at the final multibyte character fitting
> > > > inside the specified length, meaning that output can already get
> > > > mangled.
> > > >
> > > > This also brings me to the question how we should handle DISPLAY-HINT
> > > > in the case of -Oa and -Ox. Net-snmp prioritizes DISPLAY-HINT over
> > > > these, but it has the option to disable this by disabling MIB-loading
> > > > via -m''; This implementation doesn't give us the that option (yet?).
> > > > The current diff follows net-snmp, but there is something to say to
> > > > prioritize -Ox over DISPLAY-HINT, so an admin can use it to debug
> > > > snmp-output without it being mangled. Any feedback here is welcome.
> > > >
> > > > Once it's clear if this is the right approach I'll do a thorough search
> > > > through mib.h on which objects actually can use this new definition.
> > >
> > > I can't really comment on what snmp(1) should do, or which settings
> > > should override which other settings.
> >
> > I'll leave it as is for now. If people actually need to original mangled
> > data we can always revisit. It's not something major that people heavily
> > rely on.
> > > Do i understand correctly that you want to make -Oa print invalid
> > > UTF-8 to the terminal?
> >
> > No, to be in line with net-snmp, what snmp(1) does is the following:
> > - Without a -O flag it sees if all bytes are printable and if so print
> > the string, if not prints the full string in hex.
> > - With -Oa it prints all printable bytes and replaces unprintable
> > characters with '.'.
> > - With -Ox it prints all bytes in hex.
> >
> > > If so, that would sound reasonable to me,
> > > for the following reason. Printing non-printable ASCII to the
> > > terminal is often dangerous, it can screw up or reconfigure the
> > > terminal, so -Oa is already an option that must be used with care,
> > > just like `cat /usr/bin/cc`. While printing invalid UTF-8 is not
> > > a good idea in general, it is less dangerous than printing non-printable
> > > ASCII, so just passing the bytes through for -Oa doesn't make
> > > anything more dangerous than it already is.
> > >
> > > Maybe -Oa should have a warning in the manual page about the dangers
> > > of using it on untrusted data? Not sure.
> >
> > So no, it's not dangerous.
> > > Oh, by the way, when -Oa is not specified and there is UTF-8
> > > content, don't forget to still filter out non-printable bytes:
> > > both non-printable ASCII bytes and non-printable UTF-8 characters.
> > > Right?
> >
> > You're right. Fixed in new diff.
> > > > Index: smi.c
> > > > ===================================================================
> > > > RCS file: /cvs/src/usr.bin/snmp/smi.c,v
> > > [...]
> > > > @@ -598,6 +638,60 @@ smi_foreach(struct oid *oid)
> > > > }
> > > >
> > > > int
> > > > +isu8cont(unsigned char c)
> > > > +{
> > > > + return (c & (0x80 | 0x40)) == 0x80;
> > > > +}
> > > > +
> > > > +char *
> > > > +smi_displayhint_os(struct textconv *tc, const char *buf, size_t buflen)
> > > > +{
> > > > + size_t octetlength, i, j, start;
> > > > + char *displayformat;
> > > > + char *rbuf;
> > > > + mbstate_t mbs;
> > > > +
> > > > + errno = 0;
> > > > + octetlength = (size_t) strtol(tc->tc_display_hint,
> > > > &displayformat, 10);
> > > > + if (octetlength == 0 && errno != 0) {
> > > > + errno = EINVAL;
> > > > + return NULL;
> > > > + }
> > > > + if (displayformat[0] == 't') {
> > > > + rbuf = malloc(octetlength + 1);
> > >
> > > NULL pointer access ahead on ENOMEM.
> >
> > fixed
> > > > + if (strcmp(nl_langinfo(CODESET), "UTF-8") == 0) {
> > >
> > > On OpenBSD, that is not the normal way of checking whether
> > > the locale is UTF-8. Also, it is not portable because the CODESET
> > > values are not standardized.
> >
> > I took this one from tmux(1).
> > > We usually do
> > >
> > > #include <stdlib.h>
> > >
> > > if (MB_CUR_MAX > 1) {
> > > /* UTF-8 here */
> > >
> > > Yes, that means if you want to do a portable version, quite a bit
> > > of additional work will be needed. But the same holds when using
> > > CODESET.
> >
> > Changed
> > > > + bzero(&mbs, sizeof(mbs));
> > > > + for (start = j = i = 0; i < octetlength && i <
> > > > buflen; i++) {
> > > > + switch (mbrtowc(NULL, &(buf[i]), 1,
> > > > &mbs)) {
> > >
> > > Why are you using mbrtowc(3) here? That function is unnecessarily
> > > complicated and should only be used when absolutely necessary, i.e.
> > > in the following two situations: (1) the function has to fail on
> > > invalid sequences but must return incomplete sequences, because more
> > > data may arrive later, then making the sequence complete OR (2) the
> > > program is multi-threaded and more than one thread might do UTF-8
> > > decoding at the same time. Neither seems to be the case here.
> > >
> > > Using mbtowc(3) is almost always preferable to using mbrtowc(3)
> > > because the resulting code is much easier to read.
> >
> > I thought I was being lazy by reading one byte at a time. :-)
> > Changed.
> > > I'm delaying full review of the following bits until this is switched
> > > to mbtowc(3).
> > >
> > > > + case 0:
> > > > + rbuf[j++] = '\0';
> > > > + break;
> > > > + case -1:
> > > > + rbuf[j++] = '?';
> > >
> > > You might want U+FFFD here, see above.
> >
> > Done. Minor nit might be that we exchange a single byte for 3, where we
> > only have octetlength return buffer. But since we expect decent output
> > and the spec says we can trim from the end there's no big harm.
> > > > + bzero(&mbs, sizeof(mbs));
> > > > + break;
> > > > + case -2:
> > > > + break;
> > > > + default:
> > >
> > > It looks as if the printability check might be missing here.
> > > Don't you need iswprint(3) before printing to the terminal,
> > > and print U+FFFD instead if it fails?
> >
> > I choose to go with '.' here to be in line with what -Oa does.
> > Don't know if it's better, but it feels more consistent. :-)
> > > > + memcpy(&(rbuf[j]),
> > > > &(buf[start]), i - start + 1);
> > > > + j += i - start + 1;
> > > > + start = i + 1;
> > > > + break;
> > > > + }
> > > > + }
> > > > + } else {
> > > > + for (j = i = 0; i < octetlength && i < buflen;
> > > > i++) {
> > > > + if (isu8cont(buf[i]))
> > > > + continue;
> > >
> > > This is weird.
> > >
> > > So if the user sets LC_CTYPE=C, then UTF-8 continuation bytes are
> > > silently skipped? Consider input like
> > >
> > > printf "a\x80z"
> > >
> > > That printing as "az" would seem unacceptable to me, shouldn't it
> > > rather print "a?z"?
> > >
> > > > + rbuf[j++] = isprint(buf[i]) ? buf[i] :
> > > > '?';
> > >
> > > That statement will handle bytes with the high bit set as non-printable
> > > anyway, so i see no need for the isu8cont() test before it.
> > >
> > > Also, why do you handle the ASCII case separately from the UTF-8
> > > case at all? A typical mbtowc(3) loop works just fine for ASCII,
> > > too. In rare cases, there may be a performance gain from handling
> > > ASCII seperately without mbtowc(3), but i don't assume that's the
> > > case here.
> > >
> > > In case you prefer printing "a?z" for a valid two-byte UTF-8 character
> > > in ASCII output mode rather than "a??z", you can do the following:
> > > inside the mbtowc(3) loop, when finding an invalid sequence, check
> > > whether you are in ASCII mode (with MB_CUR_MAX == 0) and if so,
> > > skip all subsequent isu8cont() bytes. But that's just a bit of
> > > icing on the cake, one might argue it's a matter of taste whether
> > > printing "a?z" or "a??z" is better.
> >
> > First: I prefer a single '?' because it gives the user a better
> > indication of what the intended character was. Similar to the discussion
> > had on wall(1). That being said I'd like to stay a little more true to
> > what -Oa does. This means that I now make a best effort guess for UTF-8
> > characters (assuming input should be valid UTF-8, this seems reasonable
> > to me). Full UTF-8 characters are now printed as '?' and similar to -Oa
> > !isprint characters are returned as '.'.
> >
> > As before I haven't added any more snmpadminstrings, but syscontact
> > should be enough for testing.
> > > Yours,
> > > Ingo
>
Index: mib.c
===================================================================
RCS file: /cvs/src/usr.bin/snmp/mib.c,v
retrieving revision 1.2
diff -u -p -r1.2 mib.c
--- mib.c 19 May 2020 13:41:01 -0000 1.2
+++ mib.c 13 Jun 2020 08:10:39 -0000
@@ -27,9 +27,11 @@
#include "smi.h"
static struct oid mib_tree[] = MIB_TREE;
+static struct textconv textconv_tree[] = TEXTCONV_TREE;
void
mib_init(void)
{
smi_mibtree(mib_tree);
+ smi_textconvtree(textconv_tree);
}
Index: mib.h
===================================================================
RCS file: /cvs/src/usr.bin/snmp/mib.h,v
retrieving revision 1.1
diff -u -p -r1.1 mib.h
--- mib.h 9 Aug 2019 06:17:59 -0000 1.1
+++ mib.h 13 Jun 2020 08:10:39 -0000
@@ -751,7 +751,7 @@
{ MIBDECL(sysDescr) }, \
{ MIBDECL(sysOID) }, \
{ MIBDECL(sysUpTime) }, \
- { MIBDECL(sysContact) }, \
+ { MIBDECL(sysContact), "SnmpAdminString" }, \
{ MIBDECL(sysName) }, \
{ MIBDECL(sysLocation) }, \
{ MIBDECL(sysServices) }, \
@@ -1345,6 +1345,11 @@
{ MIBDECL(ipfRouteEntRouteMetric5) }, \
{ MIBDECL(ipfRouteEntStatus) }, \
{ MIBEND } \
+}
+
+#define TEXTCONV_TREE { \
+ { "SnmpAdminString", "255t", BER_TYPE_OCTETSTRING }, \
+ { NULL, NULL } \
}
void mib_init(void);
Index: smi.c
===================================================================
RCS file: /cvs/src/usr.bin/snmp/smi.c,v
retrieving revision 1.9
diff -u -p -r1.9 smi.c
--- smi.c 31 May 2020 20:38:28 -0000 1.9
+++ smi.c 13 Jun 2020 08:10:39 -0000
@@ -24,10 +24,13 @@
#include <arpa/inet.h>
#include <ctype.h>
+#include <errno.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>
+#include <wchar.h>
+#include <wctype.h>
#include "ber.h"
#include "mib.h"
@@ -36,8 +39,13 @@
#define MINIMUM(a, b) (((a) < (b)) ? (a) : (b))
+int isu8cont(unsigned char);
+int isu8start(unsigned char);
+char *smi_displayhint_os(struct textconv *, int, const char *, size_t);
+
int smi_oid_cmp(struct oid *, struct oid *);
int smi_key_cmp(struct oid *, struct oid *);
+int smi_textconv_cmp(struct textconv *, struct textconv *);
struct oid * smi_findkey(char *);
RB_HEAD(oidtree, oid);
@@ -48,6 +56,10 @@ RB_HEAD(keytree, oid);
RB_PROTOTYPE(keytree, oid, o_keyword, smi_key_cmp)
struct keytree smi_keytree;
+RB_HEAD(textconvtree, textconv);
+RB_PROTOTYPE(textconvtree, textconv, tc_entry, smi_textconv_cmp);
+struct textconvtree smi_tctree;
+
int
smi_init(void)
{
@@ -181,7 +193,7 @@ smi_debug_elements(struct ber_element *r
fprintf(stderr, "(%u) encoding %u ",
root->be_type, root->be_encoding);
- if ((value = smi_print_element(root, 1, smi_os_default,
+ if ((value = smi_print_element(NULL, root, 1, smi_os_default,
smi_oidl_numeric)) == NULL)
goto invalid;
@@ -236,10 +248,13 @@ smi_debug_elements(struct ber_element *r
}
char *
-smi_print_element(struct ber_element *root, int print_hint,
+smi_print_element(struct ber_oid *oid, struct ber_element *root, int
print_hint,
enum smi_output_string output_string, enum smi_oid_lookup lookup)
{
char *str = NULL, *buf, *p;
+ struct oid okey;
+ struct oid *object = NULL;
+ struct textconv tckey;
size_t len, i, slen;
long long v, ticks;
int d;
@@ -249,6 +264,20 @@ smi_print_element(struct ber_element *ro
char *hint;
int days, hours, min, sec, csec;
+ if (oid != NULL) {
+ bcopy(oid, &(okey.o_id), sizeof(okey));
+ do {
+ object = RB_FIND(oidtree, &smi_oidtree, &okey);
+ okey.o_id.bo_n--;
+ } while (object == NULL && okey.o_id.bo_n > 0);
+ if (object != NULL && object->o_textconv == NULL &&
+ object->o_tcname != NULL) {
+ tckey.tc_name = object->o_tcname;
+ object->o_textconv = RB_FIND(textconvtree, &smi_tctree,
+ &tckey);
+ }
+ }
+
switch (root->be_encoding) {
case BER_TYPE_BOOLEAN:
if (ober_get_boolean(root, &d) == -1)
@@ -379,6 +408,10 @@ smi_print_element(struct ber_element *ro
else
str = strdup("Unknown status at this OID");
} else {
+ if (object != NULL && object->o_textconv != NULL &&
+ object->o_textconv->tc_syntax == root->be_encoding)
+ return smi_displayhint_os(object->o_textconv,
+ print_hint, buf, root->be_len);
for (i = 0; i < root->be_len; i++) {
if (!isprint(buf[i])) {
if (output_string == smi_os_default)
@@ -575,6 +608,15 @@ smi_mibtree(struct oid *oids)
}
}
+void
+smi_textconvtree(struct textconv *textconvs)
+{
+ size_t i = 0;
+
+ for (i = 0; textconvs[i].tc_name != NULL; i++)
+ RB_INSERT(textconvtree, &smi_tctree, &(textconvs[i]));
+}
+
struct oid *
smi_findkey(char *name)
{
@@ -598,6 +640,105 @@ smi_foreach(struct oid *oid)
}
int
+isu8cont(unsigned char c)
+{
+ return (c & (0x80 | 0x40)) == 0x80;
+}
+
+int
+isu8start(unsigned char c)
+{
+ return (c & 0xc0) == 0xc0;
+}
+
+#define REPLACEMENT "\357\277\275"
+#define U8MAX 4
+char *
+smi_displayhint_os(struct textconv *tc, int print_hint, const char *buf,
+ size_t buflen)
+{
+ size_t octetlength, i, j = 0, u8cnt = U8MAX;
+ int clen;
+ char *displayformat;
+ char *rbuf;
+ wchar_t wc;
+
+ errno = 0;
+ octetlength = (size_t) strtol(tc->tc_display_hint, &displayformat, 10);
+ if (octetlength == 0 && errno != 0) {
+ errno = EINVAL;
+ return NULL;
+ }
+ if (displayformat[0] == 't') {
+ rbuf = malloc(octetlength + sizeof("STRING: "));
+ if (rbuf == NULL)
+ return NULL;
+ if (print_hint) {
+ memcpy(rbuf, "STRING: ", sizeof("STRING: ") - 1);
+ j = sizeof("STRING: ") - 1;
+ }
+ if (MB_CUR_MAX > 1) {
+ for (i = 0; j < octetlength && i < buflen;) {
+ clen = mbtowc(&wc, &(buf[i]), buflen - i);
+ switch (clen) {
+ case 0:
+ rbuf[j] = '\0';
+ return rbuf;
+ case -1:
+ if (octetlength - j <
+ sizeof(REPLACEMENT) - 1) {
+ rbuf[j] = '\0';
+ return rbuf;
+ }
+ memcpy(&(rbuf[j]), REPLACEMENT,
+ sizeof(REPLACEMENT) - 1);
+ j += sizeof(REPLACEMENT) - 1;
+ i++;
+ break;
+ default:
+ if (!iswprint(wc))
+ rbuf[j++] = '.';
+ else {
+ if (octetlength - j < clen) {
+ rbuf[j] = '\0';
+ return rbuf;
+ }
+ memcpy(&(rbuf[j]), &(buf[i]),
+ clen);
+ j += clen;
+ }
+ i += clen;
+ break;
+ }
+ }
+ } else {
+ for (j = i = 0; i < octetlength && i < buflen; i++) {
+ if (isu8start(buf[i])) {
+ u8cnt = 0;
+ continue;
+ }
+ if (isu8cont(buf[i]) && u8cnt < U8MAX) {
+ if (u8cnt == 0)
+ rbuf[j++] = '?';
+ u8cnt++;
+ continue;
+ }
+ if (u8cnt == 0) {
+ u8cnt = U8MAX;
+ rbuf[j++] = '.';
+ }
+ rbuf[j++] = isprint(buf[i]) ? buf[i] : '.';
+ }
+ if (u8cnt == 0)
+ rbuf[j++] = '.';
+ }
+ rbuf[j] = '\0';
+ return rbuf;
+ }
+ return NULL;
+}
+
+int
smi_oid_cmp(struct oid *a, struct oid *b)
{
size_t i;
@@ -618,5 +759,12 @@ smi_key_cmp(struct oid *a, struct oid *b
return (strcasecmp(a->o_name, b->o_name));
}
+int
+smi_textconv_cmp(struct textconv *a, struct textconv *b)
+{
+ return strcmp(a->tc_name, b->tc_name);
+}
+
RB_GENERATE(oidtree, oid, o_element, smi_oid_cmp)
RB_GENERATE(keytree, oid, o_keyword, smi_key_cmp)
+RB_GENERATE(textconvtree, textconv, tc_entry, smi_textconv_cmp);
Index: smi.h
===================================================================
RCS file: /cvs/src/usr.bin/snmp/smi.h,v
retrieving revision 1.2
diff -u -p -r1.2 smi.h
--- smi.h 19 May 2020 13:41:01 -0000 1.2
+++ smi.h 13 Jun 2020 08:10:39 -0000
@@ -59,18 +59,28 @@ struct oid {
#define o_oid o_id.bo_id
#define o_oidlen o_id.bo_n
- char *o_name;
+ const char *o_name;
+ const char *o_tcname;
+ struct textconv *o_textconv;
RB_ENTRY(oid) o_element;
RB_ENTRY(oid) o_keyword;
};
+struct textconv {
+ const char *tc_name;
+ const char *tc_display_hint;
+ unsigned int tc_syntax;
+ RB_ENTRY(textconv) tc_entry;
+};
+
int smi_init(void);
unsigned int smi_application(struct ber_element *);
int smi_string2oid(const char *, struct ber_oid *);
char *smi_oid2string(struct ber_oid *, char *, size_t, enum smi_oid_lookup);
void smi_mibtree(struct oid *);
+void smi_textconvtree(struct textconv *);
struct oid *smi_foreach(struct oid *);
void smi_debug_elements(struct ber_element *);
-char *smi_print_element(struct ber_element *, int, enum smi_output_string,
- enum smi_oid_lookup);
+char *smi_print_element(struct ber_oid *, struct ber_element *, int,
+ enum smi_output_string, enum smi_oid_lookup);
Index: snmpc.c
===================================================================
RCS file: /cvs/src/usr.bin/snmp/snmpc.c,v
retrieving revision 1.26
diff -u -p -r1.26 snmpc.c
--- snmpc.c 31 May 2020 21:01:59 -0000 1.26
+++ snmpc.c 13 Jun 2020 08:10:39 -0000
@@ -29,6 +29,7 @@
#include <ctype.h>
#include <err.h>
#include <errno.h>
+#include <locale.h>
#include <netdb.h>
#include <poll.h>
#include <stdio.h>
@@ -129,6 +130,8 @@ main(int argc, char *argv[])
int ch;
size_t i;
+ setlocale(LC_CTYPE, "");
+
if (pledge("stdio inet dns unix", NULL) == -1)
err(1, "pledge");
@@ -1069,7 +1072,8 @@ snmpc_print(struct ber_element *elm)
}
elm = elm->be_next;
- value = smi_print_element(elm, smi_print_hint, output_string,
oid_lookup);
+ value = smi_print_element(&oid, elm, smi_print_hint, output_string,
+ oid_lookup);
if (value == NULL)
return 0;