Re: snmp(1) initial UTF-8 support

2020-07-03 Thread Ingo Schwarze
Hi Martijn,

sorry for the delay, now i finally looked at the function
smi_displayhint_os() from the line "if (MB_CUR_MAX > 1) {" to the
end of the corresponding else clause.  IIUC, what that code is
trying to do is iterate the input buffer "const char *buf" of length
"size_t buflen".

Before starting, i will consider the sizing of rbuf.  It is octetlength
plus strlen("STRING: ") plus 1 byte for the terminating NUL.  Then
optionally, "STRING: " is copied into it, so octetlength plus one
byte for the terminating NUL remains.

However, if the "STRING: " is copied in, j is initialized to
strlen("STRING: ") and then later compared to octetlength which
remains unchanged.  That feels odd.  I would expect that if j starts
at strlen("STRING: "), octetlength should be increased accordingly,
or alternatively, if octetlength remains as it is, j should always
start at 0.  Otherwise, the last eight bytes allocated always seem
to remain unused.  Is the octetlength input parameter supposed to
include strlen("STRING: ") or not?  [1]

What the else clause does (i.e. when the user selected the C locale)
seems to be this:

 * j is unconditionally re-initialized to 0, so if "STRING: "
   was written earlier, that gets overwritten.  Is that
   intentional?  [2]
 * UTF-8 starting bytes followed by another UTF-8 starting byte
   are totally ignored.  Is that intentional?  It feels odd.  [3]
 * UTF-8 starting bytes followed by an ascii byte
   are represented as ".".  That seems intentional.
 * UTF-8 starting bytes followed by at least one UTF-8 cont byte
   are represented by "?".  Up to FOUR cont bytes are totally
   ignored, which seems quite odd because a UTF-8 sequence
   can have at most THREE cont bytes (four bytes total including
   the starting byte).  [4]  From the FIFTH following cont byte
   onwards, "." is written for each cont byte.
   Note that this is *not* trying to check whether the UTF-8 sequence
   is valid (which wouldn't be trivial at all), but probably that
   isn't required.  It merely tries to check whether the sequence
   is longer than the absolute maximum (and seems to allow one byte
   too much unless i misread the code).
 * A sequence "start + cont + ASCII + cont" prints "?a",
   then totally ignores the second cont byte as if it were
   a continuation of the initial start byte, despite the
   intervening ASCII byte.  [5]
   The state machine doesn't appear to be fully thought out.
   Did you prepare a list for yourself containing all the
   possible states and all the edges of the machine?
 * An initial cont byte, or one after start + ASCII, or one after
   start + at least 4 cont + ASCII is represented by ".";
   that seems intentional.
 * In the for loop in the else clause, the "i < octetlength" feels
   confusing to me.  It cannot cause an overflow because in the
   else clause, we always have j < i (caveat, this might only be
   due to the possibly unintentional re-initializing of j to 0).
   But assume an input buf containing lots of UTF-8 characters.  In
   that case, you abort processing long before the output rbuf is
   full.  Is that intentional?  Either way, wouldn't "j < octetlength"
   be both clearer and safer?  [6]
 * The input "UTF-8 start + ASCII, end of input" is represented
   as "x." rather than ".x".  [7]

So the idea probably is to print

 * "." for clearly invalid bytes including non-printable ASCII bytes
 * "?" for UTF-8 start bytes including following cont bytes that
   _might_ form a valid UTF-8 sequence, accepting invalid sequences
   that would be non-trivial to identify.

But it doesn't quite do that.

For the "if (MB_CUR_MAX > 1)" clause (i.e., when the user selected
a UTF-8 locale), it seems to do this:

 * Embedded NUL bytes terminate processing, returning what was
   decoded so far.  Note that this differs from what the ASCII clause
   does, which represents NUL as "." and continues processing.
   I don't think it can be intentional that processing stops
   at different places depending on the user's locale.  [8]
 * The "case -1" clause obviously lacks a call
   (void)mbtowc(NULL, NULL, MB_CUR_MAX);
   see the mbtowc(3) manual page,
   /usr/src/usr.bin/fmt/fmt.c for simple examples, or
   /usr/src/usr.bin/ssh/utf8.c for a full-blown example.  [9]
 * In contrast to the ASCII clause, the UTF-8 clause does detect
   all invalid UTF-8 sequences and writes a replacement character
   for each invalid byte (rather than a replacement character
   for groups of invalid bytes as the ASCII clause does).
   I think this difference in behaviour makes sense because
   the UTF-8 clause can easily detect such conditions while
   the ASCII clause should better not try.
 * For non-printable UTF-8 characters, it prints ".".
   This feels inconsistent.  We have:

   input \ locale         ASCII    UTF-8
   invalid byte           "."      REPLACEMENT
   non-printable char     "."      "."
   UTF-8 printable char   "?"      itself
   ASCII printable char   itself   itself

   So judging from 

Re: snmp(1) initial UTF-8 support

2020-06-13 Thread Martijn van Duren
And of course I still had a potential buffer overflow in there...

On Sat, 2020-06-13 at 09:16 +0200, Martijn van Duren wrote:
> Minor change: I forgot to forward the display_hint flag to
> smi_displayhint_os. Now -OQ and -Oq work as well.
> 
> On Thu, 2020-06-11 at 21:17 +0200, Martijn van Duren wrote:
> > Hello Ingo,
> > 
> > Thanks for looking into this.
> > 
> > On Sun, 2020-05-31 at 16:32 +0200, Ingo Schwarze wrote:
> > > Hi Martijn,
> > > 
> > > Martijn van Duren wrote on Tue, May 19, 2020 at 10:25:37PM +0200:
> > > 
> > > > So according to RFC2579 an octetstring can contain UTF-8 characters if  
> > > > so described in the DISPLAY-HINT. One of the main consumers of this is
> > > > SnmpAdminString, which is used quite a lot.
> > > > 
> > > > Now that we trimmed a little fat from snmp's oid, I wanted to fill it
> > > > up again and implemented the bare minimum DISPLAY-HINT parsing that is
> > > > required for SnmpAdminString. Other parts of the syntax can be added
> > > > at a later stage if desirable.
> > > > 
> > > > Now I decided to implement this a little different from net-snmp,
> > > > because they throw incomplete sequences directly to the terminal,
> > > > which I reckon is not a sane approach.
> > > 
> > > No, definitely not.  Never throw invalid or incomplete UTF-8 at the
> > > terminal unless the user unambiguously requests just that (like
> > > with `printf "a\x80z\n"` or `cat /usr/bin/cc` or something like that).
> > > 
> > > > I chose to take the approach taken by the likes of wall(1) and replace
> > > > invalid and incomplete sequences with a '?'.
> > > 
> > > If the specification says that the user is only allowed to put valid
> > > UTF-8 into octetstrings (and not arbitrary bytes), replacing invalid
> > > bytes at output time is both acceptable and feels like the only
> > > sensible option, unless something like vis(3) is considered more
> > > appropriate.
> > 
> > RFC2579 section 3.1 bullet (3): or `t' for UTF-8
> > So it should be valid UTF-8.
> > > However, the reason why wall(1) uses '?' as the replacement character
> > > is because wall(1) must not assume that the output device can handle
> > > UTF-8 and has to work safely if all the output device can handle is
> > > ASCII.  So, wall(1) has to live with the inconvenience that when you
> > > see a question mark, you don't know whether it really is a question
> > > mark or whether it originally used to be garbage input.
> > 
> > Fair enough.
> > > If i understand correctly, here we are talking about output that is
> > > UTF-8 anyway.  So instead of (ab)using the question mark, you might
> > > consider using the U+FFFD REPLACEMENT CHARACTER.  The intended
> > > purpose of that one is to stand in for invalid and incomplete
> > > byte sequences, and also for characters that cannot be displayed.
> > 
> > I like that one, used in diff below.
> > > > This is because DISPLAY-HINT already states that too long
> > > > strings should be cut short at the final multibyte character fitting
> > > > inside the specified length, meaning that output can already get
> > > > mangled.
> > > > 
> > > > This also brings me to the question how we should handle DISPLAY-HINT
> > > > in the case of -Oa and -Ox. Net-snmp prioritizes DISPLAY-HINT over
> > > > these, but it has the option to disable this by disabling MIB-loading
> > > > via -m''; This implementation doesn't give us that option (yet?).
> > > > The current diff follows net-snmp, but there is something to be said
> > > > for prioritizing -Ox over DISPLAY-HINT, so an admin can use it to debug
> > > > snmp-output without it being mangled. Any feedback here is welcome.
> > > > 
> > > > Once it's clear if this is the right approach I'll do a thorough search
> > > > through mib.h on which objects actually can use this new definition.
> > > 
> > > I can't really comment on what snmp(1) should do, or which settings
> > > should override which other settings.
> > 
> > I'll leave it as is for now. If people actually need the original mangled
> > data we can always revisit. It's not something major that people heavily
> > rely on.
> > > Do i understand correctly that you want to make -Oa print invalid
> > > UTF-8 to the terminal?
> > 
> > No, to be in line with net-snmp, what snmp(1) does is the following:
> > - Without a -O flag it checks whether all bytes are printable; if so it
> >   prints the string, if not it prints the full string in hex.
> > - With -Oa it prints all printable bytes and replaces unprintable
> >   characters with '.'.
> > - With -Ox it prints all bytes in hex.
> > 
> > > If so, that would sound reasonable to me,
> > > for the following reason.  Printing non-printable ASCII to the
> > > terminal is often dangerous, it can screw up or reconfigure the
> > > terminal, so -Oa is already an option that must be used with care,
> > > just like `cat /usr/bin/cc`.  While printing invalid UTF-8 is not
> > > a good idea in general, it is less dangerous than printing non-printable
> > > ASCII, so just 

Re: snmp(1) initial UTF-8 support

2020-06-13 Thread Martijn van Duren
Minor change: I forgot to forward the display_hint flag to
smi_displayhint_os. Now -OQ and -Oq work as well.

On Thu, 2020-06-11 at 21:17 +0200, Martijn van Duren wrote:
> Hello Ingo,
> 
> Thanks for looking into this.
> 
> On Sun, 2020-05-31 at 16:32 +0200, Ingo Schwarze wrote:
> > Hi Martijn,
> > 
> > Martijn van Duren wrote on Tue, May 19, 2020 at 10:25:37PM +0200:
> > 
> > > So according to RFC2579 an octetstring can contain UTF-8 characters if  
> > > so described in the DISPLAY-HINT. One of the main consumers of this is
> > > SnmpAdminString, which is used quite a lot.
> > > 
> > > Now that we trimmed a little fat from snmp's oid, I wanted to fill it
> > > up again and implemented the bare minimum DISPLAY-HINT parsing that is
> > > required for SnmpAdminString. Other parts of the syntax can be added
> > > at a later stage if desirable.
> > > 
> > > Now I decided to implement this a little different from net-snmp,
> > > because they throw incomplete sequences directly to the terminal,
> > > which I reckon is not a sane approach.
> > 
> > No, definitely not.  Never throw invalid or incomplete UTF-8 at the
> > terminal unless the user unambiguously requests just that (like
> > with `printf "a\x80z\n"` or `cat /usr/bin/cc` or something like that).
> > 
> > > I chose to take the approach taken by the likes of wall(1) and replace
> > > invalid and incomplete sequences with a '?'.
> > 
> > If the specification says that the user is only allowed to put valid
> > UTF-8 into octetstrings (and not arbitrary bytes), replacing invalid
> > bytes at output time is both acceptable and feels like the only
> > sensible option, unless something like vis(3) is considered more
> > appropriate.
> 
> RFC2579 section 3.1 bullet (3): or `t' for UTF-8
> So it should be valid UTF-8.
> > However, the reason why wall(1) uses '?' as the replacement character
> > is because wall(1) must not assume that the output device can handle
> > UTF-8 and has to work safely if all the output device can handle is
> > ASCII.  So, wall(1) has to live with the inconvenience that when you
> > see a question mark, you don't know whether it really is a question
> > mark or whether it originally used to be garbage input.
> 
> Fair enough.
> > If i understand correctly, here we are talking about output that is
> > UTF-8 anyway.  So instead of (ab)using the question mark, you might
> > consider using the U+FFFD REPLACEMENT CHARACTER.  The intended
> > purpose of that one is to stand in for invalid and incomplete
> > byte sequences, and also for characters that cannot be displayed.
> 
> I like that one, used in diff below.
> > > This is because DISPLAY-HINT already states that too long
> > > strings should be cut short at the final multibyte character fitting
> > > inside the specified length, meaning that output can already get
> > > mangled.
> > > 
> > > This also brings me to the question how we should handle DISPLAY-HINT
> > > in the case of -Oa and -Ox. Net-snmp prioritizes DISPLAY-HINT over
> > > these, but it has the option to disable this by disabling MIB-loading
> > > via -m''; This implementation doesn't give us that option (yet?).
> > > The current diff follows net-snmp, but there is something to be said
> > > for prioritizing -Ox over DISPLAY-HINT, so an admin can use it to debug
> > > snmp-output without it being mangled. Any feedback here is welcome.
> > > 
> > > Once it's clear if this is the right approach I'll do a thorough search
> > > through mib.h on which objects actually can use this new definition.
> > 
> > I can't really comment on what snmp(1) should do, or which settings
> > should override which other settings.
> 
> I'll leave it as is for now. If people actually need the original mangled
> data we can always revisit. It's not something major that people heavily
> rely on.
> > Do i understand correctly that you want to make -Oa print invalid
> > UTF-8 to the terminal?
> 
> No, to be in line with net-snmp, what snmp(1) does is the following:
> - Without a -O flag it checks whether all bytes are printable; if so it
>   prints the string, if not it prints the full string in hex.
> - With -Oa it prints all printable bytes and replaces unprintable
>   characters with '.'.
> - With -Ox it prints all bytes in hex.
> 
> > If so, that would sound reasonable to me,
> > for the following reason.  Printing non-printable ASCII to the
> > terminal is often dangerous, it can screw up or reconfigure the
> > terminal, so -Oa is already an option that must be used with care,
> > just like `cat /usr/bin/cc`.  While printing invalid UTF-8 is not
> > a good idea in general, it is less dangerous than printing non-printable
> > ASCII, so just passing the bytes through for -Oa doesn't make
> > anything more dangerous than it already is.
> > 
> > Maybe -Oa should have a warning in the manual page about the dangers
> > of using it on untrusted data?  Not sure.
> 
> So no, it's not dangerous.
> > Oh, by the way, when -Oa is not specified and there is UTF-8
> 

Re: snmp(1) initial UTF-8 support

2020-06-11 Thread Martijn van Duren
Hello Ingo,

Thanks for looking into this.

On Sun, 2020-05-31 at 16:32 +0200, Ingo Schwarze wrote:
> Hi Martijn,
> 
> Martijn van Duren wrote on Tue, May 19, 2020 at 10:25:37PM +0200:
> 
> > So according to RFC2579 an octetstring can contain UTF-8 characters if  
> > so described in the DISPLAY-HINT. One of the main consumers of this is
> > SnmpAdminString, which is used quite a lot.
> > 
> > Now that we trimmed a little fat from snmp's oid, I wanted to fill it
> > up again and implemented the bare minimum DISPLAY-HINT parsing that is
> > required for SnmpAdminString. Other parts of the syntax can be added
> > at a later stage if desirable.
> > 
> > Now I decided to implement this a little different from net-snmp,
> > because they throw incomplete sequences directly to the terminal,
> > which I reckon is not a sane approach.
> 
> No, definitely not.  Never throw invalid or incomplete UTF-8 at the
> terminal unless the user unambiguously requests just that (like
> with `printf "a\x80z\n"` or `cat /usr/bin/cc` or something like that).
> 
> > I chose to take the approach taken by the likes of wall(1) and replace
> > invalid and incomplete sequences with a '?'.
> 
> If the specification says that the user is only allowed to put valid
> UTF-8 into octetstrings (and not arbitrary bytes), replacing invalid
> bytes at output time is both acceptable and feels like the only
> sensible option, unless something like vis(3) is considered more
> appropriate.

RFC2579 section 3.1 bullet (3): or `t' for UTF-8
So it should be valid UTF-8.
> 
> However, the reason why wall(1) uses '?' as the replacement character
> is because wall(1) must not assume that the output device can handle
> UTF-8 and has to work safely if all the output device can handle is
> ASCII.  So, wall(1) has to live with the inconvenience that when you
> see a question mark, you don't know whether it really is a question
> mark or whether it originally used to be garbage input.

Fair enough.
> 
> If i understand correctly, here we are talking about output that is
> UTF-8 anyway.  So instead of (ab)using the question mark, you might
> consider using the U+FFFD REPLACEMENT CHARACTER.  The intended
> purpose of that one is to stand in for invalid and incomplete
> byte sequences, and also for characters that cannot be displayed.

I like that one, used in diff below.
> 
> > This is because DISPLAY-HINT already states that too long
> > strings should be cut short at the final multibyte character fitting
> > inside the specified length, meaning that output can already get
> > mangled.
> > 
> > This also brings me to the question how we should handle DISPLAY-HINT
> > in the case of -Oa and -Ox. Net-snmp prioritizes DISPLAY-HINT over
> > these, but it has the option to disable this by disabling MIB-loading
> > via -m''; This implementation doesn't give us that option (yet?).
> > The current diff follows net-snmp, but there is something to be said
> > for prioritizing -Ox over DISPLAY-HINT, so an admin can use it to debug
> > snmp-output without it being mangled. Any feedback here is welcome.
> > 
> > Once it's clear if this is the right approach I'll do a thorough search
> > through mib.h on which objects actually can use this new definition.
> 
> I can't really comment on what snmp(1) should do, or which settings
> should override which other settings.

I'll leave it as is for now. If people actually need the original mangled
data we can always revisit. It's not something major that people heavily
rely on.
> 
> Do i understand correctly that you want to make -Oa print invalid
> UTF-8 to the terminal?

No, to be in line with net-snmp, what snmp(1) does is the following:
- Without a -O flag it checks whether all bytes are printable; if so it
  prints the string, if not it prints the full string in hex.
- With -Oa it prints all printable bytes and replaces unprintable
  characters with '.'.
- With -Ox it prints all bytes in hex.

> If so, that would sound reasonable to me,
> for the following reason.  Printing non-printable ASCII to the
> terminal is often dangerous, it can screw up or reconfigure the
> terminal, so -Oa is already an option that must be used with care,
> just like `cat /usr/bin/cc`.  While printing invalid UTF-8 is not
> a good idea in general, it is less dangerous than printing non-printable
> ASCII, so just passing the bytes through for -Oa doesn't make
> anything more dangerous than it already is.
> 
> Maybe -Oa should have a warning in the manual page about the dangers
> of using it on untrusted data?  Not sure.

So no, it's not dangerous.
> 
> Oh, by the way, when -Oa is not specified and there is UTF-8
> content, don't forget to still filter out non-printable bytes:
> both non-printable ASCII bytes and non-printable UTF-8 characters.
> Right?

You're right. Fixed in new diff.
> 
> > Index: smi.c
> > ===
> > RCS file: /cvs/src/usr.bin/snmp/smi.c,v
> [...]
> > @@ -598,6 +638,60 @@ 

Re: snmp(1) initial UTF-8 support

2020-05-31 Thread Ingo Schwarze
Hi Martijn,

Martijn van Duren wrote on Tue, May 19, 2020 at 10:25:37PM +0200:

> So according to RFC2579 an octetstring can contain UTF-8 characters if  
> so described in the DISPLAY-HINT. One of the main consumers of this is
> SnmpAdminString, which is used quite a lot.
> 
> Now that we trimmed a little fat from snmp's oid, I wanted to fill it
> up again and implemented the bare minimum DISPLAY-HINT parsing that is
> required for SnmpAdminString. Other parts of the syntax can be added
> at a later stage if desirable.
> 
> Now I decided to implement this a little different from net-snmp,
> because they throw incomplete sequences directly to the terminal,
> which I reckon is not a sane approach.

No, definitely not.  Never throw invalid or incomplete UTF-8 at the
terminal unless the user unambiguously requests just that (like
with `printf "a\x80z\n"` or `cat /usr/bin/cc` or something like that).

> I chose to take the approach taken by the likes of wall(1) and replace
> invalid and incomplete sequences with a '?'.

If the specification says that the user is only allowed to put valid
UTF-8 into octetstrings (and not arbitrary bytes), replacing invalid
bytes at output time is both acceptable and feels like the only
sensible option, unless something like vis(3) is considered more
appropriate.

However, the reason why wall(1) uses '?' as the replacement character
is because wall(1) must not assume that the output device can handle
UTF-8 and has to work safely if all the output device can handle is
ASCII.  So, wall(1) has to live with the inconvenience that when you
see a question mark, you don't know whether it really is a question
mark or whether it originally used to be garbage input.

If i understand correctly, here we are talking about output that is
UTF-8 anyway.  So instead of (ab)using the question mark, you might
consider using the U+FFFD REPLACEMENT CHARACTER.  The intended
purpose of that one is to stand in for invalid and incomplete
byte sequences, and also for characters that cannot be displayed.

> This is because DISPLAY-HINT already states that too long
> strings should be cut short at the final multibyte character fitting
> inside the specified length, meaning that output can already get
> mangled.
> 
> This also brings me to the question how we should handle DISPLAY-HINT
> in the case of -Oa and -Ox. Net-snmp prioritizes DISPLAY-HINT over
> these, but it has the option to disable this by disabling MIB-loading
> via -m''; This implementation doesn't give us that option (yet?).
> The current diff follows net-snmp, but there is something to be said
> for prioritizing -Ox over DISPLAY-HINT, so an admin can use it to debug
> snmp-output without it being mangled. Any feedback here is welcome.
> 
> Once it's clear if this is the right approach I'll do a thorough search
> through mib.h on which objects actually can use this new definition.

I can't really comment on what snmp(1) should do, or which settings
should override which other settings.

Do i understand correctly that you want to make -Oa print invalid
UTF-8 to the terminal?  If so, that would sound reasonable to me,
for the following reason.  Printing non-printable ASCII to the
terminal is often dangerous, it can screw up or reconfigure the
terminal, so -Oa is already an option that must be used with care,
just like `cat /usr/bin/cc`.  While printing invalid UTF-8 is not
a good idea in general, it is less dangerous than printing non-printable
ASCII, so just passing the bytes through for -Oa doesn't make
anything more dangerous than it already is.

Maybe -Oa should have a warning in the manual page about the dangers
of using it on untrusted data?  Not sure.

Oh, by the way, when -Oa is not specified and there is UTF-8
content, don't forget to still filter out non-printable bytes:
both non-printable ASCII bytes and non-printable UTF-8 characters.
Right?

> Index: smi.c
> ===
> RCS file: /cvs/src/usr.bin/snmp/smi.c,v
[...]
> @@ -598,6 +638,60 @@ smi_foreach(struct oid *oid)
>  }
>  
>  int
> +isu8cont(unsigned char c)
> +{
> + return (c & (0x80 | 0x40)) == 0x80;
> +}
> +
> +char *
> +smi_displayhint_os(struct textconv *tc, const char *buf, size_t buflen)
> +{
> + size_t octetlength, i, j, start;
> + char *displayformat;
> + char *rbuf;
> + mbstate_t mbs;
> +
> + errno = 0;
> > + octetlength = (size_t) strtol(tc->tc_display_hint, &displayformat, 10);
> + if (octetlength == 0 && errno != 0) {
> + errno = EINVAL;
> + return NULL;
> + }
> + if (displayformat[0] == 't') {
> + rbuf = malloc(octetlength + 1);

NULL pointer access ahead on ENOMEM.

> + if (strcmp(nl_langinfo(CODESET), "UTF-8") == 0) {

On OpenBSD, that is not the normal way of checking whether
the locale is UTF-8.  Also, it is not portable because the CODESET
values are not standardized.

We usually do

#include <stdlib.h>

if (MB_CUR_MAX > 

snmp(1) initial UTF-8 support

2020-05-19 Thread Martijn van Duren
So according to RFC2579 an octetstring can contain UTF-8 characters if  
so described in the DISPLAY-HINT. One of the main consumers of this is
SnmpAdminString, which is used quite a lot.

Now that we trimmed a little fat from snmp's oid, I wanted to fill it
up again and implemented the bare minimum DISPLAY-HINT parsing that is
required for SnmpAdminString. Other parts of the syntax can be atter
at a later state if desirable.

Now I decided to implement this a little different from net-snmp,
because they throw incomplete sequences directly to the terminal, which
I reckon is not a sane approach. I chose to take the approach taken by
the likes of wall(1) and replace invalid and incomplete sequences with
a '?'. This is because DISPLAY-HINT already states that too long
strings should be cut short at the final multibyte character fitting
inside the specified length, meaning that output can already get
mangled.

This also brings me to the question how we should handle DISPLAY-HINT
in the case of -Oa and -Ox. Net-snmp prioritizes DISPLAY-HINT over
these, but it has the option to disable this by disabling MIB-loading
via -m''; This implementation doesn't give us that option (yet?).
The current diff follows net-snmp, but there is something to be said
for prioritizing -Ox over DISPLAY-HINT, so an admin can use it to debug
snmp-output without it being mangled. Any feedback here is welcome.

Once it's clear if this is the right approach I'll do a thorough search
through mib.h on which objects actually can use this new definition.

Example chosen because his copyright was in front of me:
$ snmp walk 127.0.0.1 sysContact  
sysContact.0 = Hex-STRING: 52 65 79 6B 20 46 6C C3 B6 74 65 72
$ ./obj/snmp walk 127.0.0.1 sysContact
sysContact.0 = Reyk Flöter
$ LC_ALL=C
$ ./obj/snmp walk 127.0.0.1 sysContact
sysContact.0 = Reyk Fl?ter

Thoughts?

martijn@

Index: mib.c
===
RCS file: /cvs/src/usr.bin/snmp/mib.c,v
retrieving revision 1.2
diff -u -p -r1.2 mib.c
--- mib.c   19 May 2020 13:41:01 -  1.2
+++ mib.c   19 May 2020 20:21:42 -
@@ -27,9 +27,11 @@
 #include "smi.h"
 
 static struct oid mib_tree[] = MIB_TREE;
+static struct textconv textconv_tree[] = TEXTCONV_TREE;
 
 void
 mib_init(void)
 {
smi_mibtree(mib_tree);
+   smi_textconvtree(textconv_tree);
 }
Index: mib.h
===
RCS file: /cvs/src/usr.bin/snmp/mib.h,v
retrieving revision 1.1
diff -u -p -r1.1 mib.h
--- mib.h   9 Aug 2019 06:17:59 -   1.1
+++ mib.h   19 May 2020 20:21:42 -
@@ -751,7 +751,7 @@
{ MIBDECL(sysDescr) },  \
{ MIBDECL(sysOID) },\
{ MIBDECL(sysUpTime) }, \
-   { MIBDECL(sysContact) },\
+   { MIBDECL(sysContact), "SnmpAdminString" }, \
{ MIBDECL(sysName) },   \
{ MIBDECL(sysLocation) },   \
{ MIBDECL(sysServices) },   \
@@ -1345,6 +1345,11 @@
{ MIBDECL(ipfRouteEntRouteMetric5) },   \
{ MIBDECL(ipfRouteEntStatus) }, \
{ MIBEND }  \
+}
+
+#define TEXTCONV_TREE {\
+   { "SnmpAdminString", "255t", BER_TYPE_OCTETSTRING }, \
+   { NULL, NULL }  \
 }
 
 voidmib_init(void);
Index: smi.c
===
RCS file: /cvs/src/usr.bin/snmp/smi.c,v
retrieving revision 1.8
diff -u -p -r1.8 smi.c
--- smi.c   19 May 2020 13:41:01 -  1.8
+++ smi.c   19 May 2020 20:21:42 -
@@ -24,10 +24,13 @@
 #include 
 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
 #include 
+#include 
 
 #include "ber.h"
 #include "mib.h"
@@ -36,8 +39,12 @@
 
 #define MINIMUM(a, b)  (((a) < (b)) ? (a) : (b))
 
+int isu8cont(unsigned char);
+char *smi_displayhint_os(struct textconv *, const char *, size_t);
+
 int smi_oid_cmp(struct oid *, struct oid *);
 int smi_key_cmp(struct oid *, struct oid *);
+int smi_textconv_cmp(struct textconv *, struct textconv *);
 struct oid * smi_findkey(char *);
 
 RB_HEAD(oidtree, oid);
@@ -48,6 +55,10 @@ RB_HEAD(keytree, oid);
 RB_PROTOTYPE(keytree, oid, o_keyword, smi_key_cmp)
 struct keytree smi_keytree;
 
+RB_HEAD(textconvtree, textconv);
+RB_PROTOTYPE(textconvtree, textconv, tc_entry, smi_textconv_cmp);
+struct textconvtree smi_tctree;
+
 int
 smi_init(void)
 {
@@ -181,7 +192,7 @@ smi_debug_elements(struct ber_element *r
fprintf(stderr, "(%u) encoding %u ",
root->be_type, root->be_encoding);
 
-   if ((value = smi_print_element(root, 1, smi_os_default,
+   if ((value = smi_print_element(NULL, root, 1, smi_os_default,
smi_oidl_numeric)) == NULL)
goto