Re: UTF-8 string filtering

2015-09-20 Thread Damien Miller
On Sat, 12 Sep 2015, Stefan Sperling wrote: > > On Fri, Sep 04, 2015 at 03:17:31PM +1000, Damien Miller wrote: > > Hi, > > > > For a long time OpenBSD has been careful about filtering potentially- > > hostile strings that were destined for logs or TTYs using strvis(3) and > > friends.

Re: UTF-8 string filtering

2015-09-12 Thread Stefan Sperling
On Fri, Sep 04, 2015 at 03:17:31PM +1000, Damien Miller wrote: > Hi, > > For a long time OpenBSD has been careful about filtering potentially- > hostile strings that were destined for logs or TTYs using strvis(3) and > friends. Unfortunately, these don't do a great job for UTF-8 strings > since

Re: UTF-8 string filtering

2015-09-05 Thread Sebastien Marie
On Fri, Sep 04, 2015 at 03:17:31PM +1000, Damien Miller wrote: > Hi, > > Comments appreciated. As micm@ already mentioned, utf8_stringprep.c in the patch contains two (or three?) copies of itself in the same file. > diff --git a/sshconnect2.c b/sshconnect2.c > index 2b525ac..04120e7 100644 > ---

Re: UTF-8 string filtering

2015-09-05 Thread Stefan Sperling
On Sat, Sep 05, 2015 at 10:16:17AM +0200, Sebastien Marie wrote: > but be aware that some functions, like `mblen' (used in sftp.c, so not > in the same context as `utf8_ok' I think), could cope badly with > setlocale() call (man mblen extract): > > Calling any other functions in libc never

Re: UTF-8 string filtering

2015-09-05 Thread Stefan Sperling
On Fri, Sep 04, 2015 at 03:17:31PM +1000, Damien Miller wrote: > +/* Check whether we can display UTF-8 safely */ > +static int > +utf8_ok(void) > +{ > + static int ret = -1; > + char *cp; > + > + if (ret == -1) { > + setlocale(LC_CTYPE, ""); As discussed in the other

Re: UTF-8 string filtering (missing check in mbrtowc [utf8])

2015-09-05 Thread Sebastien Marie
On Sat, Sep 05, 2015 at 01:26:18PM +0200, Stefan Sperling wrote: > > > +static u_int32_t > > +decode_utf8(const char *in, const char **nextc, int *had_error) > > +{ > > Please make sure this function performs the same validation checks > as

Re: UTF-8 string filtering

2015-09-05 Thread Sebastien Marie
About the approach, I see one possible drawback: with this API, we can't work on a partial string, and we have to keep the whole string in memory. Depending on the usage, that could be a problem (for large block processing, for example). On Fri, Sep 04, 2015 at 03:17:31PM +1000, Damien Miller

Re: UTF-8 string filtering

2015-09-05 Thread Stefan Sperling
On Sat, Sep 05, 2015 at 04:38:30PM +0300, pizdel...@gmail.com wrote: > On Sat, Sep 05, 2015 at 01:26:18PM +0200, Stefan Sperling wrote: > > I can't see where you're checking for overlong UTF-8 sequences, for example. > > It is somewhere in there > > + } else if ((e & 0xe0)

Re: UTF-8 string filtering (missing check in mbrtowc [utf8])

2015-09-05 Thread Stefan Sperling
On Sat, Sep 05, 2015 at 02:40:12PM +0200, Sebastien Marie wrote: > We have a missing check in libc function. > > RFC 3629 asks for limiting the range to 0x10FFFF: > https://tools.ietf.org/html/rfc3629#page-10 > > Currently, passing a c-string with "f7 bf bf bf" to mbrtowc(3) [with > UTF-8
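
To make the range check under discussion concrete, here is a minimal sketch in C. It is not the libc mbrtowc(3) code and not the code from the patch; the function name decode4_checked is hypothetical. It decodes a single 4-byte UTF-8 sequence and rejects any result above 0x10FFFF, the upper limit that RFC 3629 sets, so the bytes "f7 bf bf bf" (which decode to 0x1FFFFF) are refused.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper for illustration only (not libc, not the patch):
 * decode one 4-byte UTF-8 sequence at s and store the code point in *out.
 * Returns 0 on success, -1 if the sequence is malformed, overlong, or
 * decodes to a value beyond the RFC 3629 limit of 0x10FFFF.
 */
static int
decode4_checked(const unsigned char *s, uint32_t *out)
{
	uint32_t cp;
	int i;

	assert(s != NULL && out != NULL);

	/* The lead byte of a 4-byte sequence must match 11110xxx. */
	if ((s[0] & 0xf8) != 0xf0)
		return -1;

	/* Each of the three following bytes must match 10xxxxxx. */
	for (i = 1; i < 4; i++) {
		if ((s[i] & 0xc0) != 0x80)
			return -1;
	}

	cp = ((uint32_t)(s[0] & 0x07) << 18) |
	    ((uint32_t)(s[1] & 0x3f) << 12) |
	    ((uint32_t)(s[2] & 0x3f) << 6) |
	    (uint32_t)(s[3] & 0x3f);

	/* Values below 0x10000 fit in fewer bytes: overlong encoding. */
	if (cp < 0x10000)
		return -1;

	/* RFC 3629 restricts UTF-8 to code points up to 0x10FFFF. */
	if (cp > 0x10ffff)
		return -1;

	*out = cp;
	return 0;
}
```

With this sketch, "f4 8f bf bf" (0x10FFFF) is accepted and "f7 bf bf bf" is rejected, even though both pass the lead-byte and continuation-byte checks.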

Re: UTF-8 string filtering

2015-09-05 Thread pizdelect
On Sat, Sep 05, 2015 at 01:26:18PM +0200, Stefan Sperling wrote: > I can't see where you're checking for overlong UTF-8 sequences, for example. It is somewhere in there + } else if ((e & 0xe0) == 0xc0) { /* 11 bit code point */ + state = 1;
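
For readers unfamiliar with the overlong check being asked about, a minimal sketch for the 2-byte (11 bit) case follows. It is an assumption-laden illustration, not the state machine from the patch; the function name decode2_checked is hypothetical. A 2-byte sequence whose decoded value is below 0x80 could have been written as a single ASCII byte, so it is overlong and must be rejected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper for illustration only (not the patch's decoder):
 * decode one 2-byte UTF-8 sequence at s and store the code point in *out.
 * Returns 0 on success, -1 if the sequence is malformed or overlong.
 */
static int
decode2_checked(const unsigned char *s, uint32_t *out)
{
	uint32_t cp;

	assert(s != NULL && out != NULL);

	/* Lead byte must match 110xxxxx, continuation must match 10xxxxxx. */
	if ((s[0] & 0xe0) != 0xc0 || (s[1] & 0xc0) != 0x80)
		return -1;

	cp = ((uint32_t)(s[0] & 0x1f) << 6) | (uint32_t)(s[1] & 0x3f);

	/* Anything below 0x80 fits in one byte, so this form is overlong. */
	if (cp < 0x80)
		return -1;

	*out = cp;
	return 0;
}
```

The classic example is "c0 af", an overlong spelling of '/' (0x2f), which this sketch rejects, while "c3 a9" decodes to 0xe9 and is accepted.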

Re: UTF-8 string filtering

2015-09-04 Thread Nicholas Marriott
Hi I like this and I would use it in tmux (I have utf8_strvis which is not as strict). The code looks good on a quick read but I'm not a Unicode expert either. I'd like to see useful UTF-8 stuff go into libutil or somewhere common, but I guess perhaps the time of the UTF-8 hackathon in October

UTF-8 string filtering

2015-09-03 Thread Damien Miller
Hi, For a long time OpenBSD has been careful about filtering potentially- hostile strings that were destined for logs or TTYs using strvis(3) and friends. Unfortunately, these don't do a great job for UTF-8 strings since they mangle anything that isn't basic ASCII (not even ISO-8859-1). This
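
The mangling described above can be illustrated with a small portable sketch. This is a stand-in written for illustration, not strvis(3) itself, and its output format is an assumption rather than a claim about what strvis(3) emits; the function name escape_non_ascii is hypothetical. It escapes every byte outside printable ASCII as a backslash followed by three octal digits, which is safe for logs and TTYs but turns a UTF-8 string such as "café" into "caf\303\251".

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical byte-wise filter for illustration only (not strvis(3)):
 * copy in to out, replacing every byte that is not printable ASCII
 * (and the backslash itself) with a backslash and three octal digits.
 * Returns 0 on success, -1 if out (of size outlen) is too small.
 */
static int
escape_non_ascii(const char *in, char *out, size_t outlen)
{
	size_t used = 0;

	assert(in != NULL && out != NULL);

	for (; *in != '\0'; in++) {
		unsigned char c = (unsigned char)*in;

		if (c >= 0x20 && c < 0x7f && c != '\\') {
			/* Need room for this byte plus the final NUL. */
			if (used + 2 > outlen)
				return -1;
			out[used++] = (char)c;
		} else {
			/* Need room for four escape bytes plus the NUL. */
			if (used + 5 > outlen)
				return -1;
			snprintf(out + used, 5, "\\%03o", (unsigned int)c);
			used += 4;
		}
	}

	if (used + 1 > outlen)
		return -1;
	out[used] = '\0';
	return 0;
}
```

Because the filter works one byte at a time, both bytes of the two-byte encoding of "é" are escaped separately, which is the kind of loss for non-ASCII text that a UTF-8-aware filter would avoid.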