Bug#451886: fgets() and poison NULL byte attacks (aka NULL escapes)

2007-11-19 Thread Pierre Habouzit
tag 451886 + wontfix
thanks

On Mon, Nov 19, 2007 at 05:16:29AM +, Andrew Buckeridge wrote:
 package: libc6
 version: 2.3.6.ds1-13etch2
 severity: wishlist
 
 Possible partial fix for fgets and alternatives.
 
 Bug #57729 is marked as done.  It could be fixed for real.
 I have found null escapes a pretty reliable way of breaking many C
 programs including various editors.
 
 The standard C stdio.h fgets function is just plain wrong.
 
 fputs stops at a '\0', but fgets may only stop at '\n' even if a '\0'
 is encountered.  There is no way of knowing how much is read by fgets.
 These functions are inconsistent and are not binary safe.
 
 The standard should be fixed.

  The libc will follow the standard, and the standard will never ever
fix things like that. If you're going to read binary data, indeed, you
should not use FILE* or fgets, but plain read(1) calls with non blocking
sockets in a select loop.

  Anyway, the glibc _packaging_ team will never ever take the liberty to
introduce new symbols that arent in the upstream glibc, it would be way
too disruptive, so please bring this upstream, we don't have the
resources to do that for you.

  and yes btw fputs stops at the first \0 because it puts a string, and
strings in C are NUL terminated. fgets has a disruptive API, and _yes_ I
would very much prefer that fgets had this API intead of the current:

  ssize_t fgets(char *buf, size_t len);

  with the usual semantics that -1 means error (with errno set
properly), and else the return value would be the amount of bytes
written in the buffer buf, not counting the last '\0', meaning that the
return value would be at most equal to len - 1. Or even name this
fgetline as it's what it does for real. Sadly it's not the case, blame
the C committee. Meanwhile I'm leaving that bug as wontfix.
-- 
·O·  Pierre Habouzit
··O[EMAIL PROTECTED]
OOOhttp://www.madism.org


pgpV82Ofp4ur0.pgp
Description: PGP signature


Bug#451886: fgets() and poison NULL byte attacks (aka NULL escapes)

2007-11-18 Thread Andrew Buckeridge
package: libc6
version: 2.3.6.ds1-13etch2
severity: wishlist

Possible partial fix for fgets and alternatives.

Bug #57729 is marked as done.  It could be fixed for real.
I have found null escapes a pretty reliable way of breaking many C
programs including various editors.

The standard C stdio.h fgets function is just plain wrong.

fputs stops at a '\0', but fgets may only stop at '\n' even if a '\0'
is encountered.  There is no way of knowing how much is read by fgets.
These functions are inconsistent and are not binary safe.

The standard should be fixed.

In fcat2.c I add a function int fngets(char *s, int size, FILE *stream);
which has similar functionality to GNU getline.  This can be used with
fwrite.  As fngets does not exist it could be added by any stdio.h
user and will be portable.

In fcat.c I simply fix the broken fputs function in such a way that it
should not break existing use.  I also suggest some macros to check the
stdio.h buffer so that more efficient unistd.h can be mixed, even
though POSIX makes it clear that this should not be done.

How many extra CPU cycles to match '\n' or '\0' rather than just
match '\n' in the library fgets functions?

I give ESMTP as an example of a new line matching problem as this often
uses stdio.h and is often exploited by remote.  Many other
applications match new lines and could also be fed '\0's by remote.

Trivial examples attached.  These are not thread safe.
/*BINFMTC: -O -Wuninitialized -Werror -pedantic-errors

Binary safe stdio.h functions.

fputs stops at a '\0', but fgets may only stop at a '\n'.
These functions are inconsistenat and are not binary safe.

fgets could be made binary safe without breaking existing applications,
but its safe use would be inefficient.  See exaample fcat.c.

 $ gcc -o fcat2.out fcat2.c
 $ ./fcat2.out fcat2.out fcat2.out2
 $ cmp fcat2.out*

Define fngets and fnputs.  These functions use stdio.h, still store a
null '\0', however they read past null '\0' and return the number read
or written.

*/
#define _GNU_SOURCE
#include stdio.h
#include string.h
#include errno.h

/* returns size_t of number written */
#define fnputs(x,y,z) fwrite_unlocked((x),1,(y),(z))

/* Reads size-1 characters up to \n including any \0 cf. getline */
int fngets(char *s, int size, FILE *stream)
{
	char *p;
	int c;

	p=s,c=EOF;
	while(--size0(c=getc_unlocked(stream))!=EOF) {
		*p++=(char)c;
		if (c=='\n') { /* Will read '\0'. */
			break;
		}
	}
	*p='\0'; /* Always mark the end. */
	if(c==EOF) { /* Expected. */
		if(feof_unlocked(stream)) {
			if(p==s) {
return -1;
			}
		} else {
			return -1;
		}
	}
	return p-s; /* Return something meaningful. */
}

/* fdopen of TCP_NODELAY SO_KEEPALIVE socket 
   call fflush for continuation lines teergrube from hell */

int main() {
	char b[1024];
	int i;

	/*(void)setvbuf(stdout,(char *)NULL,_IOLBF,0);*/
	/*setlinebuf(stdout);*/ /* before fprintf(stderr,...) */

	while((i=fngets(b,1024,stdin))0) {
		(void)fprintf(stderr,Got %d bytes.\n,i);
		(void)fprintf(stderr,Put %u bytes.\n,\
		fnputs(b,i,stdout));
		(void)fflush(stdout); /* after fprintf(stderr,...) */
	}

	return 0;
}



/*BINFMTC: -O -Wuninitialized -Werror -pedantic-errors

We often need new line matching, but the peer may have sent additional
information after the new line so we can not use the light weight read.

POSIX has adopted the stdio.h functions such as fgets which match new
lines.

The problem with UNIX/POSIX/IEEE/ISO fgets definition is that it may not
necessarily be binary safe.  This should really be fixed in a way that
is compatible with existing POSIX applications.

Bug #57729 says use non POSIX getline GNU extension which returns length
rather than the infamous pointer. (dietlibc: What were they smoking?)

The function libc6 fgets_unlocked has read past the '\0' and there is no
easy way to find how much has been read.

 $ gcc -o fcat.out fcat.c
 $ ./fcat.out fcat.out fcat.out2
 $ cmp fcat.out*
 fcat.out fcat.out2 differ: byte 9, line 1

The Linux manpage for fgets(3) says: -

 fgets() reads in at most one less than size characters from stream  and
 stores  them  into  the buffer pointed to by s.  Reading stops after an
 EOF or a newline.  If a newline is read, it is stored into the buffer.
 A '\0' is stored after the last character in the buffer.

It does not say that reading should stop at a '\0'.

The behaviour of fgets in the POSIX manpage is also unclear if fgets
reads a '\0'.

UTF-16 could be used with C99 wcslen which matches L'\0', but UTF-16 is
inefficient even for Asian languages so it should not be used.

ESMTP 8BITMIME is fine for ISO-8859-15 which does not contain '\0' or
control characters, but it is already misused with other character sets so
if 8BITMIME is implemented then it should be made binary safe.  So long as
the MUA ensures that messages including binary ones have a \n.\r\n to
terminate the message.  This could be done by matching new lines and then
.\r\n.

Alternatively a way of mixing stdio.h and unistd.h