Re: [PATCH 1b/6] Add __attribute__((untrusted))

2022-01-06 Thread Martin Sebor via Gcc-patches

On 1/6/22 8:10 AM, David Malcolm wrote:

> On Thu, 2021-12-09 at 15:54 -0700, Martin Sebor wrote:
> > On 11/13/21 1:37 PM, David Malcolm via Gcc-patches wrote:
> > > This patch adds a new:
> > > 
> > >     __attribute__((untrusted))
> > > 
> > > for use by the C front-end, intended for use by the Linux kernel for
> > > use with "__user", but which could be used by other operating system
> > > kernels, and potentially by other projects.
> > 
> > It looks like untrusted is a type attribute (rather than one
> > that applies to variables and/or function return values or
> > writeable by-reference arguments).  I find that quite surprising.
> 
> FWIW I initially tried implementing it on pointer types, but doing it
> on the underlying type was much cleaner.
> 
> > I'm used to thinking of trusted vs tainted as dynamic properties
> > of data so I'm having trouble deciding what to think about
> > the attribute applying to types.  Can you explain why it's
> > useful on types?
> 
> A type system *is* a way of detecting problems involving dynamic
> properties of data.  Ultimately all we have at runtime is a collection
> of bits; the toolchain has the concept of types as a way to allow us to
> reason about properties of those bits without requiring a full cross-TU
> analysis (to try to figure out that e.g. x is, say, a 32 bit unsigned
> integer), and to document these properties clearly to human readers of
> the code.

I understand that relying on the type system is a way to do it.
It just doesn't seem like a very good way in a loosely typed
language like C (or C++).

> I see this as working like a qualifier (rather like "const" and
> "volatile"), in that an
>    untrusted char *
> when dereferenced gives you an
>    untrusted char

Dereferencing a const char* yields a const char lvalue that
implicitly converts to an unqualified value of the referenced
object.  The qualifier is lost in the conversion, so modeling
taint/trust this way will also lose the property in the same
contexts.  It sounds to me like the concept you're modeling
might be more akin to a type specifier (maybe like _Atomic,
although that still converts to the underlying type).
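To illustrate the lvalue-conversion point, `const` stands in here for the proposed qualifier, because stock compilers don't know "untrusted". A value read through a qualified pointer lands in a plain char with no diagnostic. Converting the pointer itself does draw a warning. The patch's convert_for_assignment change is described as the analogous check for trust levels on pointer assignment. The function names are illustrative only.

```c
/* Reading through a qualified pointer: *p is an lvalue of type
   const char, but lvalue conversion produces a plain char value.  */
static char
next_char (const char *p)
{
  char c = *p;          /* no diagnostic: the qualifier is gone */
  c = (char) (c + 1);   /* and the copy is freely modifiable */
  return c;
}

/* Converting the pointer keeps the qualifier in play: uncommenting
   the first line draws a GCC warning about discarding 'const'.  */
static const char *
same_pointer (const char *p)
{
  /* char *q = p; */
  const char *q = p;
  return q;
}
```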

> The intent is to have a way of treating the values as "actively
> hostile", so that code analyzers can assume the worst possible values
> for such types (or more glibly, that we're dealing with data from Satan
> rather than from Murphy).
> 
> Such types are also relevant to infoleaks: writing sensitive
> information to an untrusted value can be detected relatively easily
> with this approach, by checking the type of the value; the types
> express the trust boundary.
> 
> Doing this with qualifiers allows us to use the C type system to detect
> these kinds of issues without having to add a full cross-TU
> interprocedural analysis, and documents it to human readers of the
> code.  Compare with const-correctness; we can have an analogous
> "trust-correctness".

The problem with const-correctness in C is that it's so easily
lost (like with strchr, or in the lvalue-rvalue conversion).
This is also why I'm skeptical of the type-based approach here.
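Here is a concrete version of the strchr case. Through C17, strchr is declared as char *strchr(const char *, int), so a const pointer goes in and an unqualified one comes out. C23 adds qualifier-preserving generic versions of strchr, memchr and friends that address exactly this. The helper name is illustrative.

```c
#include <string.h>

/* strchr takes a const char * but (before C23) returns a plain
   char *, so the caller can write through the result with no cast.  */
static char *
replace_first (const char *s, int from, char to)
{
  char *hit = strchr (s, from);   /* const dropped here */
  if (hit != NULL)
    *hit = to;   /* defined only because callers pass writable arrays */
  return hit;
}
```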

> > I'd expect the taint property of a type to be quickly lost as
> > an object of the type is passed through existing APIs (e.g.,
> > a char array manipulated by string functions like strchr).
> 
> FWIW you can't directly pass an attacker-controlled buffer to strchr:
> strchr requires there to be a 0-terminator to the array; if the array's
> content is untrusted then the attacker might not have 0-terminated it.

strchr is just an example of the many functions that in my mind
make the type-based approach less than ideal.  If the untrusted
string was known to be nul-terminated, strchr still couldn't be
used without losing the property.  Ditto for memchr.  It seems
that all sanitization would either have to be written from
scratch, without relying on existing utility functions, or be
done through wrappers that called the common utility functions
after removing the qualifier from the tainted data, even before
the sanitization was complete.  That would obviously be error-
prone, but it's something that would be made much more robust
by tracking the taint independently of the data type.
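Below is a sketch of the kind of wrapper described above. UNTRUSTED is a hypothetical macro that might expand to __attribute__((untrusted)) with the patched compiler; here it expands to nothing. The cast at the end is the single place where the property is dropped. That is where a reviewer would have to confirm that validation really is complete. sanitize_name and its acceptance rules are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical: with the proposed patch this might be defined as
   __attribute__((untrusted)); here it expands to nothing so the
   sketch builds with any C compiler.  */
#define UNTRUSTED

/* Return BUF as a trusted C string if it is NUL-terminated within
   LEN bytes and holds only printable ASCII; otherwise NULL.  */
static const char *
sanitize_name (const UNTRUSTED char *buf, size_t len)
{
  /* Bounded search: never reads past LEN, unlike strlen/strchr.  */
  const char *nul = memchr (buf, '\0', len);
  if (nul == NULL)
    return NULL;

  for (const char *p = buf; p < nul; p++)
    if (*p < 0x20 || *p > 0x7e)
      return NULL;

  /* Validation is complete; this cast is where trust is granted.  */
  return (const char *) buf;
}
```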

Martin

> As implemented, the patch doesn't complain about this, though maybe it
> should.
> 
> The main point here is to support the existing __user annotation used
> by the Linux kernel, in particular, copy_from_user and copy_to_user.
> 
> > (I usually look at tests to help me understand the design of
> > a change but I couldn't find an answer to my question in those
> > in the patch.)
> 
> The patch kit was rather unclear on this, due to the use of two
> different approaches (custom address spaces vs this untrusted
> attribute).  Sorry about this.
> 
> Patches 4a and 4b in the kit add test-uaccess.h (to
> gcc/testsuite/gcc.dg/analyzer) which supplies "__user"; see the tests
> that use "test-uaccess.h" in patch 3:
>   [PATCH 3/6] analyzer: implement infoleak detection
>   https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584377.html
> and in patch 5:
>   [PATCH 5/6] analyzer: use region::untrusted_p in taint detection

Re: [PATCH 1b/6] Add __attribute__((untrusted))

2022-01-06 Thread David Malcolm via Gcc-patches
On Thu, 2021-12-09 at 15:54 -0700, Martin Sebor wrote:
> On 11/13/21 1:37 PM, David Malcolm via Gcc-patches wrote:
> > This patch adds a new:
> > 
> >    __attribute__((untrusted))
> > 
> > for use by the C front-end, intended for use by the Linux kernel for
> > use with "__user", but which could be used by other operating system
> > kernels, and potentially by other projects.
> 
> It looks like untrusted is a type attribute (rather than one
> that applies to variables and/or function return values or
> writeable by-reference arguments).  I find that quite surprising.

FWIW I initially tried implementing it on pointer types, but doing it
on the underlying type was much cleaner.

>   I'm used to thinking of trusted vs tainted as dynamic properties
> of data so I'm having trouble deciding what to think about
> the attribute applying to types.  Can you explain why it's
> useful on types?

A type system *is* a way of detecting problems involving dynamic
properties of data.  Ultimately all we have at runtime is a collection
of bits; the toolchain has the concept of types as a way to allow us to
reason about properties of those bits without requiring a full cross-TU
analysis (to try to figure out that e.g. x is, say, a 32 bit unsigned
integer), and to document these properties clearly to human readers of
the code.

I see this as working like a qualifier (rather like "const" and
"volatile"), in that an
  untrusted char *
when dereferenced gives you an
  untrusted char

The intent is to have a way of treating the values as "actively
hostile", so that code analyzers can assume the worst possible values
for such types (or more glibly, that we're dealing with data from Satan
rather than from Murphy).

Such types are also relevant to infoleaks: writing sensitive
information to an untrusted value can be detected relatively easily
with this approach, by checking the type of the value; the types
express the trust boundary.
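Here is a sketch of the classic infoleak in this area. It again uses a user-space stand-in for copy_to_user, assumed to follow the kernel's return convention, with __user defined away. Copying a stack struct out without clearing it first can expose uninitialized padding bytes. Clearing the whole object before filling in the fields is the usual fix. struct reply and send_reply are illustrative.

```c
#include <errno.h>
#include <string.h>

#define __user  /* stand-in for the proposed untrusted attribute */

/* User-space stand-in with the kernel's convention: returns the
   number of bytes NOT copied.  */
static unsigned long
copy_to_user (void __user *to, const void *from, unsigned long n)
{
  memcpy ((void *) to, from, n);
  return 0;
}

/* On common ABIs the compiler inserts padding between FLAG and VALUE.  */
struct reply { char flag; long value; };

static int
send_reply (struct reply __user *uptr, long value)
{
  struct reply r;

  /* Clear the whole object, padding included; otherwise those bytes
     could carry stale stack contents across the trust boundary.  */
  memset (&r, 0, sizeof r);
  r.flag = 1;
  r.value = value;
  return copy_to_user (uptr, &r, sizeof r) ? -EFAULT : 0;
}
```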

Doing this with qualifiers allows us to use the C type system to detect
these kinds of issues without having to add a full cross-TU
interprocedural analysis, and documents it to human readers of the
code.  Compare with const-correctness; we can have an analogous
"trust-correctness".

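For comparison, there is a portable approximation of "trust-correctness" that works with any C compiler today. Values received across the boundary are wrapped in a distinct struct type, so they can't be used as integers until they pass through a validating function. This is not what the patch does (it adds a qualifier), and all names here are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* A distinct type for values received across the trust boundary.
   Because it is a struct, the compiler rejects using it directly
   as an integer (e.g. as an array index).  */
typedef struct { uint32_t raw; } untrusted_u32;

static untrusted_u32
from_user (uint32_t v)
{
  return (untrusted_u32) { v };
}

/* The sanctioned way back to a plain uint32_t: check it against LIMIT.  */
static bool
validate_index (untrusted_u32 v, uint32_t limit, uint32_t *out)
{
  if (v.raw >= limit)
    return false;
  *out = v.raw;
  return true;
}

static const int table[4] = { 10, 20, 30, 40 };

static int
lookup (untrusted_u32 idx, int fallback)
{
  uint32_t i;

  /* table[idx] would be rejected: idx is a struct, not an integer.  */
  if (!validate_index (idx, 4, &i))
    return fallback;
  return table[i];
}
```

Nothing stops code from reading .raw directly. The approach therefore makes the boundary visible to the type checker rather than enforcing it, much as a cast would presumably be the escape hatch for a qualifier-based design.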
> 
> I'd expect the taint property of a type to be quickly lost as
> an object of the type is passed through existing APIs (e.g.,
> a char array manipulated by string functions like strchr).

FWIW you can't directly pass an attacker-controlled buffer to strchr:
strchr requires there to be a 0-terminator to the array; if the array's
content is untrusted then the attacker might not have 0-terminated it.

As implemented, the patch doesn't complain about this, though maybe it
should.
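One way to make such a call safe when the buffer's length is known but its termination is not is to check for a terminator within the bounds first, and only then hand the buffer to strchr. This is a minimal sketch; bounded_strchr is a hypothetical helper, not something from the patch kit.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: call strchr only after confirming that the
   buffer is NUL-terminated within its LEN bytes, so the scan cannot
   run past the end.  Returns NULL both when CH is absent and when
   the buffer is unterminated.  */
static const char *
bounded_strchr (const char *buf, size_t len, int ch)
{
  if (memchr (buf, '\0', len) == NULL)
    return NULL;
  return strchr (buf, ch);
}
```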

The main point here is to support the existing __user annotation used
by the Linux kernel, in particular, copy_from_user and copy_to_user.

> 
> (I usually look at tests to help me understand the design of
> a change but I couldn't find an answer to my question in those
> in the patch.)

The patch kit was rather unclear on this, due to the use of two
different approaches (custom address spaces vs this untrusted
attribute).  Sorry about this.

Patches 4a and 4b in the kit add test-uaccess.h (to
gcc/testsuite/gcc.dg/analyzer) which supplies "__user"; see the tests
that use "test-uaccess.h" in patch 3:
  [PATCH 3/6] analyzer: implement infoleak detection
  https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584377.html
and in patch 5:
  [PATCH 5/6] analyzer: use region::untrusted_p in taint detection
  https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584374.html

(sorry about messing up the order of the patches).

Patch 4a here:
  [PATCH 4a/6] analyzer: implement region::untrusted_p in terms of custom address spaces
  https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584371.html
implements "__user" as a custom address space,

whereas patch 4b here:
  [PATCH 4b/6] analyzer: implement region::untrusted_p in terms of __attribute__((untrusted))
  https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584373.html
implements "__user" as __attribute__((untrusted)).

Perhaps I should drop the custom address space versions of the patches
and post a version of the kit that simply uses the attribute?

Dave


> 
> Thanks
> Martin
> 
> PS I found one paper online that discusses type-based taint
> analysis in Java but not much more.  I only quickly skimmed
> the paper and although it conceptually makes sense I'm still
> having difficulties seeing how it would be useful in C.
> 
> > 
> > Known issues:
> > - at least one TODO in handle_untrusted_attribute
> > - should it be permitted to dereference an untrusted pointer?  The patch
> >   currently allows this
> > 
> > gcc/c-family/ChangeLog:
> > * c-attribs.c (c_common_attribute_table): Add "untrusted".
> > (build_untrusted_type): New.
> > (handle_untrusted_attribute): New.
> > * c-pretty-print.c (pp_c_cv_qualifiers): Handle
> > 

Re: [PATCH 1b/6] Add __attribute__((untrusted))

2021-12-09 Thread Martin Sebor via Gcc-patches

On 11/13/21 1:37 PM, David Malcolm via Gcc-patches wrote:

> This patch adds a new:
> 
>    __attribute__((untrusted))
> 
> for use by the C front-end, intended for use by the Linux kernel for
> use with "__user", but which could be used by other operating system
> kernels, and potentially by other projects.

It looks like untrusted is a type attribute (rather than one
that applies to variables and/or function return values or
writeable by-reference arguments).  I find that quite surprising.
I'm used to thinking of trusted vs tainted as dynamic properties
of data so I'm having trouble deciding what to think about
the attribute applying to types.  Can you explain why it's
useful on types?

I'd expect the taint property of a type to be quickly lost as
an object of the type is passed through existing APIs (e.g.,
a char array manipulated by string functions like strchr).

(I usually look at tests to help me understand the design of
a change but I couldn't find an answer to my question in those
in the patch.)

Thanks
Martin

PS I found one paper online that discusses type-based taint
analysis in Java but not much more.  I only quickly skimmed
the paper and although it conceptually makes sense I'm still
having difficulties seeing how it would be useful in C.

[PATCH 1b/6] Add __attribute__((untrusted))

2021-11-13 Thread David Malcolm via Gcc-patches
This patch adds a new:

  __attribute__((untrusted))

for use by the C front-end, intended for use by the Linux kernel for
use with "__user", but which could be used by other operating system
kernels, and potentially by other projects.
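Here is a minimal sketch of how a kernel-style header might hook "__user" up to the new attribute, assuming a compiler with this patch applied. HAVE_ATTR_UNTRUSTED is a hypothetical feature macro. copy_from_user here is a user-space stand-in that follows the kernel's convention of returning the number of bytes that could not be copied. With a stock compiler, __user expands to nothing, so the example also builds there.

```c
#include <errno.h>
#include <string.h>

/* HAVE_ATTR_UNTRUSTED is a hypothetical feature macro: with the
   proposed patch, __user could map to the new attribute; with a
   stock compiler it expands to nothing and there is no checking.  */
#ifdef HAVE_ATTR_UNTRUSTED
# define __user __attribute__((untrusted))
#else
# define __user
#endif

/* User-space stand-in for the kernel routine of the same name, with
   the same convention: returns the number of bytes NOT copied.  */
static unsigned long
copy_from_user (void *to, const void __user *from, unsigned long n)
{
  memcpy (to, (const void *) from, n);
  return 0;
}

struct config { int mode; int level; };

/* Hypothetical handler: the request is copied into local storage
   first, and only then range-checked and committed.  */
static int
set_config (struct config *dst, const struct config __user *uptr)
{
  struct config tmp;

  if (copy_from_user (&tmp, uptr, sizeof tmp) != 0)
    return -EFAULT;
  if (tmp.mode < 0 || tmp.mode > 3)
    return -EINVAL;
  *dst = tmp;
  return 0;
}
```

In this pattern the handler never reads through uptr directly. Everything it validates has already been copied into a trusted local.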

Known issues:
- at least one TODO in handle_untrusted_attribute
- should it be permitted to dereference an untrusted pointer?  The patch
  currently allows this

gcc/c-family/ChangeLog:
* c-attribs.c (c_common_attribute_table): Add "untrusted".
(build_untrusted_type): New.
(handle_untrusted_attribute): New.
* c-pretty-print.c (pp_c_cv_qualifiers): Handle
TYPE_QUAL_UNTRUSTED.

gcc/c/ChangeLog:
* c-typeck.c (convert_for_assignment): Complain if the trust
levels vary when assigning a non-NULL pointer.

gcc/ChangeLog:
* doc/extend.texi (Common Type Attributes): Add "untrusted".
* print-tree.c (print_node): Handle TYPE_UNTRUSTED.
* tree-core.h (enum cv_qualifier): Add TYPE_QUAL_UNTRUSTED.
(struct tree_type_common): Assign one of the spare bits to a new
"untrusted_flag".
* tree.c (set_type_quals): Handle TYPE_QUAL_UNTRUSTED.
* tree.h (TYPE_QUALS): Likewise.
(TYPE_QUALS_NO_ADDR_SPACE): Likewise.
(TYPE_QUALS_NO_ADDR_SPACE_NO_ATOMIC): Likewise.

gcc/testsuite/ChangeLog:
* c-c++-common/attr-untrusted-1.c: New test.

Signed-off-by: David Malcolm 
---
 gcc/c-family/c-attribs.c                      |  59 +++
 gcc/c-family/c-pretty-print.c                 |   2 +
 gcc/c/c-typeck.c                              |  64 +++
 gcc/doc/extend.texi                           |  25 +++
 gcc/print-tree.c                              |   3 +
 gcc/testsuite/c-c++-common/attr-untrusted-1.c | 165 ++
 gcc/tree-core.h                               |   6 +-
 gcc/tree.c                                    |   1 +
 gcc/tree.h                                    |  11 +-
 9 files changed, 332 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/attr-untrusted-1.c

diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index 007b928c54b..100c2dabab2 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -136,6 +136,7 @@ static tree handle_warn_unused_result_attribute (tree *, tree, tree, int,
 bool *);
 static tree handle_access_attribute (tree *, tree, tree, int, bool *);
 
+static tree handle_untrusted_attribute (tree *, tree, tree, int, bool *);
 static tree handle_sentinel_attribute (tree *, tree, tree, int, bool *);
 static tree handle_type_generic_attribute (tree *, tree, tree, int, bool *);
 static tree handle_alloc_size_attribute (tree *, tree, tree, int, bool *);
@@ -536,6 +537,8 @@ const struct attribute_spec c_common_attribute_table[] =
                               handle_special_var_sec_attribute, attr_section_exclusions },
   { "access",                 1, 3, false, true, true, false,
                               handle_access_attribute, NULL },
+  { "untrusted",              0, 0, false,  true, false, true,
+                              handle_untrusted_attribute, NULL },
   /* Attributes used by Objective-C.  */
   { "NSObject",               0, 0, true, false, false, false,
                               handle_nsobject_attribute, NULL },
@@ -5224,6 +5227,62 @@ build_attr_access_from_parms (tree parms, bool skip_voidptr)
   return build_tree_list (name, attrargs);
 }
 
+/* Build (or reuse) a type based on BASE_TYPE, but with
+   TYPE_QUAL_UNTRUSTED.  */
+
+static tree
+build_untrusted_type (tree base_type)
+{
+  int base_type_quals = TYPE_QUALS (base_type);
+  return build_qualified_type (base_type,
+                               base_type_quals | TYPE_QUAL_UNTRUSTED);
+}
+
+/* Handle an "untrusted" attribute; arguments as in
+   struct attribute_spec.handler.  */
+
+static tree
+handle_untrusted_attribute (tree *node, tree ARG_UNUSED (name),
+                            tree ARG_UNUSED (args), int ARG_UNUSED (flags),
+                            bool *no_add_attrs)
+{
+  if (TREE_CODE (*node) == POINTER_TYPE)
+    {
+      tree base_type = TREE_TYPE (*node);
+      tree untrusted_base_type = build_untrusted_type (base_type);
+      *node = build_pointer_type (untrusted_base_type);
+      *no_add_attrs = true; /* OK */
+      return NULL_TREE;
+    }
+  else if (TREE_CODE (*node) == FUNCTION_TYPE)
+    {
+      tree return_type = TREE_TYPE (*node);
+      if (TREE_CODE (return_type) == POINTER_TYPE)
+        {
+          tree base_type = TREE_TYPE (return_type);
+          tree untrusted_base_type = build_untrusted_type (base_type);
+          tree untrusted_return_type = build_pointer_type (untrusted_base_type);
+          tree fn_type = build_function_type (untrusted_return_type,
+                                              TYPE_ARG_TYPES (*node));
+          *node = fn_type;
+          *no_add_attrs = true; /* OK */
+          return