Re: [PATCH 1b/6] Add __attribute__((untrusted))
On 1/6/22 8:10 AM, David Malcolm wrote:
> On Thu, 2021-12-09 at 15:54 -0700, Martin Sebor wrote:
> > On 11/13/21 1:37 PM, David Malcolm via Gcc-patches wrote:
> > > This patch adds a new:
> > >
> > >    __attribute__((untrusted))
> > >
> > > for use by the C front-end, intended for use by the Linux kernel for
> > > use with "__user", but which could be used by other operating system
> > > kernels, and potentially by other projects.
> >
> > It looks like untrusted is a type attribute (rather than one
> > that applies to variables and/or function return values or
> > writeable by-reference arguments).  I find that quite surprising.
>
> FWIW I initially tried implementing it on pointer types, but doing it
> on the underlying type was much cleaner.
>
> > I'm used to thinking of trusted vs tainted as dynamic properties
> > of data so I'm having trouble deciding what to think about
> > the attribute applying to types.  Can you explain why it's
> > useful on types?
>
> A type system *is* a way of detecting problems involving dynamic
> properties of data.  Ultimately all we have at runtime is a collection
> of bits; the toolchain has the concept of types as a way to allow us to
> reason about properties of those bits without requiring a full cross-TU
> analysis (to try to figure out that e.g. x is, say, a 32-bit unsigned
> integer), and to document these properties clearly to human readers of
> the code.

I understand that relying on the type system is a way to do it.
It just doesn't seem like a very good way in a loosely typed
language like C (or C++).

> I see this as working like a qualifier (rather like "const" and
> "volatile"), in that an untrusted char * when dereferenced gives you an
> untrusted char.

Dereferencing a const char* yields a const char lvalue that
implicitly converts to an unqualified value of the referenced
object.  The qualifier is lost in the conversion, so modeling
taint/trust this way will also lose the property in the same
contexts.
It sounds to me like the concept you're modeling might be more
akin to a type specifier (maybe like _Atomic, although that still
converts to the underlying type).

> The intent is to have a way of treating the values as "actively
> hostile", so that code analyzers can assume the worst possible values
> for such types (or more glibly, that we're dealing with data from Satan
> rather than from Murphy).
>
> Such types are also relevant to infoleaks: writing sensitive
> information to an untrusted value can be detected relatively easily
> with this approach, by checking the type of the value - the types
> express the trust boundary.
>
> Doing this with qualifiers allows us to use the C type system to detect
> these kinds of issues without having to add a full cross-TU
> interprocedural analysis, and documents it to human readers of the
> code.  Compare with const-correctness; we can have an analogous
> "trust-correctness".

The problem with const-correctness in C is that it's so easily
lost (like with strchr, or in the lvalue-rvalue conversion).
This is also why I'm skeptical of the type-based approach here.

> > I'd expect the taint property of a type to be quickly lost as
> > an object of the type is passed through existing APIs (e.g.,
> > a char array manipulated by string functions like strchr).
>
> FWIW you can't directly pass an attacker-controlled buffer to strchr:
> strchr requires there to be a 0-terminator to the array; if the array's
> content is untrusted then the attacker might not have 0-terminated it.

strchr is just an example of the many functions that in my mind
make the type-based approach less than ideal.  If the untrusted
string was known to be nul-terminated, strchr still couldn't be
used without losing the property.  Ditto for memchr.
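To make the strchr point concrete, here is a minimal sketch in standard C of the two ways a qualifier can be dropped without any cast.  The names find_colon and first_char are hypothetical and do not come from the patch; const stands in for the proposed untrusted qualifier, on the assumption that the two would behave alike in these contexts.

```c
#include <string.h>

/* TEXT is declared const, yet the result is a plain char * pointing into
   the same array: the classic prototype is
   char *strchr (const char *, int).
   The parenthesised name calls the function itself rather than any
   type-generic macro a C23 library may provide.  */
static char *
find_colon (const char *text)
{
  return (strchr) (text, ':');
}

/* Lvalue conversion: *S has type const char, but the value read from it
   is an unqualified char, so the copy carries no qualifier.  */
static char
first_char (const char *s)
{
  char c = *s;
  return c;
}
```

Neither function needs a cast, so a check that only fires on qualifier-discarding pointer conversions would have nothing to report here.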
It seems that all sanitization would have to be either written
from scratch, without relying on existing utility functions, or
built from wrappers that call the common utility functions after
removing the qualifier from the tainted data, even before
the sanitization is complete.  That would obviously be
error-prone, but it's something that would be made much more
robust by tracking the taint independently of the data type.

Martin

> As implemented, the patch doesn't complain about this, though maybe it
> should.
>
> The main point here is to support the existing __user annotation used
> by the Linux kernel, in particular, copy_from_user and copy_to_user.
>
> > (I usually look at tests to help me understand the design of
> > a change but I couldn't find an answer to my question in those
> > in the patch.)
>
> The patch kit was rather unclear on this, due to the use of two
> different approaches (custom address spaces vs this untrusted
> attribute).  Sorry about this.
>
> Patches 4a and 4b in the kit add test-uaccess.h (to
> gcc/testsuite/gcc.dg/analyzer) which supplies "__user"; see the tests
> that use "test-uaccess.h" in patch 3:
>   [PATCH 3/6] analyzer: implement infoleak detection
>     https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584377.html
> and in patch 5:
>   [PATCH 5/6] analyzer: use region::untrusted_p in taint detection
Re: [PATCH 1b/6] Add __attribute__((untrusted))
On Thu, 2021-12-09 at 15:54 -0700, Martin Sebor wrote:
> On 11/13/21 1:37 PM, David Malcolm via Gcc-patches wrote:
> > This patch adds a new:
> >
> >    __attribute__((untrusted))
> >
> > for use by the C front-end, intended for use by the Linux kernel for
> > use with "__user", but which could be used by other operating system
> > kernels, and potentially by other projects.
>
> It looks like untrusted is a type attribute (rather than one
> that applies to variables and/or function return values or
> writeable by-reference arguments).  I find that quite surprising.

FWIW I initially tried implementing it on pointer types, but doing it
on the underlying type was much cleaner.

> I'm used to thinking of trusted vs tainted as dynamic properties
> of data so I'm having trouble deciding what to think about
> the attribute applying to types.  Can you explain why it's
> useful on types?

A type system *is* a way of detecting problems involving dynamic
properties of data.  Ultimately all we have at runtime is a collection
of bits; the toolchain has the concept of types as a way to allow us to
reason about properties of those bits without requiring a full cross-TU
analysis (to try to figure out that e.g. x is, say, a 32-bit unsigned
integer), and to document these properties clearly to human readers of
the code.

I see this as working like a qualifier (rather like "const" and
"volatile"), in that an untrusted char * when dereferenced gives you an
untrusted char.

The intent is to have a way of treating the values as "actively
hostile", so that code analyzers can assume the worst possible values
for such types (or more glibly, that we're dealing with data from Satan
rather than from Murphy).
Such types are also relevant to infoleaks: writing sensitive
information to an untrusted value can be detected relatively easily
with this approach, by checking the type of the value - the types
express the trust boundary.

Doing this with qualifiers allows us to use the C type system to detect
these kinds of issues without having to add a full cross-TU
interprocedural analysis, and documents it to human readers of the
code.  Compare with const-correctness; we can have an analogous
"trust-correctness".

>
> I'd expect the taint property of a type to be quickly lost as
> an object of the type is passed through existing APIs (e.g.,
> a char array manipulated by string functions like strchr).

FWIW you can't directly pass an attacker-controlled buffer to strchr:
strchr requires there to be a 0-terminator to the array; if the array's
content is untrusted then the attacker might not have 0-terminated it.

As implemented, the patch doesn't complain about this, though maybe it
should.

The main point here is to support the existing __user annotation used
by the Linux kernel, in particular, copy_from_user and copy_to_user.

>
> (I usually look at tests to help me understand the design of
> a change but I couldn't find an answer to my question in those
> in the patch.)

The patch kit was rather unclear on this, due to the use of two
different approaches (custom address spaces vs this untrusted
attribute).  Sorry about this.

Patches 4a and 4b in the kit add test-uaccess.h (to
gcc/testsuite/gcc.dg/analyzer) which supplies "__user"; see the tests
that use "test-uaccess.h" in patch 3:
  [PATCH 3/6] analyzer: implement infoleak detection
    https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584377.html
and in patch 5:
  [PATCH 5/6] analyzer: use region::untrusted_p in taint detection
    https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584374.html

(sorry about messing up the order of the patches).
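For readers unfamiliar with the kernel idiom, the following is a rough user-space sketch of the pattern that "__user" is meant to annotate.  Everything in it is an assumption made for illustration: mock_copy_from_user, struct request and handle_request are hypothetical names, the mock is just memcpy, and the macro falls back to nothing because released compilers do not provide the proposed attribute.  It is not the content of test-uaccess.h.

```c
#include <string.h>

/* Assumption: use the proposed attribute only if the compiler reports it,
   otherwise expand "__user" to nothing so the sketch builds anywhere.  */
#if defined(__has_attribute)
# if __has_attribute(untrusted)
#  define __user __attribute__((untrusted))
# endif
#endif
#ifndef __user
# define __user
#endif

/* Hypothetical stand-in for the kernel's copy_from_user: copies N bytes
   across the trust boundary and returns the number of bytes that could
   not be copied (always 0 in this mock).  */
static unsigned long
mock_copy_from_user (void *to, const void __user *from, unsigned long n)
{
  memcpy (to, (const void *) from, n);
  return 0;
}

struct request
{
  unsigned int len;
};

/* Copy the request out of untrusted memory first, then validate the local
   copy.  Returns 0 on success, -1 if the copy fails or LEN is too large.  */
static int
handle_request (const void __user *uptr, unsigned int max_len)
{
  struct request req;
  if (mock_copy_from_user (&req, uptr, sizeof req) != 0)
    return -1;
  if (req.len > max_len)
    return -1;
  return 0;
}
```

The annotation sits on the pointed-to type, so the only route from uptr to usable data is the copy helper, after which the range check applies to the local copy rather than to memory the other side can still change.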
Patch 4a here:
  [PATCH 4a/6] analyzer: implement region::untrusted_p in terms of
  custom address spaces
    https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584371.html
implements "__user" as a custom address space, whereas patch 4b here:
  [PATCH 4b/6] analyzer: implement region::untrusted_p in terms of
  __attribute__((untrusted))
    https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584373.html
implements "__user" to be __attribute__((untrusted)).

Perhaps I should drop the custom address space versions of the patches
and post a version of the kit that simply uses the attribute?

Dave

>
> Thanks
> Martin
>
> PS I found one paper online that discusses type-based taint
> analysis in Java but not much more.  I only quickly skimmed
> the paper and although it conceptually makes sense I'm still
> having difficulties seeing how it would be useful in C.
>
> >
> > Known issues:
> > - at least one TODO in handle_untrusted_attribute
> > - should it be permitted to dereference an untrusted pointer?  The
> >   patch currently allows this
> >
> > gcc/c-family/ChangeLog:
> >         * c-attribs.c (c_common_attribute_table): Add "untrusted".
> >         (build_untrusted_type): New.
> >         (handle_untrusted_attribute): New.
> >         * c-pretty-print.c (pp_c_cv_qualifiers): Handle
Re: [PATCH 1b/6] Add __attribute__((untrusted))
On 11/13/21 1:37 PM, David Malcolm via Gcc-patches wrote:
> This patch adds a new:
>
>    __attribute__((untrusted))
>
> for use by the C front-end, intended for use by the Linux kernel for
> use with "__user", but which could be used by other operating system
> kernels, and potentially by other projects.

It looks like untrusted is a type attribute (rather than one
that applies to variables and/or function return values or
writeable by-reference arguments).  I find that quite surprising.

I'm used to thinking of trusted vs tainted as dynamic properties
of data so I'm having trouble deciding what to think about
the attribute applying to types.  Can you explain why it's
useful on types?

I'd expect the taint property of a type to be quickly lost as
an object of the type is passed through existing APIs (e.g.,
a char array manipulated by string functions like strchr).

(I usually look at tests to help me understand the design of
a change but I couldn't find an answer to my question in those
in the patch.)

Thanks
Martin

PS I found one paper online that discusses type-based taint
analysis in Java but not much more.  I only quickly skimmed
the paper and although it conceptually makes sense I'm still
having difficulties seeing how it would be useful in C.

> Known issues:
> - at least one TODO in handle_untrusted_attribute
> - should it be permitted to dereference an untrusted pointer?  The patch
>   currently allows this
>
> gcc/c-family/ChangeLog:
>         * c-attribs.c (c_common_attribute_table): Add "untrusted".
>         (build_untrusted_type): New.
>         (handle_untrusted_attribute): New.
>         * c-pretty-print.c (pp_c_cv_qualifiers): Handle
>         TYPE_QUAL_UNTRUSTED.
>
> gcc/c/ChangeLog:
>         * c-typeck.c (convert_for_assignment): Complain if the trust
>         levels vary when assigning a non-NULL pointer.
>
> gcc/ChangeLog:
>         * doc/extend.texi (Common Type Attributes): Add "untrusted".
>         * print-tree.c (print_node): Handle TYPE_UNTRUSTED.
>         * tree-core.h (enum cv_qualifier): Add TYPE_QUAL_UNTRUSTED.
>         (struct tree_type_common): Assign one of the spare bits to a new
>         "untrusted_flag".
>         * tree.c (set_type_quals): Handle TYPE_QUAL_UNTRUSTED.
>         * tree.h (TYPE_QUALS): Likewise.
>         (TYPE_QUALS_NO_ADDR_SPACE): Likewise.
>         (TYPE_QUALS_NO_ADDR_SPACE_NO_ATOMIC): Likewise.
>
> gcc/testsuite/ChangeLog:
>         * c-c++-common/attr-untrusted-1.c: New test.
>
> Signed-off-by: David Malcolm
> ---
>  gcc/c-family/c-attribs.c                      |  59 +++
>  gcc/c-family/c-pretty-print.c                 |   2 +
>  gcc/c/c-typeck.c                              |  64 +++
>  gcc/doc/extend.texi                           |  25 +++
>  gcc/print-tree.c                              |   3 +
>  gcc/testsuite/c-c++-common/attr-untrusted-1.c | 165 ++
>  gcc/tree-core.h                               |   6 +-
>  gcc/tree.c                                    |   1 +
>  gcc/tree.h                                    |  11 +-
>  9 files changed, 332 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/c-c++-common/attr-untrusted-1.c
>
> diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
> index 007b928c54b..100c2dabab2 100644
> --- a/gcc/c-family/c-attribs.c
> +++ b/gcc/c-family/c-attribs.c
> @@ -136,6 +136,7 @@ static tree handle_warn_unused_result_attribute (tree *, tree, tree, int,
>                                                   bool *);
>  static tree handle_access_attribute (tree *, tree, tree, int, bool *);
> +static tree handle_untrusted_attribute (tree *, tree, tree, int, bool *);
>  static tree handle_sentinel_attribute (tree *, tree, tree, int, bool *);
>  static tree handle_type_generic_attribute (tree *, tree, tree, int, bool *);
>  static tree handle_alloc_size_attribute (tree *, tree, tree, int, bool *);
> @@ -536,6 +537,8 @@ const struct attribute_spec c_common_attribute_table[] =
>                                handle_special_var_sec_attribute, attr_section_exclusions },
>    { "access",                 1, 3, false, true, true, false,
>                                handle_access_attribute, NULL },
> +  { "untrusted",              0, 0, false, true, false, true,
> +                              handle_untrusted_attribute, NULL },
>    /* Attributes used by Objective-C.  */
>    { "NSObject",               0, 0, true, false, false, false,
>                                handle_nsobject_attribute, NULL },
> @@ -5224,6 +5227,62 @@ build_attr_access_from_parms (tree parms, bool skip_voidptr)
>    return build_tree_list (name, attrargs);
>  }
> +/* Build (or reuse) a type based on BASE_TYPE, but with
> +   TYPE_QUAL_UNTRUSTED.  */
> +
> +static tree
> +build_untrusted_type (tree base_type)
> +{
> +  int base_type_quals = TYPE_QUALS (base_type);
> +  return build_qualified_type (base_type,
> +                               base_type_quals | TYPE_QUAL_UNTRUSTED);
> +}
> +
> +/* Handle an "untrusted" attribute; arguments as in
> +   struct attribute_spec.handler.  */
> +
> +static tree
[PATCH 1b/6] Add __attribute__((untrusted))
This patch adds a new:

   __attribute__((untrusted))

for use by the C front-end, intended for use by the Linux kernel for
use with "__user", but which could be used by other operating system
kernels, and potentially by other projects.

Known issues:
- at least one TODO in handle_untrusted_attribute
- should it be permitted to dereference an untrusted pointer?  The patch
  currently allows this

gcc/c-family/ChangeLog:
        * c-attribs.c (c_common_attribute_table): Add "untrusted".
        (build_untrusted_type): New.
        (handle_untrusted_attribute): New.
        * c-pretty-print.c (pp_c_cv_qualifiers): Handle
        TYPE_QUAL_UNTRUSTED.

gcc/c/ChangeLog:
        * c-typeck.c (convert_for_assignment): Complain if the trust
        levels vary when assigning a non-NULL pointer.

gcc/ChangeLog:
        * doc/extend.texi (Common Type Attributes): Add "untrusted".
        * print-tree.c (print_node): Handle TYPE_UNTRUSTED.
        * tree-core.h (enum cv_qualifier): Add TYPE_QUAL_UNTRUSTED.
        (struct tree_type_common): Assign one of the spare bits to a new
        "untrusted_flag".
        * tree.c (set_type_quals): Handle TYPE_QUAL_UNTRUSTED.
        * tree.h (TYPE_QUALS): Likewise.
        (TYPE_QUALS_NO_ADDR_SPACE): Likewise.
        (TYPE_QUALS_NO_ADDR_SPACE_NO_ATOMIC): Likewise.

gcc/testsuite/ChangeLog:
        * c-c++-common/attr-untrusted-1.c: New test.
Signed-off-by: David Malcolm
---
 gcc/c-family/c-attribs.c                      |  59 +++
 gcc/c-family/c-pretty-print.c                 |   2 +
 gcc/c/c-typeck.c                              |  64 +++
 gcc/doc/extend.texi                           |  25 +++
 gcc/print-tree.c                              |   3 +
 gcc/testsuite/c-c++-common/attr-untrusted-1.c | 165 ++
 gcc/tree-core.h                               |   6 +-
 gcc/tree.c                                    |   1 +
 gcc/tree.h                                    |  11 +-
 9 files changed, 332 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/attr-untrusted-1.c

diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index 007b928c54b..100c2dabab2 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -136,6 +136,7 @@ static tree handle_warn_unused_result_attribute (tree *, tree, tree, int,
                                                  bool *);
 static tree handle_access_attribute (tree *, tree, tree, int, bool *);
+static tree handle_untrusted_attribute (tree *, tree, tree, int, bool *);
 static tree handle_sentinel_attribute (tree *, tree, tree, int, bool *);
 static tree handle_type_generic_attribute (tree *, tree, tree, int, bool *);
 static tree handle_alloc_size_attribute (tree *, tree, tree, int, bool *);
@@ -536,6 +537,8 @@ const struct attribute_spec c_common_attribute_table[] =
                               handle_special_var_sec_attribute, attr_section_exclusions },
   { "access",                 1, 3, false, true, true, false,
                               handle_access_attribute, NULL },
+  { "untrusted",              0, 0, false, true, false, true,
+                              handle_untrusted_attribute, NULL },
   /* Attributes used by Objective-C.  */
   { "NSObject",               0, 0, true, false, false, false,
                               handle_nsobject_attribute, NULL },
@@ -5224,6 +5227,62 @@ build_attr_access_from_parms (tree parms, bool skip_voidptr)
   return build_tree_list (name, attrargs);
 }
+/* Build (or reuse) a type based on BASE_TYPE, but with
+   TYPE_QUAL_UNTRUSTED.  */
+
+static tree
+build_untrusted_type (tree base_type)
+{
+  int base_type_quals = TYPE_QUALS (base_type);
+  return build_qualified_type (base_type,
+                               base_type_quals | TYPE_QUAL_UNTRUSTED);
+}
+
+/* Handle an "untrusted" attribute; arguments as in
+   struct attribute_spec.handler.  */
+
+static tree
+handle_untrusted_attribute (tree *node, tree ARG_UNUSED (name),
+                            tree ARG_UNUSED (args), int ARG_UNUSED (flags),
+                            bool *no_add_attrs)
+{
+  if (TREE_CODE (*node) == POINTER_TYPE)
+    {
+      tree base_type = TREE_TYPE (*node);
+      tree untrusted_base_type = build_untrusted_type (base_type);
+      *node = build_pointer_type (untrusted_base_type);
+      *no_add_attrs = true; /* OK */
+      return NULL_TREE;
+    }
+  else if (TREE_CODE (*node) == FUNCTION_TYPE)
+    {
+      tree return_type = TREE_TYPE (*node);
+      if (TREE_CODE (return_type) == POINTER_TYPE)
+        {
+          tree base_type = TREE_TYPE (return_type);
+          tree untrusted_base_type = build_untrusted_type (base_type);
+          tree untrusted_return_type = build_pointer_type (untrusted_base_type);
+          tree fn_type = build_function_type (untrusted_return_type,
+                                              TYPE_ARG_TYPES (*node));
+          *node = fn_type;
+          *no_add_attrs = true; /* OK */
+          return