Re: PR analyzer/94362 Partial Fix

2021-03-02 Thread brian.sobulefsky via Gcc-patches
Agreed too. Generic "error on overflow" is not an answer, and ignoring overflow
is not an answer either because flagging faulty memory allocations is an
important feature.

Brian


Sent with ProtonMail Secure Email.

‐‐‐ Original Message ‐‐‐
On Tuesday, March 2, 2021 6:09 PM, Jeff Law  wrote:

> On 3/2/21 6:40 PM, David Malcolm via Gcc-patches wrote:
>
> > > My devil's advocate position would be if the analyzer raises
> > > exception on
> > > any possible overflow, will that overwhelm the user with false
> > > positives?
> > > Presumably by "raise exception" you mean "issue a diagnostic and stop
> > > analyzing this path", right?
> >
> > I think the point is to detect where numerical overflow can lead to
> > e.g. a buffer overflow, rather than complain about numerical overflow
> > in its own right, like in the make_arr example I gave earlier.
>
> WRT overflow, IMHO, the most valuable case to detect overflows is when
> they feed an allocation via malloc/alloca.  If an attacker can arrange
> to overflow the size computation, then they can cause an
> under-allocation which in turn opens the ability to over-write the stack
> or heap data structures, which in turn are great attack vectors.
>
> And in case you think that's contrived, it isn't :-)
>
> http://phrack.org/issues/67/9.html
>
> > > I
> > > am not sure of the answer here, because a piece of me feels that
> > > overflow is not
> > > something that production code should be relying on in any serious
> > > application,
> > > and so should be non existent, but I am not sure if that is
> > > reflective of
> > > reality.
> > > My belief is that production code is full of overflows, but only some
> > > of them are security-sensitive. Consider e.g. hashing algorithms that
> > > sum some values and beningly assume overflow for wraparound as opposed
> > > to the "calculate the size of the buffer to be allocated" example
> > > (where the overflow is a classic security pitfall).
>
> Agreed.
>
> Jeff




Re: PR analyzer/94362 Partial Fix

2021-03-02 Thread brian.sobulefsky via Gcc-patches
Wow! I wasn't expecting that to work. Obviously we know that  there is currently
no handler for binop_svalue in the constraints so I would have to watch it run 
with
state merging disabled to see how it is managing the unroll. The fact that 
merging
breaks it is indicative of what we are saying though, that the constraint and
merge model is currently insufficient.

Hash algorithms may provide a counterexample for legitimate use of overflow. 
Anyway,
I would prefer tracking the split too, but it is a big change. One way is state
split, like you say, but that is a pretty invasive change to the way the graph 
works.
The other way is to handle "or constraints" as I have said. This is an invasive
change to the constraint model, but arguably the concept of "or" cannot be 
ignored
forever.

Thinking about it, I guess currently the concept of "and" is handled by the
constraints (all constraints in a model exist as a big "and") and the concept of
"or" is handled by the graph. This could be acceptable but we cannot split the
graph arbitrarily, so there is currently no way to handle even a basic

if (i == 1 || i == 10)

what should be a very simple conditional. Handling a hash algorithm, like
you say, is good to keep in mind, because we don't want an explosion of
possibilities. If done right, the analyzer should understand for hashing that
anything + anything => anything, and we see no explosion of state.

By "raise an exception" I did mean issue an analyzer warning, yes. Perhaps the
simple answer is to just track the svalue as a possible overflow in the state
machine and report the warning for certain uses, like the alloc family, as you
said. Regardless, proper overflow handling renders my naive binop handler 
unusable,
because all it does is fold the condition and recur. There is basically no logic
to it, and once you reenter eval_condition it is not possible to know how you 
got
there.



Re: PR analyzer/94362 Partial Fix

2021-03-02 Thread brian.sobulefsky via Gcc-patches
I have been kicking these sorts of ideas around ever since I came to understand 
that
the second "UNKNOWN" in the for loop that originally started this was due to 
the state
merge as we loop. For now, and I don't mean this disrespectfully because it is 
very
hard to get right, the whole issue of merging has basically been punted, given 
some
of the simple examples we found that will merge as an unknown svalue. As you 
think
about this issue, "scope creep" becomes a concern quickly. It  quickly turns 
into
a halting problem of sorts, you have to decide how much of you want the 
analyzer to
be able to "understand" a program. For example, any human can follow this:

sum = 0;
for (idx = 1; idx <= 10; idx++) sum += idx;
__analyzer_eval (sum == 55);

but from an analyzer perspective it opens up all sorts of questions and
becomes a bit of a PhD thesis as to where you draw the line. The biggest concern
with the analyzer seems to be vulnerabilities, so I doubt it is useful to get 
the
analyzer to produce the correct answer for the above code, although it might be
interesting to do so from an academic perspective.

The example you provided gives a concrete reason that overflow should not be a
complete "punt" and I like it. In the interest of fighting scope creep and 
keeping
things manageable, I would question whether you want to actually track the 
overflow /
no overflow cases separately or just raise any possible overflow as an error 
immediately.
I am not disputing your idea, I would prefer to track the overflow and get
a correct result (in this case, an under allocation of memory). I guess I would 
want
to know how much work you think that will be. You still know the codebase a lot 
better
than I do.

My devil's advocate position would be if the analyzer raises exception on
any possible overflow, will that overwhelm the user with false positives? I
am not sure of the answer here, because a piece of me feels that overflow is not
something that production code should be relying on in any serious application,
and so should be non existent, but I am not sure if that is reflective of
reality.

The simplest way to handle your example would be the following:

struct foo * make_arr (size_t n)
{
  if (MAX_INT / sizeof (struct foo) >= n)
return NULL;
  //continue with what you wrote
}

This should add a constraint that downstream of the initial guard, n is small
enough to prevent overflow (I would have to check, but the current analyzer
should be close to doing this if not already correct). Therefore, all we would 
need
would be the check at the definition of sz as to whether it overflows, and that
check should come back negative in my example, unknown in yours. Assuming we 
agree
that the purpose of the analyzer is to prevent vulnerabilities, and not to 
provide
an academic exercise in seeing how close we get to solving the halting problem,
we could just treat any possible overflow as an error.

As this was fundamentally your project, I don't want to tell you how to do it, 
and
the nerd in me wants to build an analyzer that can answer all the silly puzzle 
code
I can think to feed it, but from a utilitarian perspective and given everyone's
limited time, I thought I would offer this as a path you can try.


Re: PR analyzer/94362 Partial Fix

2021-03-01 Thread brian.sobulefsky via Gcc-patches
Hi,

It may not be worth altering at this point, but it seems like it would leave 
less
bugs open if all the constraints go in as "closed" ranges and all evals are
translated to closed intervals. So, if (idx > 0) and if (idx >= 1) are the same
constraint. I know this won't be an option for eventual float support, but
that is a different can of worms. For integers, you can fix it at those entry 
points
and then all other subroutines can ignore the issue of open or closed ranges.

I fully understand the eye glaze and did not want to have to write it that
way. I am thinking if there is a cleaner way to do it. Anyway, that is why I
put a comment in each case to derive the result. This issue of "sides of the
condition" and "inverted operator" as you call it in some places is a recurring
theme. It is especially irritating when we lose commutativity, as we do with 
MINUS.

Adding logic in my subroutine for MULT or DIV is not hard, handling overflow
is a bit more involved. At the very least, we would need to know what the max 
or min
of a particular variable is, which might be in the type tree. We would also 
need to
define how we want to handle the issue.

The problem is (and I have been thinking about this a lot in terms of constraint
merging), there are currently no "or constraints," which would be helpful in 
merging too.
So for overflow, when you have something like

if (idx > 0)
 {
  idx += num;
  __analyzer_eval(idx > num);
 }

you have gone from a single constraint (idx > 0), to an "or condition"
(idx > num || idx < MIN_INT + num). The only solution now, other than ignoring
overflow as a bug that is tolerated, is to just pass it off as unhandled (and
therefore UNKNOWN). Perhaps you may want to add overflow as one of your analyzer
warnings if it cannot be ruled out?

I did try to run a test with a simple || in the condition before just to see 
what
would happen, and as you probably know it is not handled at all. I did not watch
in gdb, but it is obvious from constraint-manager.cc that there is nothing to 
handle
it. I think I actually did an __analyzer_eval() of the same || condition 
verbatim
that was in the if() conditional and still got an UNKNOWN.

It is a pretty intrusive change to add logic for that, which is why I have not
done any work on it yet. Without the concept of "or" I don't see how we could
handle overflow, but maybe you don't really want to handle it anyway, but
rather just emit a warning if it might be considered poor practice to rely
on something that is technically machine dependent anyway.



PR analyzer/94362 Partial Fix

2021-02-27 Thread brian.sobulefsky via Gcc-patches
Hi,

Please find a patch to fix part of the bug PR analyzer/94362. This bug is a
false positive for a null dereference found when compiling openssl. The cause
is the constraint_manager not knowing that i >= 0 within the for block:

for ( ; i-- > 0; )

The bug can be further reduced to the constraint manager not knowing that i >= 0
within the if block:

if (i-- > 0)

which is not replicated for other operators, such as prefix decrement. The
cause is that the constraint is applied to the initial_svalue of i, while it
is a binop_svalue of i that enters the block (with op PLUS and arg1 -1). The
constraint_manager does not have any constraints for this svalue and has no
handler. A handler has been added that essentially recurs on the remaining arg
if the other arg and other side of the condition are both constants and the op
is PLUS_EXPR or MINUS_EXPR.

This in essence fixed the problem, except an off by one error had been hiding
in range::eval_condition. This error is hard to notice, because, for example,
the code

if(idx > 0)
  __analyzer_eval(idx >= 1);

will compile as (check -fdump-ipa-analyzer to see)

void test (int idx)
{
  _Bool _1;
  int _2;

   :
  if (idx_4(D) > 0)
goto ; [INV]
  else
goto ; [INV]

   :
  _1 = idx_4(D) > 0;
  _2 = (int) _1;
  __analyzer_eval (_2);

   :
  return;

}

and will print "TRUE" to the screen, but as you can see, it is for the wrong
reason, because we are not really testing the condition we wanted to test.

You can force the failure (i.e. "UNKNOWN") for yourself with the following:

void test(int idx)
{
  int check = 1;
  if(idx > 0)
__analyzer_eval(idx >= check);
}

which the compiler will not "fix" on us. An examination of range::eval_condition
should convince you that there is an off by one error. Incidentally, I might
recommend doing away with "open intervals of integers" entirely.

When running the initial bug (the for loop), you will find that the analyzer
prints "UNKNOWN" twice for the postfix operator, and "TRUE" "UNKNOWN" for other
operators. This patch reduces postfix to the same state as the other operators.
The second "UNKNOWN" seems to be due to a second "iterated" pass through the
loop with a widening_svalue. A full fix of the bug will need a handler for the
widening svalue, and much more critically, a correct merge of the constraints
at the loop entrance. That, unfortunately, looks to be a hard problem.

This patch fixes a few xfails as noted in the commit message. These were tests
that were evidently devised to test whether the analyzer would understand
arithmetic being done on constrained values. Addition and subtraction is now
working as expected, a handler for multiplication and division can be added.

As was noted in those test files, consideration at some point should be given to
overflow.


Thank you,
Brian

Sent with ProtonMail Secure Email.commit d4052e8c273ca267f6dcf782084d60acfc50a609
Author: Brian Sobulefsky 
Date:   Sat Feb 27 00:36:40 2021 -0800

Changes to support eventual solution to bug PR analyzer/94362. This bug
originated with a false positive null dereference during compilation of
openssl. The bug is in effect caused by an error in constraint handling,
specifically that within the for block:

for ( ; i-- > 0; )
  {
  }

the constraint_manager should know i >= 0 but does not. A reduced form of
this bug was found where the constraint manager did not know within the if
block:

if (i-- > 0)
  {
  }

that i >= 0. This latter error was only present for the postfix
operators, and not for other forms, like --i > 0. It was due to the
constraint being set for the initial_svalue associated with i, but a
binop_svalue being what entered the if block for which no constraint
rules existed.

By adding handling logic for a binop_svalue that adds or
subtracts a constant, this problem was solved. This logic was added to
a new method, constraint_manager::maybe_fold_condition, with the
intent of eventually adding more cases there (unary_svalue and
widening_svalue for example). Additionally, an off by one error was
found in range::eval_condition that needed to be corrected to get
the expected result. Correction of this error was done in that
subroutine, resulting in no more calls to below_lower_bound and
above_upper_bound. As such, these functions were commented out and may
be removed if not needed for anything else.

This change does not entirely fix the initial bug pr94362, but it
reduces the postfix operator to the same state as other operators. In the
case of the for loop, there appears to be an "initial pass" through the
loop, which the analyzer will now understand for postfix, and then an
"iterated pass" with a widening_svalue that the analyzer does not
understand for any condition found. This seems to be due to the merging
of constraints and is under investigation.


Re: Patch for PR analyzer/98797

2021-02-18 Thread brian.sobulefsky via Gcc-patches
Hi David,

Please find my revised patch attached. You should find all items addressed. I
have bootstrapped and rerun the testsuite (gcc and g++ only).  Also, you
should have seen my legal paperwork go in.

1) The commit message has been updated to reflect the nomenclature in the
testsuite, as well as include the required parentheses.

2) The commit message has been updated to reflect the other changes you
requested below.

3) My macros and inline code for "bit_byte_conversion" have been replaced with
the subroutine you requested. It works by modulo and division rather than
bitwise-and and bitshift, as requested. Without any specific instructions, I
put the routine in analyzer.cc and the prototype in analyzer.h, as it provides
a generic helper service. You can move it anywhere else you might want it.

4) I added the check for the field to be sizeof (char). This is actually a
cause to reject getting a char from a string constant for any case, so I
added the check near the outside of the nested ifs. Also, as a side note,
it looks to me like your example:

void test_4 ()
{
  const char *s = "ABCD";
  __analyzer_eval (*(short *)s == 'A');
}

is unrelated to my patch, as I remember a char pointer to a string constant is
handled as a special case elsewhere and does not end up in
maybe_fold_sub_svalue.

5) I updated the modification to get_offset_region so that my new code is only
run in the case of a zerop. This was really the failure issue anyway. I still
have it building a new constant svalue of zero, because I do not know for sure
if the zero pointer type tree and zero integer type tree are different enough to
cause confusion down the line when getting the character.

6) My new subroutine get_parent_cast_region takes any svalues as an argument
and does its own checks for svalue_kind correctness, returning NULL otherwise.
My reason for this is so get_binding_recursive would remain absolutely clean,
the way its recursive call is. Basically, get_binding_recursive can just call
get_parent_cast_region as an assignment within a conditional, just like
the recursion step is, and the returned NULLs, whether by incorrect type or by
not finding a binding for the correct type, causes it to bypass returning there.

7) The if guards were updated as requested.

8) I used your get_field_at_bit_offset routine as requested. To stress again,
this was a static function to region.cc. I had to add a forward declaration
(I again used analyzer.h because it is a "generic helper" function, but that
is easy for you to change if you do not like it there) *and* I had to update
your definition to extern rather than static. Otherwise, it seems to work.

9) The compound conditionals should all be to your liking now.

10) After all the back and forth, I think I am using "approved" helper
functions and macros rather than accessing the tree fields directly. It seems
they all work correctly.

11) All preexisting deja-gnu tests are on the original lines they were on
before. All new tests are appended. Blank lines are left where the dg-bogus
calls were.


Thank you,
Briancommit 7f0e3685900b527b64b81c3b44af36fd9e99bea3
Author: Brian Sobulefsky 
Date:   Tue Feb 9 16:25:43 2021 -0800

Address the bug found in test file gcc/testsuite/gcc.dg/analyzer/casts-1.c, as
recorded by the XFAIL, and some simpler and more complex versions found during
the investigation of it. PR analyzer/98797 was opened for these bugs.

The simplest manifestation of this bug can be replicated with:

char arr[] = {'A', 'B', 'C', 'D'};
char *pt = ar;
__analyzer_eval(*(pt + 0) == 'A');
__analyzer_eval(*(pt + 1) == 'B');
//etc

The result of the eval call is "UNKNOWN" because the analyzer is unable to
determine the value at the pointer. Note that array access (such as arr[0]) is
correctly handled. The code responsible for this is in file
region-model-manager.cc, function region_model_manager::maybe_fold_sub_svalue.
The relevant section is commented /* Handle getting individual chars from
STRING_CST */. This section only had a case for an element_region. A case
needed to be added for an offset_region.

Additionally, when the offset was 0, such as in *pt or *(pt + 0), the function
region_model_manager::get_offset_region was failing to make the needed offset
region at all. This was due to the test for a constant 0 pointer that was then
returning get_cast_region. A special case is needed for when PARENT is of type
array_type whose type matches TYPE. In this case, get_offset_region is allowed
to continue to normal conclusion.

The original bug noted in gcc/testsuite/gcc.dg/analyzer/casts-1.c was for the
case:

struct s1
{
  char a;
  char b;
  char c;
  char d;
};

struct s2
{
  char arr[4];
};

struct s2 x = {{'A', 'B', 'C', 'D'}};
struct s1 *p = (struct s1 *)
__analyzer_eval (p->a == 'A');
//etc

Re: Patch for PR analyzer/98797

2021-02-17 Thread brian.sobulefsky via Gcc-patches
Hi David,

I'm sorry but I wanted to get this out to you. To clarify, I had a statement of 
the form:

end_offset <= covering_field->field_decl.bit_offset->int_cst.val[0]
covering_field->decl_common.size->int_cst.val[0] - 1;

(Sorry if my email is clobbering the angle brackets). I have replaced the first
expression with int_bit_position (covering_field), I am not sure where to 
properly
access the size of the field. FWIW, I found your region::get_byte_size, which
calls int_size_in_bytes, but this crashes gcc for a field tree, it wants a
type tree.

Additionally, is there some proper way to access a bit_offset_t other than
bit_off.get_val ()[0]? That is what I am using now, but I can swap it out.

Sorry for the newbie questions, but these things aren't really documented
in one place, at least that I am aware of.


Brian


Re: Patch for PR analyzer/98797

2021-02-16 Thread brian.sobulefsky via Gcc-patches
David,

Two items I needs answered to get you the patch.

1) The get_field_at_bit_offset function is static in region.cc, so I cannot use 
it outside of the file (you want me to use it in store.cc) without updating 
that definition to extern. I am assuming you want this.

2) I am updating my direct access of the tree fields. I am using 
int_bit_position for the position of a field in a struct as you requested. What 
do I use for the size of the struct. I had been using 
-decl_common.size-int_cst[0].


Brian


Sent with ProtonMail Secure Email.

‐‐‐ Original Message ‐‐‐
On Saturday, February 13, 2021 9:53 AM, brian.sobulefsky 
 wrote:

 Hi, answers below. Note, do you want my resubmitted patch to be with
 commit --amend, and therefore relative to "real" commits in the history,
 or do you want me to chain it onto the last submission?

 I have already sent to assign at gnu for the form.

 I will update the commit message to call it s3 now.

 The case did not arise "in practice." After solving the xfail from the
 test suite, I tried dividing the array to have more than one field and
 found that this still did not work, so I tracked down why. I would
 argue that the case of a struct made of only a single array as its one
 field is a lot less likely to arise in practice. For what its worth,
 the single field example basically works "for the wrong
 reason."
 When get_binding_recursive goes up the parent chain, it finds
 a binding
 in the parent cast region because that cast is a struct made of
 only one
 field, and so its binding_key is the same as that field. The
 "right"
 logic would check for a field covering the desired region.

 I'll fix the ChangeLog formatting parentheses. I found the formatting
 script, but I did not have one of the imported python files so I just 
copied others.

 I can add the helper routine you have requested. Where would you like it,
 I am thinking analyzer.h so it is visible basically everywhere?

 I can add the check for size matching, which helper routine is easiest.

 Yes, in the event of *p == 'A', *p, having a constant zerop offset
 would get sent to get_cast_region, where it would become a cast region.
 This defeats the new logic in maybe_fold_sub_svalue. I think in the case
 where we are pointing to an array of the requested type, it is much more
 correct to return the offset region with a zero offset, as arr[0] should 
not
 be different than arr[1]. Sometimes the 0 tree would be of pointer_type,
 and I was not sure if this would defeat it or not, so I made sure it was
 of integer_type. This may just be a matter of my being new and not
 knowing for sure how everything works and so erring on the side of safety.

 As of now, the routine immediately rejects any case other than where
 reg is a field_region and parent is a cast_region. I will think if there
 is a C like syntax for the function, really it is checking if the original
 form of parent had a field covering the requested field.

 I will remove the first guard, leave the second, and try to reformat
 the latters into a similar rejection style.

 Currently, get_field_at_bit_offset is not an externally visible function.
 I am taking your instruction to reuse it to mean a prototype should
 be added to the relevant header (it would be region.h I think, as long as
 both region-model-manager.cc and store.cc include those).

 As you know, I am very new to gcc and so was happy when I could hack my
 way to this working at all. I already assumed my direct access was not
 correct. Most of what I did is based on "RTFS" since the manual does not
 really cover the right way to do these things. I will try the routine
 or otherwise macro you say.

 I will change the testfile to leave pre exisiting lines in place.

 Sent with ProtonMail Secure Email.

 ‐‐‐ Original Message ‐‐‐
 On Friday, February 12, 2021 4:16 PM, David Malcolm dmalc...@redhat.com 
wrote:

  On Wed, 2021-02-10 at 19:42 +, brian.sobulefsky wrote:
  Hi Brian
  Thanks for the patch.
  The patch is large enough to count as legally significant; I've sent
  you information on copyright assignment separately; the patch can't be
  committed until that paperwork is in place.
  In the meantime, I've added some review notes inline below throughout:
 
   Address the bug found in test file 
gcc/testsuite/gcc.dg/analyzer/casts-1.c, as
   recorded by the XFAIL, and some simpler and more complex 
versions found during
   the investigation of it. PR analyzer/98797 was opened for 
these bugs.
  
   The simplest manifestation of this bug can be replicated 
with:
  
   char arr[] = {'A', 'B', 'C', 'D'};
   char *pt = ar;
   __analyzer_eval(*(pt + 0) == 'A');
   __analyzer_eval(*(pt + 1) == 'B');
   //etc
  
   The result of the eval call is "UNKNOWN" because the 
analyzer is unable to
   determine the value at the pointer. Note that array access 
(such as arr[0]) is
   correctly handled. The code responsible for this is in 

Re: Patch for PR analyzer/98797

2021-02-13 Thread brian.sobulefsky via Gcc-patches
Hi, answers below. Note, do you want my resubmitted patch to be with
commit --amend, and therefore relative to "real" commits in the history,
or do you want me to chain it onto the last submission?

I have already sent to assign at gnu for the form.

I will update the commit message to call it s3 now.

The case did not arise "in practice." After solving the xfail from the
test suite, I tried dividing the array to have more than one field and
found that this still did not work, so I tracked down why. I would
argue that the case of a struct made of only a single array as its one
field is a lot less likely to arise in practice. For what its worth,
the single field example basically works "for the wrong
reason."
When get_binding_recursive goes up the parent chain, it finds
a binding
in the parent cast region because that cast is a struct made of
only one
field, and so its binding_key is the same as that field. The
"right"
logic would check for a field covering the desired region.

I'll fix the ChangeLog formatting parentheses. I found the formatting
script, but I did not have one of the imported python files so I just copied 
others.

I can add the helper routine you have requested. Where would you like it,
I am thinking analyzer.h so it is visible basically everywhere?

I can add the check for size matching, which helper routine is easiest.

Yes, in the event of *p == 'A', *p, having a constant zerop offset
would get sent to get_cast_region, where it would become a cast region.
This defeats the new logic in maybe_fold_sub_svalue. I think in the case
where we are pointing to an array of the requested type, it is much more
correct to return the offset region with a zero offset, as arr[0] should not
be different than arr[1]. Sometimes the 0 tree would be of pointer_type,
and I was not sure if this would defeat it or not, so I made sure it was
of integer_type. This may just be a matter of my being new and not
knowing for sure how everything works and so erring on the side of safety.

As of now, the routine immediately rejects any case other than where
reg is a field_region and parent is a cast_region. I will think if there
is a C like syntax for the function, really it is checking if the original
form of parent had a field covering the requested field.

I will remove the first guard, leave the second, and try to reformat
the latters into a similar rejection style.

Currently, get_field_at_bit_offset is not an externally visible function.
I am taking your instruction to reuse it to mean a prototype should
be added to the relevant header (it would be region.h I think, as long as
both region-model-manager.cc and store.cc include those).

As you know, I am very new to gcc and so was happy when I could hack my
way to this working at all. I already assumed my direct access was not
correct. Most of what I did is based on "RTFS" since the manual does not
really cover the right way to do these things. I will try the routine
or otherwise macro you say.

I will change the testfile to leave pre exisiting lines in place.


Sent with ProtonMail Secure Email.

‐‐‐ Original Message ‐‐‐
On Friday, February 12, 2021 4:16 PM, David Malcolm  wrote:

> On Wed, 2021-02-10 at 19:42 +, brian.sobulefsky wrote:
>
> Hi Brian
>
> Thanks for the patch.
>
> The patch is large enough to count as legally significant; I've sent
> you information on copyright assignment separately; the patch can't be
> committed until that paperwork is in place.
>
> In the meantime, I've added some review notes inline below throughout:
>
> > Address the bug found in test file 
> > gcc/testsuite/gcc.dg/analyzer/casts-1.c, as
> > recorded by the XFAIL, and some simpler and more complex versions found 
> > during
> > the investigation of it. PR analyzer/98797 was opened for these bugs.
> >
> > The simplest manifestation of this bug can be replicated with:
> >
> > char arr[] = {'A', 'B', 'C', 'D'};
> > char *pt = ar;
> > __analyzer_eval(*(pt + 0) == 'A');
> > __analyzer_eval(*(pt + 1) == 'B');
> > //etc
> >
> > The result of the eval call is "UNKNOWN" because the analyzer is unable 
> > to
> > determine the value at the pointer. Note that array access (such as 
> > arr[0]) is
> > correctly handled. The code responsible for this is in file
> > region-model-manager.cc, function 
> > region_model_manager::maybe_fold_sub_svalue.
> > The relevant section is commented /* Handle getting individual chars 
> > from
> > STRING_CST */. This section only had a case for an element_region. A 
> > case
> > needed to be added for an offset_region.
> >
> > Additionally, when the offset was 0, such as in *pt or *(pt + 0), the 
> > function
> > region_model_manager::get_offset_region was failing to make the needed 
> > offset
> > region at all. This was due to the test for a constant 0 pointer that 
> > was then
> > returning get_cast_region. A special case is needed for when PARENT is 
> > of 

Re: Patch for PR analyzer/98797

2021-02-10 Thread brian.sobulefsky via Gcc-patches
I apologize. I left a dangling space in the patch. I am guessing
that this is against whitespace policy. I had actually corrected
it but forgot to add the file to the commit. Please see the
attached.commit a9b60f36b499463b49095f28ce8053f6387c506d
Author: Brian Sobulefsky 
Date:   Tue Feb 9 16:25:43 2021 -0800

Address the bug found in test file gcc/testsuite/gcc.dg/analyzer/casts-1.c, as
recorded by the XFAIL, and some simpler and more complex versions found during
the investigation of it. PR analyzer/98797 was opened for these bugs.

The simplest manifestation of this bug can be replicated with:

char arr[] = {'A', 'B', 'C', 'D'};
char *pt = ar;
__analyzer_eval(*(pt + 0) == 'A');
__analyzer_eval(*(pt + 1) == 'B');
//etc

The result of the eval call is "UNKNOWN" because the analyzer is unable to
determine the value at the pointer. Note that array access (such as arr[0]) is
correctly handled. The code responsible for this is in file
region-model-manager.cc, function region_model_manager::maybe_fold_sub_svalue.
The relevant section is commented /* Handle getting individual chars from
STRING_CST */. This section only had a case for an element_region. A case
needed to be added for an offset_region.

Additionally, when the offset was 0, such as in *pt or *(pt + 0), the function
region_model_manager::get_offset_region was failing to make the needed offset
region at all. This was due to the test for a constant 0 pointer that was then
returning get_cast_region. A special case is needed for when PARENT is of type
array_type whose type matches TYPE. In this case, get_offset_region is allowed
to continue to normal conclusion.

The original bug noted in gcc/testsuite/gcc.dg/analyzer/casts-1.c was for the
case:

struct s1
{
  char a;
  char b;
  char c;
  char d;
};

struct s2
{
  char arr[4];
};

struct s2 x = {{'A', 'B', 'C', 'D'}};
struct s1 *p = (struct s1 *)
__analyzer_eval (p->a == 'A');
//etc

This requires a case added to region_model_manager::maybe_fold_sub_svalue in
the individual characters from string constant section for a field region.

Finally, the prior only works for the case where struct s2 was a single field
struct. The more general case is:

struct s2
{
  char arr1[2];
  char arr2[2];
};

struct s2 x = {{'A', 'B'}, {'C', 'D'}};

Here the array will never be found in the get_any_binding routines. A new
routine is added to class binding_cluster that checks if the region being
searched is a field_region of a cast_region, and if so, tries to find a field
of the original region that contains the region under examination. This new
function is named binding_cluster::get_parent_cast_binding. It is called from
get_binding_recursive.

gcc/analyzer/ChangeLog:
PR analyzer/98797
* region-model-manager.cc region_model_manager::get_offset_region: Add
check for a PARENT array whose type matches TYPE, and have this skip
returning get_cast_region and rather conclude the function normally.
* region-model-manager.cc region_model_manager::maybe_fold_sub_svalue
Update the get character from string_cst section to include cases for
an offset_region and a field_region.
* store.cc binding_cluster::get_binding_recursive: Add case for call
to new function get_parent_cast_binding.
* store.cc binding_cluster::get_parent_cast_binding: New function.
* store.h class binding_cluster: Add declaration for new member
function get_parent_class_binding and macros for bit to byte
conversion.

gcc/testsuite/ChangeLog:
PR analyzer/98797
* gcc.dg/analyzer/casts-1.c: Update file to no longer expect failures
and add test cases for additional bugs solved.

diff --git a/gcc/analyzer/region-model-manager.cc b/gcc/analyzer/region-model-manager.cc
index dfd2413e914..1fd6b94f20a 100644
--- a/gcc/analyzer/region-model-manager.cc
+++ b/gcc/analyzer/region-model-manager.cc
@@ -602,16 +602,46 @@ region_model_manager::maybe_fold_sub_svalue (tree type,
   /* Handle getting individual chars from a STRING_CST.  */
   if (tree cst = parent_svalue->maybe_get_constant ())
 if (TREE_CODE (cst) == STRING_CST)
-  if (const element_region *element_reg
+  {
+	if (const element_region *element_reg
 	= subregion->dyn_cast_element_region ())
-	{
-	  const svalue *idx_sval = element_reg->get_index ();
-	  if (tree cst_idx = idx_sval->maybe_get_constant ())
-	if (const svalue *char_sval
-		= maybe_get_char_from_string_cst (cst, cst_idx))
-	  return get_or_create_cast (type, char_sval);
-	}
-
+	  {
+	const svalue *idx_sval = element_reg->get_index ();
+	if (tree 

Patch for PR analyzer/98797

2021-02-09 Thread brian.sobulefsky via Gcc-patches
The attached patch has been bootstrapped and regression tested. It addresses PR 
analyzer/98797, which is built off the expected failures in 
gcc/testsuite/gcc.dg/analyzer/casts-1.c It fixes those failures and additional 
manifestations. That testsuite file has been updated to no longer expect 
failures and to include additional tests addressing the other manifestations I 
found.

Some additional work can probably be done to handle further cases, but this is 
now substantially more robust than the current status.

Thanks.commit f3654c5a322f241c8cc551f88257a495689ed9d9
Author: Brian Sobulefsky 
Date:   Tue Feb 9 16:25:43 2021 -0800

Address the bug found in test file gcc/testsuite/gcc.dg/analyzer/casts-1.c, 
as
recorded by the XFAIL, and some simpler and more complex versions found 
during
the investigation of it. PR analyzer/98797 was opened for these bugs.

The simplest manifestation of this bug can be replicated with:

char arr[] = {'A', 'B', 'C', 'D'};
char *pt = ar;
__analyzer_eval(*(pt + 0) == 'A');
__analyzer_eval(*(pt + 1) == 'B');
//etc

The result of the eval call is "UNKNOWN" because the analyzer is unable to
determine the value at the pointer. Note that array access (such as arr[0]) 
is
correctly handled. The code responsible for this is in file
region-model-manager.cc, function 
region_model_manager::maybe_fold_sub_svalue.
The relevant section is commented /* Handle getting individual chars from
STRING_CST */. This section only had a case for an element_region. A case
needed to be added for an offset_region.

Additionally, when the offset was 0, such as in *pt or *(pt + 0), the 
function
region_model_manager::get_offset_region was failing to make the needed 
offset
region at all. This was due to the test for a constant 0 pointer that was 
then
returning get_cast_region. A special case is needed for when PARENT is of 
type
array_type whose type matches TYPE. In this case, get_offset_region is 
allowed
to continue to normal conclusion.

The original bug noted in gcc/testsuite/gcc.dg/analyzer/casts-1.c was for 
the
case:

struct s1
{
  char a;
  char b;
  char c;
  char d;
};

struct s2
{
  char arr[4];
};

struct s2 x = {{'A', 'B', 'C', 'D'}};
struct s1 *p = (struct s1 *)
__analyzer_eval (p->a == 'A');
//etc

This requires a case added to region_model_manager::maybe_fold_sub_svalue in
the individual characters from string constant section for a field region.

Finally, the prior only works for the case where struct s2 was a single 
field
struct. The more general case is:

struct s2
{
  char arr1[2];
  char arr2[2];
};

struct s2 x = {{'A', 'B'}, {'C', 'D'}};

Here the array will never be found in the get_any_binding routines. A new
routine is added to class binding_cluster that checks if the region being
searched is a field_region of a cast_region, and if so, tries to find a 
field
of the original region that contains the region under examination. This new
function is named binding_cluster::get_parent_cast_binding. It is called 
from
get_binding_recursive.

gcc/analyzer/ChangeLog:
PR analyzer/98797
* region-model-manager.cc region_model_manager::get_offset_region: 
Add
check for a PARENT array whose type matches TYPE, and have this skip
returning get_cast_region and rather conclude the function normally.
* region-model-manager.cc 
region_model_manager::maybe_fold_sub_svalue
Update the get character from string_cst section to include cases 
for
an offset_region and a field_region.
* store.cc binding_cluster::get_binding_recursive: Add case for call
to new function get_parent_cast_binding.
* store.cc binding_cluster::get_parent_cast_binding: New function.
* store.h class binding_cluster: Add declaration for new member
function get_parent_class_binding and macros for bit to byte
conversion.

gcc/testsuite/ChangeLog:
PR analyzer/98797
* gcc.dg/analyzer/casts-1.c: Update file to no longer expect 
failures
and add test cases for additional bugs solved.

diff --git a/gcc/analyzer/region-model-manager.cc 
b/gcc/analyzer/region-model-manager.cc
index dfd2413e914..9aee4c498c6 100644
--- a/gcc/analyzer/region-model-manager.cc
+++ b/gcc/analyzer/region-model-manager.cc
@@ -602,16 +602,46 @@ region_model_manager::maybe_fold_sub_svalue (tree type,
   /* Handle getting individual chars from a STRING_CST.  */
   if (tree cst = parent_svalue->maybe_get_constant ())
 if (TREE_CODE (cst) == STRING_CST)
-  if (const element_region *element_reg
+  {
+   if (const element_region *element_reg
=