On Wed, 2023-08-09 at 15:22 -0400, Eric Feng wrote:
> Thank you for your help in getting dg-require-python-h working! I can
> confirm that the FAILs are related to differences between the --
> cflags
> affecting the gimple seen by the analyzer. For this reason, I have
> changed it to --includes for now.
Sounds good.
Eventually we'll probably want to support --cflags, but given that
every distribution probably has its own set of flags, it's a recipe for
an unpleasantly large test matrix, so just using --includes is a good
compromise.
> To be sure, I tested on Python 3.8 as
> well and it works as expected. I have also addressed the following
> comments on the WIP patch as you described.
>
> -- Update Changelog entry to list new functions being simulated.
> -- Update region_model::get_or_create_region_for_heap_alloc leading
> comment.
> -- Change register_alloc to update_state_machine.
> -- Change move_ptr_sval_non_null to transition_ptr_sval_non_null.
> -- Static helper functions for:
> -- Initializing ob_refcnt field.
> -- Setting ob_type field.
> -- Getting ob_base field.
> -- Initializing heap allocated region for PyObjects.
> -- Incrementing a field by one.
> -- Change arg_is_long_p to arg_is_integral_p.
> -- Extract common failure scenario for reusability.
>
> The initial WIP patch using
>
> /* { dg-options "-fanalyzer -I/usr/include/python3.9" }. */
>
> have been bootstrapped and regtested on aarch64-unknown-linux-gnu.
> Since
> we did not change any core logic in the revision and the only changes
> within the analyzer core are changing variable names, is it OK for
> trunk. In the mean time, the revised patch is currently going through
> bootstrap and regtest process.
Thanks for the updated patch.
Unfortunately I just pushed a largish analyzer patch (r14-3114-
g73da34a538ddc2) which may well conflict with your patch, so please
rebase to beyond that.
Sorry about this.
In particular note that there's no longer a default assignment to the
LHS at a call-site in region_model::on_call_pre; known_function
subclasses are now responsible for assigning to the LHS of the
callsite. But I suspect that all the known_function subclasses in the
cpython plugin already do that.
Some nits inline below...
[...snip...]
> Some concessions were made to
> simplify the analysis process when comparing kf_PyList_Append with
> the
> real implementation. In particular, PyList_Append performs some
> optimization internally to try and avoid calls to realloc if
> possible. For simplicity, we assume that realloc is called every
> time.
> Also, we grow the size by just 1 (to ensure enough space for adding a
> new element) rather than abide by the heuristics that the actual
> implementation
> follows.
Might be worth capturing these notes as comments in the source (for
class kf_PyList_Append), rather than just within the commit message.
[...snip...]
>
> diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-
> model.cc
> index e92b3f7b074..c338f045d92 100644
> --- a/gcc/analyzer/region-model.cc
> +++ b/gcc/analyzer/region-model.cc
> @@ -5127,11 +5127,16 @@ region_model::check_dynamic_size_for_floats
> (const svalue *size_in_bytes,
> Use CTXT to complain about tainted sizes.
>
> Reuse an existing heap_allocated_region if it's not being
> referenced by
> - this region_model; otherwise create a new one. */
> + this region_model; otherwise create a new one.
> +
> + Optionally (update_state_machine) transitions the pointer
> pointing to the
> + heap_allocated_region from start to assumed non-null. */
>
> const region *
> region_model::get_or_create_region_for_heap_alloc (const svalue
> *size_in_bytes,
> -
> region_model_context *ctxt)
> + region_model_context *ctxt,
> + bool update_state_machine,
> + const call_details *cd)
> {
> /* Determine which regions are referenced in this region_model, so
> that
> we can reuse an existing heap_allocated_region if it's not in
> use on
> @@ -5153,6 +5158,17 @@
> region_model::get_or_create_region_for_heap_alloc (const svalue
> *size_in_bytes,
> if (size_in_bytes)
> if (compat_types_p (size_in_bytes->get_type (), size_type_node))
> set_dynamic_extents (reg, size_in_bytes, ctxt);
> +
> + if (update_state_machine && cd)
> + {
> + const svalue *ptr_sval = nullptr;
> + if (cd->get_lhs_type ())
> + ptr_sval = m_mgr->get_ptr_svalue (cd->get_lhs_type (), reg);
> + else
> + ptr_sval = m_mgr->get_ptr_svalue (NULL_TREE, reg);
> + transition_ptr_sval_non_null (ctxt,
> ptr_sval);
This if/else is redundant: the "else" is only reached if cd-
>get_lhs_type () is null, in which case you pass in NULL_TREE, so it
works the same either way. Or am I missing something?
Also, it looks like something weird's happening with