[PATCH] D155204: [clang][dataflow] Add `refreshStructValue()`.

2023-07-17 Thread Gábor Horváth via Phabricator via cfe-commits
xazax.hun added inline comments.



Comment at: 
clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h:706-711
+/// This function is primarily intended for use by checks that set custom
+/// properties on `StructValue`s to model the state of these values. Such 
checks
+/// should avoid modifying the properties of an existing `StructValue` because
+/// these changes would be visible to other `Environment`s that share the same
+/// `StructValue`. Instead, call `refreshStructValue()`, then set the 
properties
+/// on the new `StructValue` that it returns. Typical usage:

mboehme wrote:
> xazax.hun wrote:
> > I think this is fine for now, but I wonder if we should come up with an API 
> > where errors like this cannot happen. One potential way would be to no 
> > longer include these properties in the `StructValue`s, but have a separate 
> > mapping `StructValue => Properties`. So, one can update the mapping in an 
> > environment without any unintended consequences in other environments. 
> > I think this is fine for now, but I wonder if we should come up with an API 
> > where errors like this cannot happen.
> 
> Thanks for bringing this up.
> 
> I agree that this needs more improvement in the future. The underlying issue 
> is that `StructValue` isn't really a very useful concept.
> 
> `Value` works great for scalar values, where we do actually treat it as the 
> immutable value that it's intended to be. Attaching properties to values 
> makes a lot of sense in this context: If we're modelling a property of an 
> immutable value, then that property is presumably itself immutable.
> 
> However, the `Value` concept has never worked well for structs / records 
> because, when we mutate fields, we don't update the `StructValue`. We could 
> try to change this, but I don't think this would be worth it because 
> `StructValue` isn't a useful concept in C++ anyway. As the value categories 
> RFC discusses, prvalues of class type are a very niche concept in C++. The 
> only thing you can do with them is to initialize a result object -- you can't 
> otherwise perform any operations on them (i.e. access member variables or 
> call member functions). This has been part of the motivation for reducing 
> their importance as part of the value categories work.
> 
> I think we should continue down that path and, ultimately, maybe eliminate 
> `StructValue` entirely. So while introducing a mapping `StructValue => 
> Properties` would be a good option, I think an even better one would be to 
> start attaching properties to `AggregateStorageLocation`s instead of 
> `StructValue`s. Because `AggregateStorageLocation`s are in essence, mutable, 
> the mapping of `AggregateStorageLocation`s to property values would need to 
> be reflected in the `Environment` in some form. I think a natural way to do 
> this would be to model properties in a similar way to fields: An 
> `AggregateStorageLocation` would have a mapping `PropertyName => 
> StorageLocation`, and the values of the properties would then be tracked 
> through the existing `StorageLocation => Value` mapping in the environment.
> 
> Benefits:
> 
> - As already noted, this makes properties on `AggregateStorageLocation`s very 
> similar to fields, which is what they are typically used to model
> - If a particular analysis needs a storage location for a property, we 
> already have one. For example, UncheckedOptionalAccessModel.cpp currently 
> models the "value" property as a `PointerValue` so that it has a 
> `StorageLocation` that it can use as the return value of `optional::value()`. 
> Under the approach I'm proposing, the storage location would be available 
> from the framework, and the model for a specific analysis wouldn't have to do 
> anything extra.
> - We don't need any additional logic for joins / widening; the existing join 
> / widening logic for the `StorageLocation => Value` mapping would also handle 
> properties.
> 
> So this is where I think we can go in the future, and I agree that 
> `refreshStructValue()` should only be a stopgap as we evolve the framework.
This sounds great, thanks!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155204/new/

https://reviews.llvm.org/D155204

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155204: [clang][dataflow] Add `refreshStructValue()`.

2023-07-17 Thread Martin Böhme via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rGd17f455a6348: [clang][dataflow] Add `refreshStructValue()`. 
(authored by mboehme).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155204/new/

https://reviews.llvm.org/D155204

Files:
  clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
  clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
  clang/unittests/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp

Index: clang/unittests/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp
===
--- clang/unittests/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp
+++ clang/unittests/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp
@@ -370,13 +370,6 @@
   // FIXME: Called functions at point `p` should contain only "foo".
 }
 
-StructValue &createNewStructValue(AggregateStorageLocation &Loc,
-  Environment &Env) {
-  auto &Val = *cast(Env.createValue(Loc.getType()));
-  Env.setValue(Loc, Val);
-  return Val;
-}
-
 // Models an analysis that uses flow conditions.
 class SpecialBoolAnalysis final
 : public DataflowAnalysis {
@@ -407,7 +400,7 @@
   auto &ObjectLoc =
   *cast(getImplicitObjectLocation(*E, Env));
 
-  createNewStructValue(ObjectLoc, Env)
+  refreshStructValue(ObjectLoc, Env)
   .setProperty("is_set", Env.getBoolLiteralValue(true));
 }
   }
@@ -562,10 +555,7 @@
   auto *Object = E->getArg(0);
   assert(Object != nullptr);
 
-  auto &ObjectLoc = *cast(
-  Env.getStorageLocation(*Object, SkipPast::Reference));
-
-  createNewStructValue(ObjectLoc, Env)
+  refreshStructValue(*Object, Env)
   .setProperty("has_value", Env.getBoolLiteralValue(true));
 }
   }
Index: clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
===
--- clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
+++ clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
@@ -987,5 +987,30 @@
   return Fields;
 }
 
+StructValue &refreshStructValue(AggregateStorageLocation &Loc,
+Environment &Env) {
+  auto &NewVal = *cast(Env.createValue(Loc.getType()));
+  Env.setValue(Loc, NewVal);
+  return NewVal;
+}
+
+StructValue &refreshStructValue(const Expr &Expr, Environment &Env) {
+  assert(Expr.getType()->isRecordType());
+
+  auto &NewVal = *cast(Env.createValue(Expr.getType()));
+
+  if (Expr.isPRValue()) {
+Env.setValueStrict(Expr, NewVal);
+  } else {
+StorageLocation *Loc = Env.getStorageLocationStrict(Expr);
+if (Loc == nullptr) {
+  Loc = &Env.createStorageLocation(Expr);
+}
+Env.setValue(*Loc, NewVal);
+  }
+
+  return NewVal;
+}
+
 } // namespace dataflow
 } // namespace clang
Index: clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
===
--- clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
+++ clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
@@ -389,6 +389,12 @@
   /// necessary storage locations and values for indirections until it finds a
   /// non-pointer/non-reference type.
   ///
+  /// If `Type` is one of the following types, this function will always return
+  /// a non-null pointer:
+  /// - `bool`
+  /// - Any integer type
+  /// - Any class, struct, or union type
+  ///
   /// Requirements:
   ///
   ///  `Type` must not be null.
@@ -692,6 +698,24 @@
 /// order in which they appear in `InitListExpr::inits()`.
 std::vector getFieldsForInitListExpr(const RecordDecl *RD);
 
+/// Associates a new `StructValue` with `Loc` and returns the new value.
+/// It is not defined whether the field values remain the same or not.
+///
+/// This function is primarily intended for use by checks that set custom
+/// properties on `StructValue`s to model the state of these values. Such checks
+/// should avoid modifying the properties of an existing `StructValue` because
+/// these changes would be visible to other `Environment`s that share the same
+/// `StructValue`. Instead, call `refreshStructValue()`, then set the properties
+/// on the new `StructValue` that it returns. Typical usage:
+///
+///   refreshStructValue(Loc, Env).setProperty("my_prop", MyPropValue);
+StructValue &refreshStructValue(AggregateStorageLocation &Loc,
+Environment &Env);
+
+/// Associates a new `StructValue` with `Expr` and returns the new value.
+/// See also documentation for the overload above.
+StructValue &refreshStructValue(const Expr &Expr, Environment &Env);
+
 } // namespace dataflow
 } // namespace clang
 
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mai

[PATCH] D155204: [clang][dataflow] Add `refreshStructValue()`.

2023-07-17 Thread Martin Böhme via Phabricator via cfe-commits
mboehme added inline comments.



Comment at: 
clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h:706-711
+/// This function is primarily intended for use by checks that set custom
+/// properties on `StructValue`s to model the state of these values. Such 
checks
+/// should avoid modifying the properties of an existing `StructValue` because
+/// these changes would be visible to other `Environment`s that share the same
+/// `StructValue`. Instead, call `refreshStructValue()`, then set the 
properties
+/// on the new `StructValue` that it returns. Typical usage:

xazax.hun wrote:
> I think this is fine for now, but I wonder if we should come up with an API 
> where errors like this cannot happen. One potential way would be to no longer 
> include these properties in the `StructValue`s, but have a separate mapping 
> `StructValue => Properties`. So, one can update the mapping in an environment 
> without any unintended consequences in other environments. 
> I think this is fine for now, but I wonder if we should come up with an API 
> where errors like this cannot happen.

Thanks for bringing this up.

I agree that this needs more improvement in the future. The underlying issue is 
that `StructValue` isn't really a very useful concept.

`Value` works great for scalar values, where we do actually treat it as the 
immutable value that it's intended to be. Attaching properties to values makes 
a lot of sense in this context: If we're modelling a property of an immutable 
value, then that property is presumably itself immutable.

However, the `Value` concept has never worked well for structs / records 
because, when we mutate fields, we don't update the `StructValue`. We could try 
to change this, but I don't think this would be worth it because `StructValue` 
isn't a useful concept in C++ anyway. As the value categories RFC discusses, 
prvalues of class type are a very niche concept in C++. The only thing you can 
do with them is to initialize a result object -- you can't otherwise perform 
any operations on them (i.e. access member variables or call member functions). 
This has been part of the motivation for reducing their importance as part of 
the value categories work.

I think we should continue down that path and, ultimately, maybe eliminate 
`StructValue` entirely. So while introducing a mapping `StructValue => 
Properties` would be a good option, I think an even better one would be to 
start attaching properties to `AggregateStorageLocation`s instead of 
`StructValue`s. Because `AggregateStorageLocation`s are in essence, mutable, 
the mapping of `AggregateStorageLocation`s to property values would need to be 
reflected in the `Environment` in some form. I think a natural way to do this 
would be to model properties in a similar way to fields: An 
`AggregateStorageLocation` would have a mapping `PropertyName => 
StorageLocation`, and the values of the properties would then be tracked 
through the existing `StorageLocation => Value` mapping in the environment.

Benefits:

- As already noted, this makes properties on `AggregateStorageLocation`s very 
similar to fields, which is what they are typically used to model
- If a particular analysis needs a storage location for a property, we already 
have one. For example, UncheckedOptionalAccessModel.cpp currently models the 
"value" property as a `PointerValue` so that it has a `StorageLocation` that it 
can use as the return value of `optional::value()`. Under the approach I'm 
proposing, the storage location would be available from the framework, and the 
model for a specific analysis wouldn't have to do anything extra.
- We don't need any additional logic for joins / widening; the existing join / 
widening logic for the `StorageLocation => Value` mapping would also handle 
properties.

So this is where I think we can go in the future, and I agree that 
`refreshStructValue()` should only be a stopgap as we evolve the framework.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155204/new/

https://reviews.llvm.org/D155204

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155204: [clang][dataflow] Add `refreshStructValue()`.

2023-07-15 Thread Gábor Horváth via Phabricator via cfe-commits
xazax.hun accepted this revision.
xazax.hun added inline comments.



Comment at: 
clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h:706-711
+/// This function is primarily intended for use by checks that set custom
+/// properties on `StructValue`s to model the state of these values. Such 
checks
+/// should avoid modifying the properties of an existing `StructValue` because
+/// these changes would be visible to other `Environment`s that share the same
+/// `StructValue`. Instead, call `refreshStructValue()`, then set the 
properties
+/// on the new `StructValue` that it returns. Typical usage:

I think this is fine for now, but I wonder if we should come up with an API 
where errors like this cannot happen. One potential way would be to no longer 
include these properties in the `StructValue`s, but have a separate mapping 
`StructValue => Properties`. So, one can update the mapping in an environment 
without any unintended consequences in other environments. 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155204/new/

https://reviews.llvm.org/D155204

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155204: [clang][dataflow] Add `refreshStructValue()`.

2023-07-13 Thread Martin Böhme via Phabricator via cfe-commits
mboehme created this revision.
Herald added subscribers: martong, xazax.hun.
Herald added a reviewer: NoQ.
Herald added a project: All.
mboehme requested review of this revision.
Herald added subscribers: cfe-commits, wangpc.
Herald added a project: clang.

Besides being a useful abstraction, this function will help insulate existing
clients of the framework from upcoming changes to the API of `StructValue`
and `AggregateStorageLocation`.

Depends On D155202 


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D155204

Files:
  clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
  clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
  clang/unittests/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp

Index: clang/unittests/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp
===
--- clang/unittests/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp
+++ clang/unittests/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp
@@ -370,13 +370,6 @@
   // FIXME: Called functions at point `p` should contain only "foo".
 }
 
-StructValue &createNewStructValue(AggregateStorageLocation &Loc,
-  Environment &Env) {
-  auto &Val = *cast(Env.createValue(Loc.getType()));
-  Env.setValue(Loc, Val);
-  return Val;
-}
-
 // Models an analysis that uses flow conditions.
 class SpecialBoolAnalysis final
 : public DataflowAnalysis {
@@ -407,7 +400,7 @@
   auto &ObjectLoc =
   *cast(getImplicitObjectLocation(*E, Env));
 
-  createNewStructValue(ObjectLoc, Env)
+  refreshStructValue(ObjectLoc, Env)
   .setProperty("is_set", Env.getBoolLiteralValue(true));
 }
   }
@@ -561,10 +554,7 @@
   auto *Object = E->getArg(0);
   assert(Object != nullptr);
 
-  auto &ObjectLoc = *cast(
-  Env.getStorageLocation(*Object, SkipPast::Reference));
-
-  createNewStructValue(ObjectLoc, Env)
+  refreshStructValue(*Object, Env)
   .setProperty("has_value", Env.getBoolLiteralValue(true));
 }
   }
Index: clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
===
--- clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
+++ clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
@@ -993,5 +993,30 @@
   return Fields;
 }
 
+StructValue &refreshStructValue(AggregateStorageLocation &Loc,
+Environment &Env) {
+  auto &NewVal = *cast(Env.createValue(Loc.getType()));
+  Env.setValue(Loc, NewVal);
+  return NewVal;
+}
+
+StructValue &refreshStructValue(const Expr &Expr, Environment &Env) {
+  assert(Expr.getType()->isRecordType());
+
+  auto &NewVal = *cast(Env.createValue(Expr.getType()));
+
+  if (Expr.isPRValue()) {
+Env.setValueStrict(Expr, NewVal);
+  } else {
+StorageLocation *Loc = Env.getStorageLocationStrict(Expr);
+if (Loc == nullptr) {
+  Loc = &Env.createStorageLocation(Expr);
+}
+Env.setValue(*Loc, NewVal);
+  }
+
+  return NewVal;
+}
+
 } // namespace dataflow
 } // namespace clang
Index: clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
===
--- clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
+++ clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
@@ -389,6 +389,12 @@
   /// necessary storage locations and values for indirections until it finds a
   /// non-pointer/non-reference type.
   ///
+  /// If `Type` is one of the following types, this function will always return
+  /// a non-null pointer:
+  /// - `bool`
+  /// - Any integer type
+  /// - Any class, struct, or union type
+  ///
   /// Requirements:
   ///
   ///  `Type` must not be null.
@@ -694,6 +700,24 @@
 /// order in which they appear in `InitListExpr::inits()`.
 std::vector getFieldsForInitListExpr(const RecordDecl *RD);
 
+/// Associates a new `StructValue` with `Loc` and returns the new value.
+/// It is not defined whether the field values remain the same or not.
+///
+/// This function is primarily intended for use by checks that set custom
+/// properties on `StructValue`s to model the state of these values. Such checks
+/// should avoid modifying the properties of an existing `StructValue` because
+/// these changes would be visible to other `Environment`s that share the same
+/// `StructValue`. Instead, call `refreshStructValue()`, then set the properties
+/// on the new `StructValue` that it returns. Typical usage:
+///
+///   refreshStructValue(Loc, Env).setProperty("my_prop", MyPropValue);
+StructValue &refreshStructValue(AggregateStorageLocation &Loc,
+Environment &Env);
+
+/// Associates a new `StructValue` with `Expr` and returns the new value.
+/// See also documentation for the overload above.
+StructValue &refreshStructValue(const Expr &Expr, Envir