arm: justify or initialize conditionally uninitialized variables

Julien Grall Thu, 20 Jul 2023 08:27:50 -0700

Hi Nicola,

On 20/07/2023 11:14, Nicola Vetrini wrote:

On 17/07/23 15:40, Julien Grall wrote:
Hi Nicola,

On 17/07/2023 13:08, Nicola Vetrini wrote:
On 14/07/23 15:00, Julien Grall wrote:
Hi Nicola,

On 14/07/2023 12:49, Nicola Vetrini wrote:
This patch aims to fix some occurrences of possibly uninitialized
variables, that may be read before being written. This behaviour would
violate MISRA C:2012 Rule 9.1, besides being generally undesirable.

In all the analyzed cases, such accesses were actually safe, but it's
quite difficult to prove so by automatic checking, therefore a safer
route is to change the code so as to avoid the behaviour fromoccurring,
while preserving the semantics.

To achieve this goal, I adopted the following strategies:
Please let's at least one patch per strategy. I would also considersome of the rework separate so they can go in regardless thedecision for the SAF-*.
- Add a suitably formatted local deviation comment
   (as indicated in 'docs/misra/documenting-violations.rst')
   to exempt the following line from checking.

- Provide an initialization for the variable at the declaration.
- Substitute a goto breaking out of control flow logic with asemantically
   equivalent do { .. } while(0).
As I already mentioned in private, it is unclear to me how youdecided which strategy to use. I still think we need to define ourpolicy before changing the code. Otherwise, it is going to bedifficult to decide for new code.
The main point of this RFC is doing so. From what I gathered, it'snot an easy task: sometimes there are no 'safe' values to initializevariables to and sometimes there is no easy way to prove that indeedsomething is always initialized or not accessed at all.
But you wrote the code. So you should be able to explain how you tookthe decision between one and the others.
Also, even if this is an RFC, it would have been good to summarize anydiscussion that happened in private and if there were concern try tocome up with ideas or at least listing the concerns after '---.
I'll keep this if the need arises in the future.
Signed-off-by: Nicola Vetrini <nicola.vetr...@bugseng.com>
---
  docs/misra/safe.json                   |  8 +++++++
  xen/arch/arm/arm64/lib/find_next_bit.c |  1 +
  xen/arch/arm/bootfdt.c                 |  6 +++++
  xen/arch/arm/decode.c                  |  2 ++
  xen/arch/arm/domain_build.c            | 29 ++++++++++++++++++----
  xen/arch/arm/efi/efi-boot.h            |  6 +++--
  xen/arch/arm/gic-v3-its.c              |  9 ++++---
  xen/arch/arm/mm.c                      |  1 +
xen/arch/arm/p2m.c | 33+++++++++++++++-----------
  9 files changed, 69 insertions(+), 26 deletions(-)

diff --git a/docs/misra/safe.json b/docs/misra/safe.json
index e3c8a1d8eb..244001f5be 100644
--- a/docs/misra/safe.json
+++ b/docs/misra/safe.json
@@ -12,6 +12,14 @@
          },
          {
              "id": "SAF-1-safe",
+            "analyser": {
+                "eclair": "MC3R1.R9.1"
+            },
+            "name": "Rule 9.1: initializer not needed",
+ "text": "The following local variables are possiblysubject to being read before being written, but code inspectionensured that the control flow in the construct where they appearensures that no such event may happen."
I am bit concerned which such statement because the code instancewas today with the current code. This could change in the future andinvalide the reasoning.
It is not clear to me if we have any mechanism to prevent that. Ifwe don't, then I think we need to drastically reduce the number oftime this is used (there are a bit too much for my taste).
Indeed, the purpose of such a deviation is that the soundoverapproximation computed by the tool requires a human to look atthe code and think twice before modifying it (i.e., if ever that codeis touched, the reviewer ought to assess whether that justificationstill holds or some other thing should be done about it.
Your assumption is the reviewer will notice there is an existingdevitation and be able to assess it has changed. I view thisassumption as risky in the long term.
Have you investigate to improve the automatic tooling?
Well, as discussed elsewhere in the thread, a slightly modified versionof this deviation comment can list the specific reason why such a thingwas deviated directly at the declaration or where the caution is, if youthink this is better.
Example:

// <- SAF-x here
int var;

[...]

// <- or HERE
f(&var);
An alternative approach to justification, partly discussed with Stefanoin private is a macro that looks like an attribute to signal that thevariable is intentionally uninitialized. This does not have the benefitof a written justification with a proper comment or an entry in the jsonfile, but is less intrusive and the justification for all occurrences of__uninit w.r.t R9.1 would be included in the static analysis toolconfiguration, which would be part of the MISRA compliancedocumentation. This does imply a coarse justification like the oneabove, but if further clarification is needed it can be provided locallyin the code, as guidance for contributors.
Example:
#define __uninit

__uninit int x;

IHMO none of the example really help. You are still expecting thereviewer/developper to drop __uninit or any comment if the behavior haschanged.

I don't have a good idea how we can mitigate it other than reworking thecode in a way that makes both ECLAIR and the maintainers happy. Maybethe others will.


[...]

      int ret;
@@ -249,8 +252,10 @@ static void __initprocess_multiboot_node(const void *fdt, int node,
      const __be32 *cell;
      bootmodule_kind kind;
      paddr_t start, size;
+    /* SAF-1-safe MC3R1.R9.1 */
      int len;
/* sizeof("/chosen/") + DT_MAX_NAME + '/' + DT_MAX_NAME +'/0' => 92 */
+    /* SAF-1-safe MC3R1.R9.1*/
      char path[92];
So the two above, is one category of issue. The variables are passedas argument of function which will fill them.
Can Eclair look at the callers, if so, can we consider to alwaysinitialize the values in the callee?
This would reduce the number of SAF-*. There are a few otherexamples like that below. So I will skip them for now.
[...]
If the value is always initialized in the callee, then there's noproblem configuring ECLAIR so that it knows that this parameter isalways written, and therefore any subsequent use in the caller is ok.
Another possibility is stating that a function never reads thepointee before writing to it (it may or may not write it, but if itdoesn't, then the pointee is not read either). The 'strncmp' after'fdt_get_path' does get in the way, though, because this property isnot strong enough to ensure that we can use 'path' after returningfrom the function.
I am not sure I fully understand what you wrote. Can you provide a Cexample?
void f(int *x) {
   if(x) {
     *x = 10;
     int y =*x; // read the pointee after it's initialized
   } else {
     int z; // in this branch the pointee is not read nor written
   }
   // we can say that f never reads *x before (possibly) writing to it.
}

I am having trouble to understand it in the context of fdt_get_path().Is 'f' meant to be fdt_get_path()?


[...]

      rc = evtchn_alloc_unbound(&alloc, 0);
      if ( rc )
      {
@@ -3810,8 +3827,9 @@ static int __init construct_domU(structdomain *d,
                                   const struct dt_device_node *node)
  {
      struct kernel_info kinfo = {};
-    const char *dom0less_enhanced;
+    const char *dom0less_enhanced = NULL;
If you look at the user below, all the callers assumedom0less_enhanced will be non-NULL. So it is unclear to me how thisvalue is safer.
 > Looking at the code, I wonder whether we should convert
dt_property_read_string() to use ERR_PTR(). So we could remove thelast argument and return it instead.
Is relying on that assumption somehow safer?
I am assuming you are referring to "If you look at the user below, allthe callers assume dom0less_enhanced will be non-NULL". Note that Ididn't suggest it is safer. I am only pointed out that you didn'tspecify how this was better in the context of the code.
This should be probably discussed after deciding on the refactoring'dt_property_read_string'

FAOD, I think we should refactor dt_property_read_string(). I am happyto write a patch if you want.


[...]

The analysis here could use some more precision, but the modifiedconstruct is entirely equivalent.
I agree that they are equivalent. But in general, we don't change thestyle of the construct without explaining why.
In this case, the first step would be to improve Eclair.
The changes needed for this kind of analysis are not trivial: we'velooked into this, but there's no easy way to support this in a timelymanner. I understand that this is an estabilished pattern, but whatwould you think of an initializer using designators?
uint64_t cmd[4] = {
         .[0] = GITS_CMD_MAPC;
         .[1] = 0x00;
         .[2] = encode_rdbase(its, cpu, collection_id) | GITS_VALID_BIT;
         .[3] = 0x00;
}

The reability is Ok here. But this may not be the case here. Inparticular...

      cmd[3] = 0x00;
      return its_send_command(its, cmd);

@@ -215,9 +214,7 @@ static int its_send_cmd_mapd(struct host_its*its, uint32_t deviceid,

      }
      cmd[0] = GITS_CMD_MAPD | ((uint64_t)deviceid << 32);
      cmd[1] = size_bits;
-    cmd[2] = itt_addr;
-    if ( valid )
-        cmd[2] |= GITS_VALID_BIT;
+    cmd[2] = itt_addr | (valid ? GITS_VALID_BIT : 0x00);


Same here.


here... I much prefer the existing version.

[...]

+      if ( nr == MAX_VMID )
+      {
+          rc = -EBUSY;
+ printk(XENLOG_ERR "p2m.c: dom%d: VMID pool exhausted\n",d->domain_id);
+          break;
+      }
-    set_bit(nr, vmid_mask);
+      set_bit(nr, vmid_mask);
-    p2m->vmid = nr;
+      p2m->vmid = nr;
-    rc = 0;
+            rc = 0;
+        } while ( 0 );
-out:
      spin_unlock(&vmid_alloc_lock);
      return rc;
  }
Considering all of the replies above, a first draft of astrategy/policy I can think of is having:
- Initializer functions that always write their parameter, so thatthe strongest "pointee always written" property can be stated. Thiscauses all further uses to be marked safe.
- Initialize the variable when there exists a known safe value thatdoes not alter the semantics of the function. The initialization doesnot need to be at the declaration, but doing so simplifies the code.
As I mentionned in private there are two risks with that:
  1. You silence compiler to spot other issues
2. You may now get warning from Coverity if it spots you set a valuethat get overwritten before its first use.
So I think such approach should be used with parcimony. Instead, weshould look at reworking the code when possible.
Do you think it would help if you look directly at actual cautions tospot possible functions that can be refactored?

I have already looked at some. Can we focus on them and see how much ithelps?


Cheers,

--
Julien Grall

Re: [RFC PATCH 1/4] xen/arm: justify or initialize conditionally uninitialized variables

Reply via email to