On Fri, May 31, 2024 at 4:06 PM Marta Rybczynska <[email protected]>
wrote:

> Hello,
> We have been working on the standalone version of the cve-check that can
> run over an SBOM (SPDX+VEX-like information) since April. This is the
> moment for the RFC. This way we can verify use cases, discuss what is
> needed to add and do testing early! (and the curious people can start
> running it too)
>
> The tooling consists of a patchset over OE-core:
> https://lists.openembedded.org/g/openembedded-core/message/200074
> https://lists.openembedded.org/g/openembedded-core/message/200075
> and a modification needed for the CVE backend (the Kernel CNA uses "linux"
> and not "linux_kernel")
> https://lists.openembedded.org/g/openembedded-core/message/200076
>
> The standalone tool RFC is here:
> https://gitlab.com/syslinbit/public/yocto-vex-check This repo will
> migrate to the YP infra, likely as multiple repositories (we also have
> autobuilder changes, and will likely split binary data to a separate
> repository).
>
> The yocto-vex-check supports two databases today: NVD and CVE.
>
> How to use the tool (from the README)
>
> This tool checks SBOMs for vulnerabilities. You can take the
> SBOM/VEX of a Yocto Project build and generate a fresh vulnerability
> status. It uses the SPDX archive as the list of packages to check, with the
> additional information from the VEX class.
>
> WARNING: this tool is under active development, file formats and options
> do change regularly.
> Functionalities:
>  - reads YP SPDX 2.2 files and YP cve-check JSON files
>  - converts SPDX/cve-check JSON to VEX (close to OpenVEX)
>  - performs a check for vulnerabilities taking the VEX and using the
> specified database
>  - uses output of the "vex" class of OpenEmbedded-core
>
>  How to use:
>  1. Download the database to use:
>  - for NVD, use cve-update-nvd2-native.py
>  - for the CVE database, get the CVEv5 git repository: either the upstream
> one at https://github.com/CVEProject/cvelistV5 or the one with OE-related
> fixes at https://github.com/mrybczyn/cvelistV5-overrides (recommended)
>
> 2. Enable the vex class in your YP-based distribution build by adding to
> your local.conf:
> INHERIT += "vex"
>
> 3. Build your image as usual and note the location of the CVE JSON file
> (in the output of the build)
>
> 4. (Recommended for production builds) archive the CVE JSON file and the
> SPDX archive
>
>  5. Use the tool as described below at any moment to generate a fresh
> vulnerability status
>
>  6. (Optional) You can convert the output to text or CSV format using
> scripts/cve-report.py
>
> Description of tools:
>
> The high-level wrapper performs the whole process for a given database. The
> usage is as follows:
>
>  ./wrap-yocto-vex-check.py -i <cve-check-JSON-file> -o <output_dir> \
>      -t <temporary_dir> -b <build directory like tmp-glibc> \
>      -db <database path> -db-type <database type>
>
> where:
> -i <cve-check-JSON-file> is the CVE JSON file as given in the output of
> the build
> -o <output_dir> is the output directory for the updated CVE file
> -t <temporary_dir> is a temporary directory used during the scan
> -b <build directory like tmp-glibc> is the build directory that contains
> the SPDX archive
> -db <database path> is the path to the database file (NVD) or downloaded
> git repository (CVE)
> -db-type <database type> is the database type: NVD or CVE
>
>  Lower-level tools:
>
> - cve-update-nvd2-native.py - downloads the NVD database
> - yocto-vex-check.py - step-by-step conversions; imports the SPDX archive,
> converts SPDX+CVE JSON to VEX, runs the vulnerability check on the
> given database
>
>  Limitations:
> - today the SPDX generation in YP works for images and SDK, not for the
> world build
> - the MITRE (CVE) database contains a number of difficult-to-parse
> entries, especially before 2024.
>
> We know there are some bugs (one related to linux_kernel); a fix is
> in progress.
>
> Questions/comments/use-cases welcome!
>
> Our questions for further steps:
> 1. We started with the assumption of SPDX as the package list to scan and
> the CVE JSON data as the supplementary data needed for the scan. However,
> the SPDX archive isn't generated for world builds. We can 1) add the only
> remaining information to the VEX (CVE-JSON), which is the CPE list, and
> allow using the VEX file as the source and 2) add the remaining archives
> to SPDX.
>
> In the end we will likely want both, also because SPDX 2.2 requires the
> complete build, and that takes a long time for world.
>
> 2. The tool currently supports either one backend (CVE, NVD), or the
> other. Results are different. We can merge results by post-processing or
> add a list of databases to use. In both cases, it is difficult to
> decide on the priority of conflicting scan results.
>
> 3. We'd like to have unit tests using binary data (static NVD database,
> specially chosen CVE entries etc). It makes sense to keep them in a
> separate repository because of the size.
>
> 4. It turned out to be complicated to locate the variables needed to
> pass to the tool directly from bitbake. CVE check and SPDX are running at
> different phases, and there would be a need for changes in task ordering
> and possibly some inheritance. This is complicated and bug prone, so the
> question is: what about using the tool separately from the build? We can
> generate configuration files as the list of options increases.
>
> What do you think?
>


Hello all,
I will summarize the discussion during the last weekly call (unfortunately
nobody took notes!):
Richard would prefer to avoid adding new tasks and ideally keep the needed
data in a place where anyone can take it. He was thinking about sstate.

I've been thinking about it for the last few days, and here are my
conclusions:
1. The additional tasks are there to generate the data fast. We could do
that in do_fetch, but do_fetch causes downloads of everything. On a world
build, this takes a long time. We could decide to drop the use case of
supporting a fast check for world. I think that would be a pity; I have been
using this configuration quite a lot for checking complete layers. Waiting
for a "world" download of a 20-layer setup is time-consuming... Especially
in a CI that just does this check and has no other need for a shared DL_DIR.

So, I would rather suggest adding a new, generic task run before do_fetch.
SPDX can start grabbing information at this stage too.

2. For a check of a given build configuration, we need a package list and
need to have it exported so that it can be used later. Obviously we can't
use "all that is in sstate", because it will contain different builds too.

3. How do we store the data in the longer term? Sstate is useful, but in my
experience I have never considered it as something that should be kept as
part of a release. Instead, I have it on build machines, and prune/remove it
if something looks fishy (I have had multiple issues with sharing sstate
between releases, so gave up)... Not sure how others handle it. From my
perspective, we need to export the data at the release time so that it can
be reused safely later. On the other hand, getting data from sstate
requires re-calculating hashes to make sure you get the right archive, and
that requires information from the build.
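
To illustrate point 2, here is a minimal Python sketch of taking the package
list from the cve-check JSON of one build instead of from sstate. It assumes
the JSON has a top-level "package" list whose entries carry "name" and
"version" keys; that layout and the helper name package_list are assumptions
for illustration, not the tool's actual code.

```python
import json


def package_list(cve_json_text):
    """Return sorted, de-duplicated (name, version) pairs.

    Assumption: the input looks like
    {"package": [{"name": ..., "version": ...}, ...]}.
    """
    data = json.loads(cve_json_text)
    pairs = {
        (pkg["name"], pkg.get("version", ""))
        for pkg in data.get("package", [])
    }
    return sorted(pairs)


if __name__ == "__main__":
    # Made-up sample data, only to show the call.
    sample = json.dumps({
        "package": [
            {"name": "openssl", "version": "3.2.1"},
            {"name": "busybox", "version": "1.36.1"},
            {"name": "openssl", "version": "3.2.1"},
        ]
    })
    for name, version in package_list(sample):
        print(f"{name} {version}")
```

The list only covers what that one build produced, which is the property
"all that is in sstate" cannot give us.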

To re-generate data easily after the build, we would need to export it at
the end of the build. Pre-calculated data can be useful for various purposes,
but we need an export to a place to store the data (ideally a single file per
build).
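
As a rough sketch of what a "single file per build" export could look like,
the Python below bundles the CVE JSON file and the SPDX archive into one
tar.gz together with a checksum manifest. The function name
export_build_data, the manifest.json entry and the bundle layout are all
hypothetical; nothing like this exists in the tool today.

```python
import hashlib
import io
import json
import tarfile
from pathlib import Path


def export_build_data(cve_json, spdx_archive, out_file):
    """Bundle the CVE JSON file and the SPDX archive into one tar.gz.

    A manifest with SHA-256 checksums is stored next to the inputs so
    they can be verified when the bundle is reused later.
    Assumption: the two input files have different base names.
    """
    inputs = [Path(cve_json), Path(spdx_archive)]
    manifest = {
        p.name: hashlib.sha256(p.read_bytes()).hexdigest() for p in inputs
    }
    payload = json.dumps(manifest, indent=2, sort_keys=True).encode()

    with tarfile.open(str(out_file), "w:gz") as tar:
        for p in inputs:
            # Store files flat, without the build directory prefix.
            tar.add(str(p), arcname=p.name)
        info = tarfile.TarInfo("manifest.json")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return manifest
```

Such a bundle could be archived at release time and passed to the standalone
tool later, without needing sstate or the original build directory.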

What customers are starting to require now is uploading the SBOM and VEX for
composition analysis, usually to their proprietary platform.

What do you think?

Regards,
Marta
-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#200446): 
https://lists.openembedded.org/g/openembedded-core/message/200446
-=-=-=-=-=-=-=-=-=-=-=-
