Re: [OE-core] [PoC 0/1] LLM enriched CVE entries

Paul Barker Fri, 12 Dec 2025 07:37:51 -0800

On Fri, 2025-12-12 at 13:40 +0100, Quentin Schulz via
lists.openembedded.org wrote:
> Hi Gyorgy,
> 
> On 12/5/25 10:16 AM, Gyorgy Sarvari via lists.openembedded.org wrote:
> > Hello,
> > 
> > tl;dr: This is a proof of concept about using an LLM to enrich CVE feeds.
> > Links are somewhere down below to see the difference. If you think it's
> > useful, please say so. If you think it's not useful, please say so too.
> > 
> > Motivation: the CVE checker associates CVEs with recipes based on the
> > CPE information in the CVE entry. Unfortunately there are quite a few
> > CVE entries missing this information entirely, making it impossible to
> > associate them with any recipes. Looking at this year so far there are
> > over 66000 CVEs opened, of which over 15000 are missing CPEs. Older
> > entries seem to have a better CPE-to-CVE ratio, but for this PoC
> > I'm mostly interested in the latest vulnerabilities.
> > 
> > The idea: in case CPE information is missing, try to derive it from
> > the human language description and the reference links of the CVE,
> > using an LLM. The intuition would be that a good portion of the derived
> > data would be usable, and even though it wouldn't be perfect, it would
> > catch more valid CVEs than without it.
> > 
> [...]
> > This patch is just a proof of concept.
> > I'm not sure if/how it could be integrated in the project's
> > infra - especially the initial load is very heavy, and the patch requires 
> > GPU(s).
> > 
> 
> I've no interest in the technical implementation of this, just 
> commenting on the reason for this to exist in the first place.
> 
> This all comes from "CVEs without CPE exist". This is not a Yocto 
> problem, and it doesn't seem right for Yocto to be the one fixing it. 
> Someone should fix the database everyone is using. This suggested 
> approach may be a way to fix the current content of the database, though 
> I don't know whether *we* want this in Yocto, and to help maintain it to
> some degree, or whether it is better suited to living outside of it, in a
> fork or similar.
> 
> I don't know what stance NIST, MITRE or whatever entity is responsible
> for the database(s) will take on using LLMs to identify CPEs for CVEs.
> I don't understand how they can even accept CVEs without a CPE:
> "here's a vuln, figure out which piece of software in the world this 
> applies to" is madness to me.
> 
> I'm Cc'ing the security folks as they are probably the people with
> the most to say about this, and with some idea of how to fix processes or
> raise issues with the affected entities.


To give another idea here: When I was looking at bug #15932 [1] where we
wanted the CPE to be added for CVE-2025-26519, the CPE information was
available via the MITRE API [2] but not in the NVD entry [3].

So, before we consult a stochastic parrot, we may want to automate the
process of collecting data from other available sources. We may also want
to cache that additional data within our metadata layers and send it to
NIST to get the NVD updated.
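
As a rough sketch of what that automation could look like (an
illustration only, not an existing OE-core tool): the snippet below
fetches a record from the MITRE API endpoint in [2] and collects any CPE
strings it carries. It assumes the record follows the CVE JSON 5.x
layout, with CPEs under containers.cna.affected[].cpes and
containers.adp[].affected[].cpes; the function names fetch_record and
extract_cpes are hypothetical, and where a given record actually stores
its CPEs would need checking.

```python
import json
import urllib.request

# URL pattern taken from reference [2]; assumed to return a CVE JSON 5.x record.
MITRE_API = "https://cveawg.mitre.org/api/cve/{cve_id}"


def fetch_record(cve_id: str) -> dict:
    """Download one CVE record as parsed JSON (needs network access)."""
    url = MITRE_API.format(cve_id=cve_id)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def extract_cpes(record: dict) -> list[str]:
    """Collect unique CPE strings from the CNA and ADP containers.

    Assumption: CPEs live in the optional "cpes" list of each entry in
    "affected". Missing keys are treated as empty, so a record without
    any CPE data simply yields an empty list.
    """
    containers = record.get("containers", {})
    sections = [containers.get("cna", {})] + list(containers.get("adp", []))
    cpes: list[str] = []
    for section in sections:
        for product in section.get("affected", []):
            for cpe in product.get("cpes", []):
                if cpe not in cpes:  # keep first-seen order, drop duplicates
                    cpes.append(cpe)
    return cpes
```

Something like extract_cpes(fetch_record("CVE-2025-26519")) would then
give a list that could be compared against what the NVD has for the same
CVE. Keeping the parsing separate from the download means the result can
be cached or tested without network access.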

Over time we need to move away from relying on the NVD as a single point
of failure.

[1]: https://bugzilla.yoctoproject.org/show_bug.cgi?id=15932
[2]: https://cveawg.mitre.org/api/cve/CVE-2025-26519
[3]: https://nvd.nist.gov/vuln/detail/CVE-2025-26519

Best regards,

-- 
Paul Barker


