In short (summarising for those who do not want to read my long mail), this is how I see it working for us long term:
* we need to have a way to say that VEXes are not maintained for old versions
* we will publish SBOMs for "main" development so that our users can automatically see if problems are fixed in the upcoming release
* for all CVEs we will have an "in triage" state - we can automate that
* we will only publish a "known" state for those dependencies that we actually analysed and know the state of (there will be just a few of those)
* we should be able to say "no idea, help us to find out if this is the `affected` state, but you will never hear 'not affected' from us"

If we can do that, then yes - that's workable at scale, but I am afraid it's not the expectation of those who want those VEXes :)

J.

On Thu, Feb 6, 2025 at 1:15 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> I agree that we should only publish VEXes for the currently supported
> version (i.e. usually the most recent one).
>
> Do you know if there is a way to "unpublish" VEXes (mark them as not
> maintained)?
> Because that is what will effectively happen: we will have to start
> publishing new VEXes for the latest version and make sure that all the
> consumers of the previous VEX know it's not maintained any more.
>
>> Since looking through somebody else's code is error-prone and time
>> consuming, I would restrict the generation of VEX statements to the
>> following two situations:
>>
>> 1. The CVE is in a dependency that is only used by your project, not
>> your project's dependencies. In this case you can directly read the CVE
>> (or VDR document) and identify if you use the affected feature in your
>> code.
>>
>
> As long as they are published in a way that we can discover them
> automatically - which is not the case currently, but hopefully the
> Transparency Exchange API will make it possible. I guess there will also
> be a long tail of projects that do not have VEXes, VDRs or SBOMs for
> years to come, so the question is what we do with those dependencies?
> But I am afraid we are a few years away from even a small percentage of
> our deps supporting it. And if our users or regulators ask us - what do
> we say? We need to have a good answer prepared.
>
>
>> 2. The CVE is in a transitive dependency. In this case you wait until
>> all your **direct** dependencies potentially affected publish a VEX
>> statement. Only then you publish a statement based on what your
>> dependencies say in their VEX records.
>>
>
> Same - we need some discoverability and a way to say "no idea" if an
> automated way of discovering those does not work or the dependency does not
> publish those.
>
>
>>
>> Note that the utility of VEXes varies greatly between ecosystems. In the
>> case of Python applications users download the application and its
>> dependencies directly from PyPI. Unless I am mistaken, if there is a
>> vulnerability in one of your dependencies, you don't need to make a new
>> Airflow release, users can easily upgrade their system themselves.
>> VEXes might be only useful to those that create an Airflow Docker image
>> to know if they need to regenerate it. They will probably do it anyway.
>>
>
> It's not that simple - it very much depends. All variations are possible
> and that's what makes it complex, because you sometimes need to perform
> deeper analysis. Sometimes your application or library introduces an
> upper bound that effectively makes it impossible to upgrade. Sometimes
> this is implied by upper bounds or other limitations coming from our
> dependencies or dependencies of our dependencies.
>
> A good example is the Werkzeug dependency (that we did such analysis on):
>
> * Airflow 2 uses Werkzeug - transitively through several of our
>   dependencies
> * For quite a while there has been a known and serious vulnerability in
>   Werkzeug (https://github.com/apache/airflow/discussions/44865 -
>   CVE-2024-49767) in version 2.2.3, which we have - fixed in version 3.0.0.
>   We believe it does not affect us, as we do not seem to use the affected
>   feature (uploading files) - but we are not sure, because this dependency
>   is used by about 5 or 6 other dependencies of ours - not directly by us -
>   and we have not done a complete analysis of those - that would require a
>   lot of digging and analysing third-party code that we have no expertise
>   in. So our answer is "we think we are not affected, but we are not really
>   sure - if you find an exploitable path, please let us know by providing a
>   reproducible report".
> * Up until about last year several of our dependencies did not allow us to
>   upgrade - they were upper-bound. This has improved over time and many of
>   them have released versions that have no such limitation
> * But the one remaining - Connexion 2 - only moved to the new Werkzeug 3
>   in Connexion 3, and they heavily changed the architecture of Connexion 3,
>   so we cannot upgrade easily. Actually - we cannot upgrade at all. We
>   spent (particularly me) weeks of effort on that, with 2 interns from Major
>   League Hacking trying to solve the problems and make the upgrade happen -
>   and we even had a green PR after 4 months - that no one in their right
>   mind would accept, because it was band-aid-over-band-aid-over-band-aid.
>   We worked together with the Connexion maintainers and they helped us, but
>   they would not release Connexion 2 with Werkzeug 3 support as they stopped
>   maintaining it for good (actually they are new maintainers who took over
>   from the old ones, they worked exclusively on rewriting Connexion 3 to use
>   ASGI instead of WSGI, and they know and care almost nothing about
>   Connexion 2). The old maintainers are gone with the wind (of change).
> * We are dropping Connexion in Airflow 3 - in favour of FastAPI - part of
>   the reason is the security issue. That requires several months of effort
>   from about 3 people working on it pretty much full time.
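
The upper-bound part of the Werkzeug story can at least be surfaced mechanically, even if the reachability analysis cannot. A minimal sketch, assuming only the standard-library `importlib.metadata`; the helper names (`parse_requirement`, `who_caps`, `installed_requirements`) and the `dep-a`/`dep-b` sample data are made up for illustration, and the requirement parsing is deliberately crude (a real tool would more likely rely on the `packaging` library):

```python
import re
from importlib.metadata import distributions


def normalize(name: str) -> str:
    """Normalise a distribution name the way PEP 503 does."""
    return re.sub(r"[-_.]+", "-", name).lower()


# Crude pattern: name, optional [extras], optional parentheses around the specifier.
_REQ = re.compile(
    r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:\[[^\]]*\])?\s*\(?([^)]*)\)?\s*$"
)


def parse_requirement(req: str) -> tuple[str, str]:
    """Split 'Name[extra] (<3,>=2); marker' into (normalised name, specifier)."""
    match = _REQ.match(req.split(";", 1)[0])  # drop the environment marker
    if match is None:
        return "", ""
    return normalize(match.group(1)), match.group(2).strip()


def upper_bound_clauses(specifier: str) -> list[str]:
    """Return the clauses that can block an upgrade: '<', '<=', '~=', '=='."""
    clauses = [c.strip() for c in specifier.split(",") if c.strip()]
    return [c for c in clauses if c.startswith(("<", "~=", "=="))]


def who_caps(target: str, requirements: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each distribution that upper-bounds `target` to its blocking clauses."""
    target = normalize(target)
    result: dict[str, list[str]] = {}
    for dist_name, reqs in requirements.items():
        for req in reqs:
            name, specifier = parse_requirement(req)
            caps = upper_bound_clauses(specifier)
            if name == target and caps:
                result.setdefault(dist_name, []).extend(caps)
    return result


def installed_requirements() -> dict[str, list[str]]:
    """Collect the declared requirements of everything installed here."""
    found: dict[str, list[str]] = {}
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:
            found[name] = list(dist.requires or [])
    return found


# Hypothetical sample data, NOT the real constraints of any package.
sample = {
    "dep-a": ["Werkzeug (<3,>=2.0)", "requests>=2.9.1"],
    "dep-b": ["werkzeug>=2.2.2"],
}
capped_by = who_caps("werkzeug", sample)
```

Against a real environment one would call `who_caps("werkzeug", installed_requirements())`. This only reports declared upper bounds; it says nothing about whether the vulnerable code path is reachable, which is the expensive part described above.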
>
> Our users (this has happened multiple times) DEMAND that we remove that
> dependency, because it is shown in their scans as critical. And it's not
> the image. The Werkzeug dependency is installed every time Airflow is
> installed - version 2.2.3, which is vulnerable - and we cannot do anything
> about it until Airflow 3.
>
> So - in short, even being able to answer and triage this issue and see
> what we can tell the users took literally months of effort and a POC that
> failed, and the end result is still "won't fix, we are likely not
> affected". So essentially it's none of these: "resolved,
> resolved_with_pedigree, exploitable, in_triage, false_positive,
> not_affected". It's something like "we really don't know, but what we do
> know is that we will not fix it in Airflow 2". If we knew of an exploitable
> path, we could likely monkey-patch it, so if someone reports one to us that
> would be the remediation, but we have no knowledge of any exploit.
>
> This is but one example. And of course it's not a common case - but you
> can see that we have a chain of dependencies and pretty complex
> interactions between them, and deep analysis and literally weeks of effort
> might be needed just to be able to say "anything" about such an issue.
>
> This is ONE example only. We have 180 direct and 700 indirect
> dependencies. Also there are other variations. Some of our dependencies
> limit the versions of Python and the platforms they support (this is
> pretty common actually) - which means that in some cases our users can
> upgrade a dependency on Python 3.10, but not on Python 3.8. And this
> might be only known after deep analysis of the chain of dependencies,
> because it might come from the way transitive dependencies are resolved.
>
> What I am trying to say is that expecting us to say "not affected" or any
> other "firm" statement often requires a lot of effort just to find the
> right answer.
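
For what it's worth, the "we really don't know, but we will not fix it in Airflow 2" answer might be expressible by combining fields rather than by finding one matching state. A minimal sketch, assuming the CycloneDX vulnerability `analysis` object, which as far as I remember carries `response` and `detail` next to `state`; the helper `build_analysis` and the `detail` wording are illustrative only, not anything Airflow publishes:

```python
import json

# The six states listed in the mail above.
STATES = {
    "resolved", "resolved_with_pedigree", "exploitable",
    "in_triage", "false_positive", "not_affected",
}
# Response values as I recall them from the CycloneDX schema (assumption).
RESPONSES = {
    "can_not_fix", "will_not_fix", "update", "rollback", "workaround_available",
}


def build_analysis(state: str, responses: list[str], detail: str) -> dict:
    """Build an analysis record, rejecting values outside the known sets."""
    if state not in STATES:
        raise ValueError(f"unknown state: {state!r}")
    unknown = [r for r in responses if r not in RESPONSES]
    if unknown:
        raise ValueError(f"unknown response(s): {unknown!r}")
    return {"state": state, "response": list(responses), "detail": detail}


# The Werkzeug case: no firm verdict, but a firm decision not to fix in Airflow 2.
entry = {
    "id": "CVE-2024-49767",
    "analysis": build_analysis(
        "in_triage",
        ["will_not_fix"],
        "We believe we are not affected but have not proven it; "
        "please report any exploitable path.",
    ),
}
document = json.dumps(entry, indent=2)
```

Whether scanners and regulators would treat `in_triage` plus `will_not_fix` as an acceptable long-term answer is exactly the open question here.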
>
>
>> * If something like CVE-2022-1471[2] happens, the Jackson team will
>> create a "not_affected" VEX record and Log4j Core will pass the same
>> result to Kafka. Note that this will not generate a lot of additional
>> work for the Jackson team, since they provided a human-readable version
>> anyway[3]. It could even generate less work, if users consume the VEX
>> file.
>>
>
>
>> * If something like CVE-2022-38752[4] happens, the Jackson team will
>> create an upgrade recommendation for those that parse untrusted YAML.
>> Log4j Core will generate a "not_affected" VEX record (we only use YAML
>> to parse a config file) and yet again Kafka does not need a new release.
>> Again, Jackson had to deal with a user question about it[5], so a VEX
>> statement will not generate a lot of additional work.
>>
>>
> Yes, if we can automatically discover the VEXes of all our deps that use
> the transitive dep, and all of them say "not affected", fine. But this will
> not happen in such cases. Even if Connexion 3 says "Affected" and we do not
> use that particular functionality, we are not automatically affected as
> well - because we might use only a subset of the functionality of the
> transitive dependency ourselves. In the example above, Connexion 2 would be
> "affected" and, following your logic, we would also have to say "affected",
> even if we believe we are not (but have no proof of that - it is next to
> impossible to have proof of not being affected in this case).
>
> That means that Airflow 2 would have an "affected" critical Werkzeug
> vulnerability - even if we currently believe we are not affected - and
> nobody would be able to install Airflow if they want to be "CRA compliant".
> Airflow 3 will be out in 3 months. So everyone would have to wait for it.
> We have literally no other actions we can take currently. We could take the
> risk of course and say "not affected". But we really do not know. So what
> it basically means is that it will be perpetually "in triage" for us.
> I see no other way.
>
> And yes, that's one example only, but it's all but guaranteed that it will
> continue to happen as more CVEs are discovered - especially for old
> versions of Airflow. I really hope we can limit ourselves to publishing
> only "last version" VEXes - that would help us immensely to at least
> attempt to realistically assess those.
>
>
>> Summarizing: let us restrict VEX-es to the current version of our
>> projects and publish them only when the CVE has been analyzed by all our
>> "suppliers". We should be open to external companies helping us with the
>> analysis through PRs, but let us not do any additional work ourselves.
>> Of course there are exceptions: when Log5Shell happens, we can do the
>> entire analysis ourselves.
>>
>
> Yep. I would love to see the users who have Airflow 2.7 ask for a VEX for
> it, only to get the 2.10.5 one (being voted on now) and the answer
> "upgrade to the latest version if you want to have a current VEX". And I
> really love the idea - because that's what we are telling them now. So if
> we really can use VEXes to stimulate that and have a way to say "Those
> VEXes we published for all past Airflow versions are not maintained any
> more and we don't care about them any more - upgrade", that's my dream as
> a maintainer. But I am afraid we have to work with the regulators and our
> users to make sure that they have the same expectations - and I seriously
> doubt they do, to be honest. I hope I am wrong :)
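
To make the disagreement in this thread concrete, here is a minimal sketch of the "wait for your suppliers" rule combined with the "no idea" fallback argued for above. The function name `derive_state`, the convention that `None` means "this dependency publishes no VEX", and the `dep-a`/`dep-b` names are assumptions for illustration; the state names are the ones listed earlier in the mail.

```python
from typing import Mapping, Optional


def derive_state(direct_dep_states: Mapping[str, Optional[str]]) -> str:
    """Derive our own VEX state for a CVE in a transitive dependency.

    `direct_dep_states` maps each direct dependency that pulls in the
    vulnerable package to the state from its VEX, or None if it has no VEX.
    """
    states = list(direct_dep_states.values())
    # Only a unanimous "not_affected" from every supplier can be passed on.
    if states and all(state == "not_affected" for state in states):
        return "not_affected"
    # A missing VEX, or a supplier that is itself affected, does not prove
    # that we are affected too, so the automated answer stays "in_triage".
    return "in_triage"


# Hypothetical inputs, not real VEX data.
all_clear = derive_state({"dep-a": "not_affected", "dep-b": "not_affected"})
one_silent = derive_state({"dep-a": "not_affected", "dep-b": None})
one_affected = derive_state({"dep-a": "exploitable", "dep-b": "not_affected"})
```

With this rule the Werkzeug case lands in `in_triage` for as long as even one supplier such as Connexion 2 publishes nothing, which is the "perpetually in triage" outcome described above.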