mbaedke commented on code in PR #2923:
URL: https://github.com/apache/jackrabbit-oak/pull/2923#discussion_r3341190828

##########
draft-THREAT-MODEL.md:
##########
@@ -0,0 +1,1060 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements. See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership. The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License. You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied. See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# Apache Jackrabbit Oak Security Threat Model (draft)
+
+**Why a separate Jackrabbit-Oak model (not a single Jackrabbit-PMC umbrella).**
+The Jackrabbit PMC owns three functionally distinct codebases that share a
+common JCR API contract but have completely different security architectures:
+the original `jackrabbit` (jackrabbit-core, JR2-era), `jackrabbit-oak` (the
+modern, scalable successor with a different storage model and a redesigned
+security stack), and `jackrabbit-filevault` (a packaging / serialisation
+tool whose entire reason for existing is to move repository content across a
+trust boundary as a zip file). An umbrella model would have to disclaim each
+of the per-repo nuances in turn — every "the project trusts X" statement
+would carry "...for Oak, but jackrabbit-core uses a different mechanism, and
+filevault doesn't have callers in this sense". Three smaller models cite
+each other for the JCR contract and stand on their own for everything else.
+The triage utility of a closed-set §13 disposition table requires that
+each project's set actually be closed.
+
+## §1 Header
+
+- **Project:** Apache Jackrabbit Oak (`apache/jackrabbit-oak`) *(documented:
+  `AGENTS.md`, `.asf.yaml`)*. Oak is the modern JCR repository
+  implementation; the original `jackrabbit` (jackrabbit-core) is modelled
+  separately, and `jackrabbit-filevault` (packaging) is also modelled
+  separately.
+- **Commit / version binding:** drafted against the default branch
+  (`trunk`) *(documented: `.asf.yaml` — `protected_branches: trunk`)*. A
+  vulnerability report against Oak version *N* should be triaged against
+  the model as it stood at *N* (release tag), not against `trunk`.
+- **Date:** 2026-05-30.
+- **Authors:** ASF Security team draft, awaiting Jackrabbit PMC review.
+- **Status:** draft — under maintainer review.
+- **Reporting cross-reference:** findings that may violate a §8 property
+  should be reported per the ASF Security Team disclosure channel
+  (`[email protected]`) and the Jackrabbit project's security mailing
+  list, before public disclosure *(documented: Oak's `security/reports.md`
+  — "The Apache Security team requests that researchers report
+  undisclosed vulnerabilities to the security mailing list before public
+  disclosure")*. Findings that fall under §3, §9, or §11a will be
+  closed by Oak triagers citing this document.
+- **Provenance legend:**
+  *(documented)* — drawn from in-repo docs or website docs with citation;
+  *(maintainer)* — confirmed by an Oak maintainer in response to this
+  draft; *(inferred)* — synthesised from code structure or domain
+  knowledge, awaiting PMC ratification (every *(inferred)* tag has a
+  matching §14 question).
+- **Draft confidence:** 35 documented / 0 maintainer / 28 inferred.
+
+**About the project.** Apache Jackrabbit Oak is the actively-developed
+scalable hierarchical content repository that succeeded the original
+Apache Jackrabbit (jackrabbit-core). It implements the JCR 2.0
+specification (JSR 283) and is the storage engine that ships with Adobe
+Experience Manager and several other CMS / DAM products. The
+implementation is split across ~47 Maven modules under
+`oak-*`; it is intended to be **embedded** by a host application (CMS,
+asset manager, integration framework, …) — not deployed as a standalone
+server — and exposes a JCR `Repository` API and an Oak `ContentRepository`
+API. Security is structured around a pluggable `SecurityProvider` that
+binds an `AuthenticationConfiguration`, an `AuthorizationConfiguration`
+(possibly composite), a `UserConfiguration`, a `PrincipalConfiguration`,
+and a `PrivilegeConfiguration` *(documented: `security/introduction.md`,
+`security/overview.md`)*.
+
+## §2 Scope and intended use
+
+### Intended use
+
+- **In-process JCR 2.0 (JSR 283) repository** embedded by a host
+  application (CMS, DAM, integration framework). The Oak repository
+  exposes `javax.jcr.Repository` and `org.apache.jackrabbit.oak.api.ContentRepository`
+  to in-process callers; there is no built-in network listener and no
+  per-end-user authn/authz outside what the host plugs into the
+  `LoginContextProvider` and `AuthorizationConfiguration` chain
+  *(documented: `security/introduction.md`,
+  `security/authentication.md`)*.
+- The repository supports multiple NodeStore backends — DocumentNodeStore
+  (Mongo, RDB), SegmentNodeStore (TarMK), Composite, with Blob storage
+  in BlobStore (S3, Azure, FileBlobStore, …) *(documented: `AGENTS.md` —
+  "Persistence: Multiple NodeStore backends (Document, Segment/TarMK,
+  Composite, AWS S3, Azure)")*.
+- Indexing is via Lucene (oak-lucene) and Elasticsearch (oak-search-elastic)
+  *(documented: `AGENTS.md`)*.
+
+### Deployment shape
+
+Oak is **not** a standalone daemon and is **not** a network service in its
+own right. It is an in-process library. Network exposure (HTTP, WebDAV,
+custom protocol) is **always** an artefact of the host application; Oak
+ships no listener of its own *(inferred — §14 Q1)*. The threat model
+is therefore that of a library, not a service — but a library whose
+contract specifically promises authentication and authorisation
+properties to its host, which makes it a more security-load-bearing
+library than (say) zlib.
+
+### Caller roles
+
+Following §2 of the output-structure rubric (in-process-library split):
+
+| Role | Trust level | Notes |
+| --- | --- | --- |
+| **Host application code** | trusted | Holds the `Repository`/`ContentRepository` handle; chooses the `SecurityProvider` and `LoginContextProvider`; configures the NodeStore + BlobStore; may bypass authorisation by obtaining a system-level session via `loginAdministrative` / `loginService` *(documented: `security/permission/default.md` — admin/system principals bypass permission evaluation)*. The host decides whether end-user credentials reach Oak at all. |
+| **JCR session principal (end user)** | untrusted but authenticated | Identifies through `Repository.login(Credentials)` or `ContentRepository.login(Credentials, workspaceName)`; subjected to the configured `PermissionProvider` chain on every read/write. The principal is authenticated by Oak's `LoginContext` *(documented: `security/authentication.md`)*. |
+| **System / admin principal** | trusted | A session obtained via `loginAdministrative` / `loginService` (host-driven) carries `SystemPrincipal` or `AdminPrincipal` and bypasses permission evaluation *(documented: `security/permission/default.md` — "Three principal categories automatically receive full repository access: SystemPrincipal, AdminPrincipal, and Principals matching configured administrative names")*. |
+| **External identity provider** | trusted control plane | The host configures one or more `ExternalIdentityProvider`s for LDAP / SAML / OAuth; Oak's `ExternalLoginModule` accepts whatever identity these IDPs assert *(documented: `security/authentication/externalloginmodule.md` — "The mechanism implicitly trusts that the configured IDP accurately authenticates identities")*. |
+| **Pre-authenticated caller** | trusted (operator-asserted) | When `PreAuthenticatedLogin` is in use, Oak performs no credential verification at all; the host is asserting "this user has already been authenticated upstream" *(documented: `security/authentication/preauthentication.md` — "Oak delegates all authentication responsibility to the caller")*. |
+| **NodeStore backend** | trusted | Mongo / Tar / Segment / RDB / Composite storage is assumed honest and assumed to enforce its own at-rest protections *(inferred — §14 Q2)*. |
+| **BlobStore backend** | trusted | S3 / Azure / FileBlobStore is assumed honest *(inferred — §14 Q2)*. |
+
+### Component-family table
+
+| Family | Representative entry | Touches outside the process? | In-model? |
+| --- | --- | --- | --- |
+| `oak-api`, `oak-core`, `oak-core-spi` — content tree, MVCC, commit hooks | `ContentRepository.login` | no (only through a NodeStore) | **yes** |
+| `oak-jcr` — JCR 2.0 binding *(documented: `AGENTS.md`)* | `Repository.login` | no | **yes** |
+| `oak-security-spi`, default `AuthorizationConfiguration`, default `PermissionProvider` | `SecurityProvider` | no | **yes** (high security weight; 100% test coverage mandate per `AGENTS.md`) |
+| `oak-authorization-cug` — Closed User Groups *(documented: `security/authorization/cug.md`)* | composite `AuthorizationConfiguration` | no | **yes** (read-only authorisation only — see §9.10) |
+| `oak-authorization-principalbased` — principal-based authz *(documented: `AGENTS.md`)* | composite `AuthorizationConfiguration` | no | **yes** |
+| `oak-auth-external` — IDP framework *(documented: `security/authentication/externalloginmodule.md`)* | `ExternalIdentityProvider` SPI | depends on IDP impl | **yes** for the wrapper; IDP impl is per-host |
+| `oak-auth-ldap` — LDAP IDP *(documented: `AGENTS.md`)* | `LdapIdentityProvider` | **yes — LDAP/AD** | **yes** |
+| Persistence — `oak-store-document` (Mongo / RDB), `oak-segment-tar`, `oak-store-composite`, `oak-store-spi` | `DocumentNodeStore`, `SegmentNodeStore` | **yes — DB / FS** | **yes** for in-Oak code; backend itself is trusted (§3) |
+| BlobStore — `oak-blob`, `oak-blob-cloud`, `oak-blob-cloud-azure`, `oak-blob-plugins` | `S3DataStore`, `AzureDataStore` | **yes — S3 / Azure / FS** | **yes** for in-Oak code; cloud APIs trusted (§3) |
+| Search — `oak-lucene`, `oak-search`, `oak-search-elastic` | `IndexEditor`, query parsers | sometimes (Elasticsearch over HTTP) | **yes** |
+| `oak-run` — operator CLI / tooling | `oak-run.jar` | OS / FS / network depending on subcommand | **see §3** (in-model only for the command-driven contract; out-of-model for "operator runs it as the wrong user") |
+| `oak-pojosr`, `oak-standalone` — repository launchers | embedded repository | filesystem | **yes** for code; deployment is operator's |
+| `oak-upgrade` — JR2 → Oak migration | offline migration job | filesystem | **yes** for code; the migration source is a trusted JR2 repository |
+| `oak-it`, `oak-it-osgi`, `oak-bench-*`, `oak-jcr-tests`, `oak-test-bundle`, `oak-exercise` | integration tests, benchmarks, training | varies | **out of model** — unsupported components *(§3)* |
+| `oak-examples`, `oak-doc-railroad-macro`, `oak-doc` | examples and docs | none | **out of model** *(§3)* |
+| Archived MicroKernel modules (`oak-mk-*`) | n/a | n/a | **out of model** — explicitly archived *(documented: README — "MicroKernel-related modules have been archived")* |
+
+A finding is in-model only if it lands in a row marked **yes**. See §4
+for per-component reachability tests.
+
+## §3 Out of scope (explicit non-goals)
+
+Reports requiring any of these will be closed with the cited disposition:
+
+1. **Host application correctness.** Oak is embedded. If the host hands
+   out an admin session to an unauthenticated HTTP request, exposes
+   `loginAdministrative` over JMX, or routes user-supplied SQL2 queries
+   directly into the session without filtering, the harm is the host's
+   *(documented: `security/permission/default.md` — admin/system
+   principals bypass permission evaluation)*. → `OUT-OF-MODEL:
+   adversary-not-in-scope`.


Review Comment:
   I think that this needs to be refined, since it might exclude XML, SQL2 etc. parsing issues like XML external entity injection from being considered a threat.
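
   For illustration, one reading of the concern is that a parser flaw sits in library code regardless of which host feeds it input. The Java sketch below shows the class of issue in general terms; it is not taken from Oak's import code. It assumes the JDK's built-in JAXP parser (which recognises the Xerces `disallow-doctype-decl` feature), and the class and method names (`XxeHardeningSketch`, `hardenedBuilder`, `rejects`) are hypothetical. It builds a `DocumentBuilder` that refuses any DOCTYPE declaration, one common way to rule out external entity expansion.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical illustration only; not Oak code.
class XxeHardeningSketch {

    // Builds a DOM parser that rejects every DOCTYPE declaration.
    // Assumption: the JAXP implementation supports the Xerces feature URI below
    // (the JDK's default implementation does; others may throw here).
    static DocumentBuilder hardenedBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        return factory.newDocumentBuilder();
    }

    // Returns true if the hardened parser refuses the document.
    static boolean rejects(String xml) {
        try {
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            hardenedBuilder().parse(new ByteArrayInputStream(bytes));
            return false;
        } catch (SAXException e) {
            // A disallowed DOCTYPE surfaces as a fatal SAX parse error.
            return true;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xxe = "<!DOCTYPE a [<!ENTITY x SYSTEM \"file:///etc/passwd\">]><a>&x;</a>";
        System.out.println("DOCTYPE payload rejected: " + rejects(xxe));
        System.out.println("plain document rejected: " + rejects("<a>ok</a>"));
    }
}
```

   A parser configured without these settings could resolve the external entity for any caller that reaches it, which is why such findings may not fit cleanly under "host application correctness".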
