reschke commented on code in PR #2923:
URL: https://github.com/apache/jackrabbit-oak/pull/2923#discussion_r3341649305


##########
draft-THREAT-MODEL.md:
##########
@@ -0,0 +1,1060 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# Apache Jackrabbit Oak Security Threat Model (draft)
+
+**Why a separate Jackrabbit-Oak model (not a single Jackrabbit-PMC umbrella).**
+The Jackrabbit PMC owns three functionally distinct codebases that share a
+common JCR API contract but have completely different security architectures:
+the original `jackrabbit` (jackrabbit-core, JR2-era), `jackrabbit-oak` (the
+modern, scalable successor with a different storage model and a redesigned
+security stack), and `jackrabbit-filevault` (a packaging / serialisation
+tool whose entire reason for existing is to move repository content across a
+trust boundary as a zip file). An umbrella model would have to disclaim each
+of the per-repo nuances in turn — every "the project trusts X" statement
+would carry "...for Oak, but jackrabbit-core uses a different mechanism, and
+filevault doesn't have callers in this sense". Three smaller models cite
+each other for the JCR contract and stand on their own for everything else.
+The triage utility of a closed-set §13 disposition table requires that
+each project's set actually be closed.
+
+## §1 Header
+
+- **Project:** Apache Jackrabbit Oak (`apache/jackrabbit-oak`) *(documented:
+  `AGENTS.md`, `.asf.yaml`)*. Oak is the modern JCR repository
+  implementation; the original `jackrabbit` (jackrabbit-core) is modelled
+  separately, and `jackrabbit-filevault` (packaging) is also modelled
+  separately.
+- **Commit / version binding:** drafted against the default branch
+  (`trunk`) *(documented: `.asf.yaml` — `protected_branches: trunk`)*. A
+  vulnerability report against Oak version *N* should be triaged against
+  the model as it stood at *N* (release tag), not against `trunk`.
+- **Date:** 2026-05-30.
+- **Authors:** ASF Security team draft, awaiting Jackrabbit PMC review.
+- **Status:** draft — under maintainer review.
+- **Reporting cross-reference:** findings that may violate a §8 property
+  should be reported per the ASF Security Team disclosure channel
+  (`[email protected]`) and the Jackrabbit project's security mailing
+  list, before public disclosure *(documented: Oak's `security/reports.md`
+  — "The Apache Security team requests that researchers report
+  undisclosed vulnerabilities to the security mailing list before public
+  disclosure")*. Findings that fall under §3, §9, or §11a will be
+  closed by Oak triagers citing this document.
+- **Provenance legend:**
+  *(documented)* — drawn from in-repo docs or website docs with citation;
+  *(maintainer)* — confirmed by an Oak maintainer in response to this
+  draft; *(inferred)* — synthesised from code structure or domain
+  knowledge, awaiting PMC ratification (every *(inferred)* tag has a
+  matching §14 question).
+- **Draft confidence:** 35 documented / 0 maintainer / 28 inferred.
+
+**About the project.** Apache Jackrabbit Oak is the actively-developed
+scalable hierarchical content repository that succeeded the original
+Apache Jackrabbit (jackrabbit-core). It implements the JCR 2.0
+specification (JSR 283) and is the storage engine that ships with Adobe
+Experience Manager and several other CMS / DAM products. The
+implementation is split across ~47 Maven modules under
+`oak-*`; it is intended to be **embedded** by a host application (CMS,
+asset manager, integration framework, …) — not deployed as a standalone
+server — and exposes a JCR `Repository` API and an Oak `ContentRepository`
+API. Security is structured around a pluggable `SecurityProvider` that
+binds an `AuthenticationConfiguration`, an `AuthorizationConfiguration`
+(possibly composite), a `UserConfiguration`, a `PrincipalConfiguration`,
+and a `PrivilegeConfiguration` *(documented: `security/introduction.md`,
+`security/overview.md`)*.
+
+## §2 Scope and intended use
+
+### Intended use
+
+- **In-process JCR 2.0 (JSR 283) repository** embedded by a host
+  application (CMS, DAM, integration framework). The Oak repository
+  exposes `javax.jcr.Repository` and `org.apache.jackrabbit.oak.api.ContentRepository`
+  to in-process callers; there is no built-in network listener and no
+  per-end-user authn/authz outside what the host plugs into the
+  `LoginContextProvider` and `AuthorizationConfiguration` chain
+  *(documented: `security/introduction.md`,
+  `security/authentication.md`)*.
+- The repository supports multiple NodeStore backends — DocumentNodeStore
+  (Mongo, RDB), SegmentNodeStore (TarMK), Composite, with Blob storage
+  in BlobStore (S3, Azure, FileBlobStore, …) *(documented: `AGENTS.md` —
+  "Persistence: Multiple NodeStore backends (Document, Segment/TarMK,
+  Composite, AWS S3, Azure)")*.
+- Indexing is via Lucene (oak-lucene) and Elasticsearch (oak-search-elastic)
+  *(documented: `AGENTS.md`)*.
+
+### Deployment shape
+
+Oak is **not** a standalone daemon and is **not** a network service in its
+own right. It is an in-process library. Network exposure (HTTP, WebDAV,
+custom protocol) is **always** an artefact of the host application; Oak
+ships no listener of its own *(inferred — §14 Q1)*. The threat model
+is therefore that of a library, not a service — but a library whose
+contract specifically promises authentication and authorisation
+properties to its host, which makes it a more security-load-bearing
+library than (say) zlib.
+
+### Caller roles
+
+Following §2 of the output-structure rubric (in-process-library split):
+
+| Role | Trust level | Notes |
+| --- | --- | --- |
+| **Host application code** | trusted | Holds the `Repository`/`ContentRepository` handle; chooses the `SecurityProvider` and `LoginContextProvider`; configures the NodeStore + BlobStore; may bypass authorisation by obtaining a system-level session via `loginAdministrative` / `loginService` *(documented: `security/permission/default.md` — admin/system principals bypass permission evaluation)*. The host decides whether end-user credentials reach Oak at all. |
+| **JCR session principal (end user)** | untrusted but authenticated | Identifies through `Repository.login(Credentials)` or `ContentRepository.login(Credentials, workspaceName)`; subjected to the configured `PermissionProvider` chain on every read/write. The principal is authenticated by Oak's `LoginContext` *(documented: `security/authentication.md`)*. |
+| **System / admin principal** | trusted | A session obtained via `loginAdministrative` / `loginService` (host-driven) carries `SystemPrincipal` or `AdminPrincipal` and bypasses permission evaluation *(documented: `security/permission/default.md` — "Three principal categories automatically receive full repository access: SystemPrincipal, AdminPrincipal, and Principals matching configured administrative names")*. |
+| **External identity provider** | trusted control plane | The host configures one or more `ExternalIdentityProvider`s for LDAP / SAML / OAuth; Oak's `ExternalLoginModule` accepts whatever identity these IDPs assert *(documented: `security/authentication/externalloginmodule.md` — "The mechanism implicitly trusts that the configured IDP accurately authenticates identities")*. |
+| **Pre-authenticated caller** | trusted (operator-asserted) | When `PreAuthenticatedLogin` is in use, Oak performs no credential verification at all; the host is asserting "this user has already been authenticated upstream" *(documented: `security/authentication/preauthentication.md` — "Oak delegates all authentication responsibility to the caller")*. |
+| **NodeStore backend** | trusted | Mongo / Tar / Segment / RDB / Composite storage is assumed honest and assumed to enforce its own at-rest protections *(inferred — §14 Q2)*. |
+| **BlobStore backend** | trusted | S3 / Azure / FileBlobStore is assumed honest *(inferred — §14 Q2)*. |
+
+### Component-family table
+
+| Family | Representative entry | Touches outside the process? | In-model? |
+| --- | --- | --- | --- |
+| `oak-api`, `oak-core`, `oak-core-spi` — content tree, MVCC, commit hooks | `ContentRepository.login` | no (only through a NodeStore) | **yes** |
+| `oak-jcr` — JCR 2.0 binding *(documented: `AGENTS.md`)* | `Repository.login` | no | **yes** |
+| `oak-security-spi`, default `AuthorizationConfiguration`, default `PermissionProvider` | `SecurityProvider` | no | **yes** (high security weight; 100% test coverage mandate per `AGENTS.md`) |
+| `oak-authorization-cug` — Closed User Groups *(documented: `security/authorization/cug.md`)* | composite `AuthorizationConfiguration` | no | **yes** (read-only authorisation only — see §9.10) |
+| `oak-authorization-principalbased` — principal-based authz *(documented: `AGENTS.md`)* | composite `AuthorizationConfiguration` | no | **yes** |
+| `oak-auth-external` — IDP framework *(documented: `security/authentication/externalloginmodule.md`)* | `ExternalIdentityProvider` SPI | depends on IDP impl | **yes** for the wrapper; IDP impl is per-host |
+| `oak-auth-ldap` — LDAP IDP *(documented: `AGENTS.md`)* | `LdapIdentityProvider` | **yes — LDAP/AD** | **yes** |
+| Persistence — `oak-store-document` (Mongo / RDB), `oak-segment-tar`, `oak-store-composite`, `oak-store-spi` | `DocumentNodeStore`, `SegmentNodeStore` | **yes — DB / FS** | **yes** for in-Oak code; backend itself is trusted (§3) |
+| BlobStore — `oak-blob`, `oak-blob-cloud`, `oak-blob-cloud-azure`, `oak-blob-plugins` | `S3DataStore`, `AzureDataStore` | **yes — S3 / Azure / FS** | **yes** for in-Oak code; cloud APIs trusted (§3) |
+| Search — `oak-lucene`, `oak-search`, `oak-search-elastic` | `IndexEditor`, query parsers | sometimes (Elasticsearch over HTTP) | **yes** |
+| `oak-run` — operator CLI / tooling | `oak-run.jar` | OS / FS / network depending on subcommand | **see §3** (in-model only for the command-driven contract; out-of-model for "operator runs it as the wrong user") |
+| `oak-pojosr`, `oak-standalone` — repository launchers | embedded repository | filesystem | **yes** for code; deployment is operator's |
+| `oak-upgrade` — JR2 → Oak migration | offline migration job | filesystem | **yes** for code; the migration source is a trusted JR2 repository |
+| `oak-it`, `oak-it-osgi`, `oak-bench-*`, `oak-jcr-tests`, `oak-test-bundle`, `oak-exercise` | integration tests, benchmarks, training | varies | **out of model** — unsupported components *(§3)* |
+| `oak-examples`, `oak-doc-railroad-macro`, `oak-doc` | examples and docs | none | **out of model** *(§3)* |
+| Archived MicroKernel modules (`oak-mk-*`) | n/a | n/a | **out of model** — explicitly archived *(documented: README — "MicroKernel-related modules have been archived")* |
+
+A finding is in-model only if it lands in a row marked **yes**. See §4
+for per-component reachability tests.
+
+## §3 Out of scope (explicit non-goals)
+
+Reports requiring any of these will be closed with the cited disposition:
+
+1. **Host application correctness.** Oak is embedded. If the host hands
+   out an admin session to an unauthenticated HTTP request, exposes
+   `loginAdministrative` over JMX, or routes user-supplied SQL2 queries
+   directly into the session without filtering, the harm is the host's
+   *(documented: `security/permission/default.md` — admin/system
+   principals bypass permission evaluation)*. → `OUT-OF-MODEL:
+   adversary-not-in-scope`.
+2. **NodeStore / BlobStore / IDP correctness.** Mongo, RDB, TarMK on
+   disk, S3, Azure, LDAP, SAML — Oak trusts the responses these systems
+   give. A backend returning forged bytes, an LDAP server asserting a
+   spoofed group membership, an S3 bucket allowing unauthorised reads —
+   none are Oak vulnerabilities *(inferred — §14 Q2)*. →
+   `OUT-OF-MODEL: trusted-input`.
+3. **Storage-level authorisation.** HDFS / S3 / filesystem ACLs on the
+   underlying NodeStore / BlobStore are the operator's responsibility. A
+   tar-store file readable by `other` is not an Oak bug *(inferred —
+   §14 Q3)*. → `OUT-OF-MODEL: adversary-not-in-scope`.
+4. **Pre-authentication misuse.** The `PreAuthenticatedLogin` mechanism
+   is an *explicit* bypass: Oak does no credential verification at all
+   and trusts that an upstream layer has *(documented:
+   `security/authentication/preauthentication.md`)*. A report that the
+   pre-auth code path "trusts the caller" is a documented design
+   choice. → `BY-DESIGN: property-disclaimed` (§9).
+5. **Custom `SecurityProvider` / `LoginModule` replacements.** Oak
+   ships a default but documents that custom implementations are "only
+   recommended for experts having in-depth understanding of Oak
+   internals and which understand the security risk associated with
+   custom replacements" *(documented: `security/introduction.md`)*. A
+   report that requires a custom SPI implementation that voids a
+   guarantee is the host's choice. → `OUT-OF-MODEL: non-default-build`.
+6. **`oak-run` invoked by the operator.** `oak-run` is an offline
+   administrative CLI; running it requires direct filesystem and
+   credential access. "Operator runs `oak-run console` and dumps the
+   repository" is not a vulnerability *(inferred — §14 Q4)*. →
+   `OUT-OF-MODEL: adversary-not-in-scope`.
+7. **Code that ships but is not part of the supported product:**
+   `oak-it/`, `oak-it-osgi/`, `oak-bench-*/`, `oak-jcr-tests/`,
+   `oak-test-bundle/`, `oak-exercise/`, `oak-examples/`,
+   `oak-doc-railroad-macro/`, archived `oak-mk-*` modules
+   *(documented: README, AGENTS.md)*. → `OUT-OF-MODEL:
+   unsupported-component`.
+8. **Original `jackrabbit` / `jackrabbit-core` code.** Oak migrated away
+   from the JR2 codebase; jackrabbit-core has a separate threat model.
+   The `oak-upgrade` module imports from jackrabbit-core as a one-shot
+   migration source; a bug in the JR2-side code is jackrabbit-core's
+   threat-model problem. → `OUT-OF-MODEL: unsupported-component` (with
+   cross-reference).
+9. **`jackrabbit-filevault` package import** — filevault has its own
+   threat model. A vulnerable filevault install hook is filevault's
+   problem; Oak's role is to honour the JCR session privileges that
+   filevault uses *(inferred — §14 Q5)*. → `OUT-OF-MODEL:
+   unsupported-component` (with cross-reference).
+10. **Build / release / SDLC hygiene.** Action pinning, signing,
+    reproducible builds, branch protection — out of model per the SKILL.
+11. **Side channels** (cache timing, branch prediction, co-tenant on the
+    same JVM). *(inferred — §14 Q6)* → `OUT-OF-MODEL:
+    adversary-not-in-scope`.
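
The closed-set dispositions in the list above can be read as a lookup table. The following is a minimal sketch only: the class name `OutOfScopeDispositions` and the method `forItem` are hypothetical and not part of Oak or of this draft. Item 10 names no disposition string in the text, so the sketch leaves it unmapped rather than guessing one.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical triage helper: maps a §3 item number to the disposition cited for it. */
final class OutOfScopeDispositions {

    // Item 10 (build / release / SDLC hygiene) cites no disposition string, so it has no entry.
    private static final Map<Integer, String> BY_ITEM = Map.ofEntries(
        Map.entry(1, "OUT-OF-MODEL: adversary-not-in-scope"),
        Map.entry(2, "OUT-OF-MODEL: trusted-input"),
        Map.entry(3, "OUT-OF-MODEL: adversary-not-in-scope"),
        Map.entry(4, "BY-DESIGN: property-disclaimed"),
        Map.entry(5, "OUT-OF-MODEL: non-default-build"),
        Map.entry(6, "OUT-OF-MODEL: adversary-not-in-scope"),
        Map.entry(7, "OUT-OF-MODEL: unsupported-component"),
        Map.entry(8, "OUT-OF-MODEL: unsupported-component"),
        Map.entry(9, "OUT-OF-MODEL: unsupported-component"),
        Map.entry(11, "OUT-OF-MODEL: adversary-not-in-scope")
    );

    private OutOfScopeDispositions() {}

    /** Returns the cited disposition, or empty when the item is unknown or cites none. */
    static Optional<String> forItem(int item) {
        return Optional.ofNullable(BY_ITEM.get(item));
    }
}
```

A triager would call `OutOfScopeDispositions.forItem(4)` for a pre-authentication report and cite the returned string when closing it.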
+
+## §4 Trust boundaries and data flow
+
+Oak's trust boundary is **the JCR `Session` / Oak `ContentSession` API
+surface**. Once a `Session` has been obtained, every read/write goes
+through the configured `PermissionProvider` chain — *unless* the
+principal is `SystemPrincipal`, `AdminPrincipal`, or matches a
+configured administrative name, in which case permission evaluation is
+bypassed *(documented: `security/permission/default.md` — "Three
+principal categories automatically receive full repository access ...
+Administrator sessions bypass permission evaluation entirely")*.
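
The bypass rule in the paragraph above can be modelled in a few lines. This is an illustrative sketch, not Oak's implementation: `PermissionBypassSketch`, its `Kind` enum, and both methods are hypothetical names, and the ACL outcome is reduced to a single boolean. Hidden-item handling is deliberately not modelled.

```java
import java.util.Set;

/** Hypothetical model of the "three principal categories" bypass described above. */
final class PermissionBypassSketch {

    /** Simplified principal categories; real Oak principals are richer objects. */
    enum Kind { SYSTEM, ADMIN, REGULAR }

    private PermissionBypassSketch() {}

    /** True when permission evaluation would be skipped for this principal. */
    static boolean bypassesEvaluation(Kind kind, String principalName,
                                      Set<String> administrativeNames) {
        return kind == Kind.SYSTEM
            || kind == Kind.ADMIN
            || administrativeNames.contains(principalName);
    }

    /** Access is allowed on bypass, otherwise only if the (assumed) ACL check grants it. */
    static boolean mayAccess(Kind kind, String principalName,
                             Set<String> administrativeNames, boolean aclGrants) {
        return bypassesEvaluation(kind, principalName, administrativeNames) || aclGrants;
    }
}
```

The point of the sketch is that a finding only exercises the `aclGrants` branch, and is therefore only in-model, when the session principal falls outside all three bypass categories.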
+
+There are six trust transitions a finding must land in to be in-model:
+
+| # | Transition | Who authenticates | Who authorises |
+| --- | --- | --- | --- |
+| B1 | Host → `Repository.login(Credentials)` / `ContentRepository.login(Credentials, ws)` | `LoginContext` chain (`LoginModule` impls in the configured JAAS appname; default is `LoginModuleImpl` + optional `TokenLoginModule` + optional `ExternalLoginModule`) *(documented: `security/authentication.md`)* | n/a at this transition |
+| B2 | Host → `loginAdministrative` / `loginService` | trusted: the host *is* the system *(documented: `security/permission/default.md`)* | n/a — these sessions bypass authorisation |
+| B3 | JCR session principal → tree read | `Subject` already established at B1 | `PermissionProvider` (default + optional CUG + optional principalbased, composed) *(documented: `security/authorization.md`, `security/permission.md`)* |
+| B4 | JCR session principal → tree write | same as B3 | `PermissionValidator` commit hook *(documented: `security/permission/evaluation.md`)* |
+| B5 | `ExternalLoginModule` → external IDP (LDAP, SAML, …) | the IDP authenticates the user *(documented: `security/authentication/externalloginmodule.md`)* | per-IDP; Oak honours whatever group membership the IDP asserts on sync |
+| B6 | Oak → NodeStore / BlobStore (Mongo, Tar, RDB, S3, Azure) | backend's own auth (operator-configured) | backend's own ACLs |
+
+### Reachability preconditions per family
+
+A finding is in-model only if it meets the family's reachability test:
+
+- **`oak-api`, `oak-core`, `oak-jcr`**: reachable from a session
+  established at B1 with the principal carrying *less than*
+  `SystemPrincipal` / `AdminPrincipal`. A finding that requires an
+  already-admin session collapses to "the host gave away an admin
+  session", which is §3 item 1.
+- **`oak-security-spi`, default `AuthorizationConfiguration`,
+  `PermissionProvider`**: in-model for any reachable failure mode that
+  results in an effective grant the configured ACLs do not license, or
+  an effective deny that they do. Hidden-item handling is in-model:
+  "system principals gain access except for hidden items that are not
+  exposed on the Oak API" *(documented:
+  `security/permission/default.md`)*.
+- **`oak-authorization-cug`**: in-model only for *read* access
+  restriction; CUG "solely evaluates and enforces read access to
+  regular nodes and properties" *(documented:
+  `security/authorization/cug.md`)*. CUG is disabled by default.
+- **`oak-authorization-principalbased`**: in-model when configured;
+  composes via `CompositeAuthorizationConfiguration` *(documented:
+  `security/authorization.md`)*.
+- **`oak-auth-external`, `oak-auth-ldap`**: in-model for the bridge
+  code that *uses* the IDP — credential handling, session sync,
+  group-membership materialisation into Oak's user-management tree.
+  Out-of-model for the IDP's own correctness (§3 item 2).
+- **NodeStore / BlobStore**: in-model for in-Oak code paths (commit
+  hook, MVCC, secondary indexes); out-of-model for the backend's
+  external behaviour (§3 item 2 / item 3).
+- **`oak-lucene` / `oak-search-elastic`**: in-model for query
+  evaluation against the *visible* tree under the caller's
+  permissions. Index *leakage* — e.g. a search result that surfaces a
+  node path that the caller has no read permission for — is in-model
+  (the search must respect the §4 (B3) permission scope). The
+  `QueryEngine` is assumed to filter by permission as part of
+  result delivery *(inferred — §14 Q7)*.
+- **`oak-run`**: in-model only for in-process logic; "operator runs
+  it with a bad keystore on the file system" is out (§3 item 6).
+
+## §5 Assumptions about the environment
+
+- **JVM / runtime.** Java 11+ at HEAD; the build requires Maven 3.x

Review Comment:
   -> https://github.com/apache/jackrabbit-oak/pull/2927
   


