Thomas Mueller created OAK-8523:
-----------------------------------
Summary: Best Practices - Property Value Length Limit
Key: OAK-8523
URL: https://issues.apache.org/jira/browse/OAK-8523
Project: Jackrabbit Oak
Issue Type: Improvement
Components: core, jcr
Reporter: Thomas Mueller
Right now, Oak supports very large property values (e.g. String). But property
values of 1 MB or larger are problematic in multiple areas, such as indexing.
Limiting them is especially important for software-as-a-service, where we need
to guarantee SLOs, but it also helps in other cases. So we should:
* (1) Document best practices, e.g. "Property values should be smaller than 100
KB".
* (2) Introduce a "softLimit" and a "hardLimit", where softLimit is e.g. 100 KB
and hardLimit is configurable, with an initial default of Integer.MAX_VALUE.
Setting the hard limit to a lower value by default is problematic, because it
can break existing applications. With an effectively unlimited default,
customers can first set lower limits, e.g. in tests, and, once they are happy,
in production as well.
* (3) Log a warning if a property is larger than "softLimit". To avoid logging
many warnings (if there are many such properties), we then raise the limit:
softLimit = softLimit * 1.1 (reset to 100 KB at the next repository start).
Logging is needed to know what _exactly_ is broken (path, stack trace of the
actual usage...).
* (4) Add a metric (monitoring) for detected large properties. Just logging
warnings might not be enough.
* (5) Throttling: we could add flow control (pauses; Thread.sleep) after
violations, to improve isolation (so that threads that don't violate the
contract are not affected).
* (6) We could expose the violation info in the session, so that a framework
can check that information after executing custom code and add more details
(e.g. log them).
* (7) If a property is larger than the configurable hardLimit, fail the commit
or reject setProperty (throw an exception).
* (8) At some point, in a new Oak version, change the default value for
hardLimit to some reasonable number, e.g. 1 MB.
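Items (2) to (5) and (7) could fit together roughly as follows. This is a
minimal, hypothetical sketch, not Oak code: the class PropertyLengthLimiter,
its methods, the throttleMillis parameter, and the use of java.util.logging
(instead of Oak's own logging) are assumptions for illustration. The factor
1.1 is computed as softLimit + softLimit / 10 to stay in integer arithmetic.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch (not an Oak API) of soft/hard property length limits. */
final class PropertyLengthLimiter {

    private static final Logger LOG =
            Logger.getLogger(PropertyLengthLimiter.class.getName());

    /** Initial soft limit (100 KB); assumed to be reset on each repository start. */
    static final long INITIAL_SOFT_LIMIT = 100 * 1024;

    private final long hardLimit;       // configurable, proposed default Integer.MAX_VALUE
    private final long throttleMillis;  // assumed pause after a violation (0 = off)
    private long softLimit = INITIAL_SOFT_LIMIT;  // raised by 10% after each warning
    // Counts every value above the initial soft limit, even when no warning is
    // logged, so it could back a monitoring metric.
    private final AtomicLong largePropertyCount = new AtomicLong();

    PropertyLengthLimiter(long hardLimit, long throttleMillis) {
        this.hardLimit = hardLimit;
        this.throttleMillis = throttleMillis;
    }

    /** Proposed initial defaults: no effective hard limit, no throttling. */
    PropertyLengthLimiter() {
        this(Integer.MAX_VALUE, 0);
    }

    /** Checks one property value; rejects it if it exceeds the hard limit. */
    void check(String path, long length) {
        if (length > hardLimit) {
            throw new IllegalArgumentException("Property " + path + " has length "
                    + length + ", which exceeds the hard limit of " + hardLimit);
        }
        boolean large = length > INITIAL_SOFT_LIMIT;
        if (large) {
            largePropertyCount.incrementAndGet();
        }
        synchronized (this) {
            if (length > softLimit) {
                // Log the path and the caller's stack trace, then raise the
                // soft limit so that many large values do not flood the log.
                LOG.log(Level.WARNING, "Property " + path + " has length " + length
                        + ", which exceeds the soft limit of " + softLimit,
                        new Exception("call stack"));
                softLimit += softLimit / 10;
            }
        }
        // Throttling: only the violating thread pauses, and outside the lock.
        if (large && throttleMillis > 0) {
            try {
                Thread.sleep(throttleMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    synchronized long getSoftLimit() {
        return softLimit;
    }

    long getLargePropertyCount() {
        return largePropertyCount.get();
    }
}
```

In this sketch the metric deliberately counts against the initial 100 KB limit
rather than the raised one, so that monitoring keeps seeing large properties
after the warnings have been damped.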
The "property length" is just one case. There are multiple other candidates:
* Number of properties for a node
* Number of elements for multi-valued properties
* Total size of a node (including inlined properties)
* Number of direct child nodes for orderable child nodes
* Number of direct child nodes for non-orderable child nodes
* Size of transaction
* Adding observation listeners that listen for all changes (global listeners)
For those cases, new Jira issues should be created.
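If the same soft/hard scheme were reused for the numeric candidates above, one
option is a single table of limits keyed by kind. The following is a
hypothetical sketch only: LimitKind, Outcome, Limits, and their methods are
invented names, and global observation listeners are left out because they are
not a numeric size. Callers would decide what a soft or hard violation means
(warn, count, throttle, or throw).

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical kinds of limits, mirroring the candidates listed above. */
enum LimitKind {
    PROPERTY_LENGTH,
    PROPERTIES_PER_NODE,
    VALUES_PER_MULTI_VALUED_PROPERTY,
    NODE_SIZE,
    ORDERABLE_CHILD_NODES,
    NON_ORDERABLE_CHILD_NODES,
    TRANSACTION_SIZE
}

enum Outcome { OK, SOFT_VIOLATION, HARD_VIOLATION }

/** Hypothetical registry of soft/hard limit pairs, one per kind. */
final class Limits {

    private static final class Pair {
        final long soft;
        final long hard;

        Pair(long soft, long hard) {
            this.soft = soft;
            this.hard = hard;
        }
    }

    private final Map<LimitKind, Pair> limits = new EnumMap<>(LimitKind.class);

    void set(LimitKind kind, long softLimit, long hardLimit) {
        if (softLimit > hardLimit) {
            throw new IllegalArgumentException("softLimit must not exceed hardLimit");
        }
        limits.put(kind, new Pair(softLimit, hardLimit));
    }

    /** Classifies a measured value; kinds without a configured limit are OK. */
    Outcome check(LimitKind kind, long value) {
        Pair pair = limits.get(kind);
        if (pair == null) {
            return Outcome.OK;
        }
        if (value > pair.hard) {
            return Outcome.HARD_VIOLATION;
        }
        return value > pair.soft ? Outcome.SOFT_VIOLATION : Outcome.OK;
    }
}
```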
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)