[ https://issues.apache.org/jira/browse/OAK-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17150039#comment-17150039 ]

Thomas Mueller edited comment on OAK-8523 at 7/2/20, 7:58 AM:
--------------------------------------------------------------

> webpage-as-primary-key-cache

Sorry, I don't know what that means.

> we didn't go down this road because lookup in the 
> webpage-as-primary-key-cache would need a (full-text?) query

There are two ways:
 * If you use a JCR "reference", e.g. via [Node.setProperty(String, Node)|https://docs.adobe.com/docs/en/spec/jsr170/javadocs/jcr-2.0/javax/jcr/Node.html#setProperty(java.lang.String,%20javax.jcr.Node)],
 then you can get the list of references using
 [Node.getReferences|https://docs.adobe.com/docs/en/spec/jsr170/javadocs/jcr-2.0/javax/jcr/Node.html#getReferences()].
 This type of reference is an out-of-the-box feature of JCR; internally it is
 backed by a query, by the way. (See the sketch after this list.)
 * The second option is to run a query yourself (not a fulltext query, just a
 regular property query).
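
To make the first option concrete, here is a minimal sketch using only the standard javax.jcr API; the paths and the property name "pageRef" are invented for the example:

{code:java}
import javax.jcr.Node;
import javax.jcr.Property;
import javax.jcr.PropertyIterator;
import javax.jcr.RepositoryException;
import javax.jcr.Session;

public class ReferenceLookupExample {

    // Link a resource to a page via a REFERENCE property, then list all
    // properties that reference the page.
    static void linkAndLookup(Session session) throws RepositoryException {
        Node page = session.getNode("/content/page");
        Node resource = session.getNode("/content/resource");

        // the target of a REFERENCE must be referenceable (i.e. have a UUID)
        page.addMixin("mix:referenceable");

        // Node.setProperty(String, Node) stores a REFERENCE to the page
        resource.setProperty("pageRef", page);
        session.save();

        // Node.getReferences() returns all REFERENCE properties that point
        // at this node (internally backed by a query)
        for (PropertyIterator it = page.getReferences(); it.hasNext();) {
            Property ref = it.nextProperty();
            System.out.println("referenced from " + ref.getParent().getPath());
        }
    }
}
{code}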

> OTOH, with resource-as-primary-key-cache, we just traverse right to the 
> resource we need to find the reference for and read the prop - no search 
> required.

On the other hand, that approach has a number of downsides, such as the risk of 
ending up with too many references.

> designed when IDs were considered evil

Well, you can also store paths, in which case you can't use the out-of-the-box 
feature and have to run the query yourself.
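
A minimal sketch of such a lookup, assuming the target path is stored in an invented "pagePath" STRING property (in Oak this query should be backed by a property index to avoid a traversal):

{code:java}
import javax.jcr.Node;
import javax.jcr.NodeIterator;
import javax.jcr.RepositoryException;
import javax.jcr.Session;
import javax.jcr.query.Query;
import javax.jcr.query.QueryManager;

public class PathLookupExample {

    // Find all nodes whose "pagePath" property equals the given path,
    // using a regular (non-fulltext) JCR-SQL2 property query.
    static void findReferrers(Session session, String targetPath) throws RepositoryException {
        QueryManager qm = session.getWorkspace().getQueryManager();
        Query q = qm.createQuery(
                "SELECT * FROM [nt:base] WHERE [pagePath] = $path", Query.JCR_SQL2);
        q.bindValue("path", session.getValueFactory().createValue(targetPath));
        NodeIterator nodes = q.execute().getNodes();
        while (nodes.hasNext()) {
            Node referrer = nodes.nextNode();
            System.out.println("referenced from " + referrer.getPath());
        }
    }
}
{code}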

So, in summary, I think the warnings for large string properties and for many 
entries in a multi-valued property are justified. It is easy to make mistakes, 
in which case performance will be bad and out-of-memory errors can occur. In my 
view, the added complexity of running a query, or of using UUIDs, is worth the 
trouble to avoid those problems.



> Best Practices - Property Value Length Limit
> --------------------------------------------
>
>                 Key: OAK-8523
>                 URL: https://issues.apache.org/jira/browse/OAK-8523
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: core, jcr
>            Reporter: Thomas Mueller
>            Priority: Major
>
> Right now, Oak supports very large properties (e.g. String). But 1 MB (or 
> larger) properties are problematic in multiple areas like indexing. It is 
> more important for software-as-a-service, where we need to guarantee SLOs, 
> but it also helps other cases. So we should:
> * (1) Document best practices, e.g. "Property values should be smaller than 
> 100 KB".
> * (2) Introduce "softLimit" and "hardLimit", where softLimit is e.g. 100 KB 
> and hardLimit is configurable, and (initially) by default Integer.MAX_VALUE. 
> Setting the hard limits to a lower value by default is problematic, because 
> it can break existing applications. With default value infinity, customers 
> can set lower limits e.g. in tests first, and once they are happy, in 
> production as well.
> * (3) Log a warning if a property is larger than "softLimit". To avoid 
> logging many warnings (if there are many such properties), we then set 
> softLimit = softLimit * 1.1 (reset to 100 KB at the next repository start). 
> Logging is needed to know what _exactly_ is broken (path, stack trace of the 
> actual usage...)
> * (4) Add a metric (monitoring) for detected large properties. Just logging 
> warnings might not be enough.
> * (5) Throttling: we could add flow control (pauses; Thread.sleep) after 
> violations, to improve isolation (to prevent affecting other threads that 
> don't violate the contract).
> * (6) We could expose the violation info in the session, so a framework could 
> check that data after executing custom code, and add more info (e.g. log).
> * (7) If larger than the configurable hardLimit, fail the commit or reject 
> setProperty (throw an exception).
> * (8) At some point, in a new Oak version, change the default value for 
> hardLimit to some reasonable number, e.g. 1 MB.
> The "property length" is just one case. There are multiple candidates:
>         
> * Number of properties for a node
> * Number of elements for multi-valued properties
> * Total size of a node (including inlined properties)
> * Number of direct child nodes for orderable child nodes
> * Number of direct child nodes for non-orderable child nodes
> * Size of transaction
> * Adding observation listeners that listen for all changes (global listeners)
> For those cases, new Jira issues should be created.
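
For illustration only, here is a rough, hypothetical sketch of how the softLimit/hardLimit check from points (2), (3), (4) and (7) above could fit together. None of these class, method, or field names exist in Oak; they are invented for this sketch.

{code:java}
// Hypothetical sketch only, not an existing Oak API.
public class PropertyLengthCheck {

    // (2): soft limit ~100 KB; hard limit configurable, initially Integer.MAX_VALUE
    private long softLimit = 100 * 1024;
    private final long hardLimit;

    public PropertyLengthCheck(long configuredHardLimit) {
        this.hardLimit = configuredHardLimit;
    }

    void check(String path, long lengthInBytes) {
        if (lengthInBytes > hardLimit) {
            // (7): fail the commit / reject setProperty
            throw new IllegalArgumentException("Property at " + path
                    + " exceeds the hard limit (" + lengthInBytes + " bytes)");
        }
        if (lengthInBytes > softLimit) {
            // (3): log a warning including the path, then raise the soft limit by
            // 10% so the log is not flooded; it would be reset at the next
            // repository start
            System.err.println("WARN: large property at " + path + ": "
                    + lengthInBytes + " bytes (soft limit " + softLimit + ")");
            softLimit = (long) (softLimit * 1.1);
            // (4): a metric for detected large properties could be incremented here
        }
    }
}
{code}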


