[
https://issues.apache.org/jira/browse/OAK-6571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16136947#comment-16136947
]
Tomek Rękawek commented on OAK-6571:
------------------------------------
I've updated the patch to log all the MongoDB requests in the following form:
{noformat}
22.08.2017 17:28:12.843 *INFO* [DocumentNodeStore background read thread (1)]
org.apache.jackrabbit.oak.plugins.document.cache.prefetch.LoggingPrefetchAlgorithm
Session summary:
Session: 0:0:0:0:0:0:0:1 [1503415689895] GET /content/picture.jpg HTTP/1.1
Request[1:/apps]
Request[2:/oak:index/verbs]
Request[2:/oak:index/versionStoreIndex]
Request[2:/oak:index/customDataLucene]
Request[2:/content/oak:index]
Request[3:/content/oak:index/]...[3:/content/oak:index0]
Request[0:/]
{noformat}
This can be used to gather the traffic from a real-world instance and analyse
it using offline tools.
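
As a starting point for such offline analysis, the session summary above could be parsed with a small helper like the one below. This is only a sketch: the class and method names are hypothetical and not part of the patch, the line layout is inferred solely from the sample log above, and treating the bracketed value as a document ID (with the `...` form as a range between two IDs) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the OAK-6571 patch.
final class SessionLogParser {

    // Parsed form of one "Session summary" block.
    static final class Session {
        String client;        // e.g. "0:0:0:0:0:0:0:1"
        long timestamp;       // e.g. 1503415689895
        String requestLine;   // e.g. "GET /content/picture.jpg HTTP/1.1"
        final List<String> requests = new ArrayList<>();
    }

    // Assumed layout: "Session: <client> [<timestamp>] <HTTP request line>"
    private static final Pattern SESSION =
            Pattern.compile("^Session: (\\S+) \\[(\\d+)\\] (.+)$");

    // Matches "Request[1:/apps]" and the range form "Request[a]...[b]".
    private static final Pattern REQUEST =
            Pattern.compile("^Request\\[([^\\]]+)\\](?:\\.\\.\\.\\[([^\\]]+)\\])?$");

    static Session parse(List<String> lines) {
        Session session = new Session();
        for (String raw : lines) {
            String line = raw.trim();
            Matcher s = SESSION.matcher(line);
            if (s.matches()) {
                session.client = s.group(1);
                session.timestamp = Long.parseLong(s.group(2));
                session.requestLine = s.group(3);
                continue;
            }
            Matcher r = REQUEST.matcher(line);
            if (r.matches()) {
                // A range is kept as "from .. to"; a single ID is kept as-is.
                session.requests.add(r.group(2) == null
                        ? r.group(1)
                        : r.group(1) + " .. " + r.group(2));
            }
            // Anything else (log header, "Session summary:") is ignored.
        }
        return session;
    }
}
```

Calling `SessionLogParser.parse(lines)` on the lines of one summary block yields the HTTP request line plus the ordered list of requested IDs, which is the shape a sequence-analysis tool would need as input.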
> Prefetching the DocumentStore cache using machine learning
> ----------------------------------------------------------
>
> Key: OAK-6571
> URL: https://issues.apache.org/jira/browse/OAK-6571
> Project: Jackrabbit Oak
> Issue Type: Story
> Components: cache, documentmk
> Reporter: Tomek Rękawek
> Fix For: 1.8
>
> Attachments: OAK-6571.patch
>
>
> The idea is that we can analyse the series of requests made by the
> DocumentStore, e.g.:
> /content/site/jcr:content
> /content/site/jcr:content/left-column
> /content/site/jcr:content/left-column/item1
> /content/site/jcr:content/left-column/item2
> to predict the future requests and prefetch them. This way we can limit the
> number of required requests, the connection latency, etc.
> In order to group the requests together, we can use the thread name as a
> common property. For instance, if Oak is used with Sling, then a single HTTP
> request usually is served by a single thread and its name contains the HTTP
> request line.
> Implementing this story will require intercepting the MongoDB/RDB requests
> made by the DocumentStore and preparing an algorithm analysing and predicting
> the future calls. The attached patch [^OAK-6571.patch] contains:
> * a proposed interface which may be used to join these two parts,
> * a very early integration with the DocumentMK,
> * a naive implementation of the algorithm, which simply logs the request
> sequences.
> We can start with a simple algorithm that tries to exactly match the current
> requests against the already recorded sequences and, if that's not enough, look
> for a more sophisticated mechanism.
> Resources:
> * [Intelligent web caching using machine learning
> methods|http://www.nnw.cz/doi/2011/NNW.2011.21.025.pdf]
> * [Hidden Markov Model|https://en.wikipedia.org/wiki/Hidden_Markov_model]
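
The exact-match idea from the description above could be sketched roughly as follows. This is an illustration under stated assumptions, not the interface or algorithm from the attached patch: the class name `ExactMatchPredictor` and its methods are hypothetical, sequences are plain lists of paths, and the first recorded sequence that starts with the current requests wins.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, not the interface proposed in OAK-6571.patch.
final class ExactMatchPredictor {

    // Request sequences observed in earlier sessions.
    private final List<List<String>> recorded = new ArrayList<>();

    // Remember a finished session's request sequence.
    void record(List<String> sequence) {
        recorded.add(new ArrayList<>(sequence));
    }

    // Returns the requests expected to follow `current`, or an empty list
    // if no recorded sequence starts with exactly these requests.
    List<String> predict(List<String> current) {
        if (current.isEmpty()) {
            return Collections.emptyList();
        }
        for (List<String> sequence : recorded) {
            if (sequence.size() > current.size()
                    && sequence.subList(0, current.size()).equals(current)) {
                // The remainder of the matching sequence is the prefetch candidate set.
                return new ArrayList<>(sequence.subList(current.size(), sequence.size()));
            }
        }
        return Collections.emptyList();
    }
}
```

With the four-path sequence from the description recorded, calling `predict` with the first two paths would return the two `left-column/item` paths as prefetch candidates. A linear scan is enough for a sketch; a real implementation would presumably need an index such as a trie, plus some policy for conflicting sequences.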