[jira] [Commented] (OAK-6571) Prefetching the DocumentStore cache using machine learning

JIRA Tue, 22 Aug 2017 08:36:18 -0700

    [ 
https://issues.apache.org/jira/browse/OAK-6571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16136947#comment-16136947
 ]


Tomek Rękawek commented on OAK-6571:
------------------------------------

I've updated the patch to log all the MongoDB requests in the following form:

{noformat}
22.08.2017 17:28:12.843 *INFO* [DocumentNodeStore background read thread (1)] 
org.apache.jackrabbit.oak.plugins.document.cache.prefetch.LoggingPrefetchAlgorithm
 Session summary:
Session: 0:0:0:0:0:0:0:1 [1503415689895] GET /content/picture.jpg HTTP/1.1
Request[1:/apps]
Request[2:/oak:index/verbs]
Request[2:/oak:index/versionStoreIndex]
Request[2:/oak:index/customDataLucene]
Request[2:/content/oak:index]
Request[3:/content/oak:index/]...[3:/content/oak:index0]
Request[0:/]
{noformat}

This can be used to gather the traffic from a real-world instance and analyse 
it using offline tools.
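
As an illustration of such offline analysis, here is a minimal Python sketch that parses the session summaries into structured records. The line format is inferred only from the single sample above, so the regular expressions are assumptions: the bracketed number after the client address appears to be a timestamp in epoch milliseconds, and `Request[a]...[b]` is treated as a range query between two document ids. The names `Session` and `parse_sessions` are hypothetical and not part of the patch.

```python
import re
from dataclasses import dataclass, field

# Assumed format: "Session: <client> [<timestamp>] <HTTP request line>"
SESSION_RE = re.compile(r"^Session: (\S+) \[(\d+)\] (.*)$")
# Assumed format: "Request[<id>]" for a single read,
# "Request[<from>]...[<to>]" for a range query.
REQUEST_RE = re.compile(r"^Request\[(.*?)\](?:\.\.\.\[(.*)\])?$")


@dataclass
class Session:
    client: str
    timestamp: int
    request_line: str
    # Each entry is (from_id, to_id); to_id is None for a single read.
    requests: list = field(default_factory=list)


def parse_sessions(lines):
    """Group Request[...] lines under the preceding Session: line."""
    sessions = []
    current = None
    for raw in lines:
        line = raw.strip()
        m = SESSION_RE.match(line)
        if m:
            current = Session(m.group(1), int(m.group(2)), m.group(3))
            sessions.append(current)
            continue
        m = REQUEST_RE.match(line)
        if m and current is not None:
            current.requests.append((m.group(1), m.group(2)))
        # Anything else (log prefix, "Session summary:") is ignored.
    return sessions


SAMPLE = """\
Session: 0:0:0:0:0:0:0:1 [1503415689895] GET /content/picture.jpg HTTP/1.1
Request[1:/apps]
Request[2:/oak:index/verbs]
Request[2:/oak:index/versionStoreIndex]
Request[2:/oak:index/customDataLucene]
Request[2:/content/oak:index]
Request[3:/content/oak:index/]...[3:/content/oak:index0]
Request[0:/]
"""

sessions = parse_sessions(SAMPLE.splitlines())
```

The resulting `sessions` list keeps the requests in the order they were logged, which is what a sequence-based predictor would need as input.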

> Prefetching the DocumentStore cache using machine learning
> ----------------------------------------------------------
>
>                 Key: OAK-6571
>                 URL: https://issues.apache.org/jira/browse/OAK-6571
>             Project: Jackrabbit Oak
>          Issue Type: Story
>          Components: cache, documentmk
>            Reporter: Tomek Rękawek
>             Fix For: 1.8
>
>         Attachments: OAK-6571.patch
>
>
> The idea is that we can analyse the series of requests made by the 
> DocumentStore, e.g.:
> /content/site/jcr:content
> /content/site/jcr:content/left-column
> /content/site/jcr:content/left-column/item1
> /content/site/jcr:content/left-column/item2
> to predict the future requests and prefetch them. This way we can limit the 
> number of required requests, the connection latency, etc.
> In order to group the requests together, we can use the thread name as a 
> common property. For instance, if Oak is used with Sling, then a single HTTP 
> request is usually served by a single thread and its name contains the HTTP 
> request line.
> Implementing this story will require intercepting the MongoDB/RDB requests 
> made by the DocumentStore and preparing an algorithm analysing and predicting 
> the future calls. The attached patch [^OAK-6571.patch] contains:
> * a proposed interface which may be used to join these two parts,
> * a very early integration with the DocumentMK,
> * a naive implementation of the algorithm, which simply logs the request 
> sequences.
> We can start with a simple algorithm that tries to exactly match the current 
> requests against the already recorded sequences and, if that's not enough, 
> look for a more sophisticated mechanism.
> Resources:
> * [Intelligent web caching using machine learning 
> methods|http://www.nnw.cz/doi/2011/NNW.2011.21.025.pdf]
> * [Hidden Markov Model|https://en.wikipedia.org/wiki/Hidden_Markov_model]
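
The "exact match" starting point mentioned in the description could be prototyped offline roughly as follows. This is only a sketch under stated assumptions, not the interface or algorithm from the attached patch: the class `ExactMatchPredictor` and its methods are hypothetical, sequences are plain lists of path strings, and a prediction is made only when the requests seen so far are exactly the beginning of a previously recorded sequence.

```python
from collections import Counter, defaultdict


class ExactMatchPredictor:
    """Hypothetical prototype: predict the next request by exact prefix match."""

    def __init__(self):
        # Maps a prefix of requests (as a tuple) to a count of what followed it.
        self._next = defaultdict(Counter)

    def learn(self, sequence):
        """Record every (prefix -> next request) pair of one observed sequence."""
        for i in range(len(sequence)):
            self._next[tuple(sequence[:i])][sequence[i]] += 1

    def predict(self, prefix):
        """Return the most frequent follower of this exact prefix, or None."""
        candidates = self._next.get(tuple(prefix))
        if not candidates:
            return None
        return candidates.most_common(1)[0][0]


# The request series used as the example in the issue description.
observed = [
    "/content/site/jcr:content",
    "/content/site/jcr:content/left-column",
    "/content/site/jcr:content/left-column/item1",
    "/content/site/jcr:content/left-column/item2",
]

predictor = ExactMatchPredictor()
predictor.learn(observed)

next_after_two = predictor.predict(observed[:2])
next_after_unknown = predictor.predict(["/content/other"])
```

Storing every prefix as a separate key is quadratic in the sequence length; a trie keyed by request would be more compact, but the dictionary keeps the sketch short. A sequence that deviates from the recorded ones by even one request yields no prediction, which is the limitation that would motivate the more sophisticated mechanisms.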



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
