[jira] [Updated] (OAK-6571) Prefetching the DocumentStore cache using machine learning

JIRA Tue, 22 Aug 2017 04:46:28 -0700

     [ 
https://issues.apache.org/jira/browse/OAK-6571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Tomek Rękawek updated OAK-6571:
-------------------------------
    Description: 
The idea is that we can analyse the series of requests made by the 
DocumentStore, e.g.:

/content/site/jcr:content
/content/site/jcr:content/left-column
/content/site/jcr:content/left-column/item1
/content/site/jcr:content/left-column/item2

to predict the future requests and prefetch them. This way we can limit the 
number of required requests, the connection latency, etc.

In order to group the requests together, we can use the thread name as a common 
property. For instance, if Oak is used with Sling, then a single HTTP request 
is usually served by a single thread, and its name contains the HTTP request 
line.
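
A minimal sketch of this grouping, assuming nothing about the attached patch: 
the class RequestRecorder and its methods are hypothetical names, and the 
thread name is simply used as the map key.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: collects the requested paths per thread name. */
final class RequestRecorder {
    private final Map<String, List<String>> sequences = new HashMap<>();

    /** Appends the path to the sequence of the calling thread. */
    public synchronized void record(String path) {
        record(Thread.currentThread().getName(), path);
    }

    /** Appends the path to the sequence identified by the given thread name. */
    public synchronized void record(String threadName, String path) {
        sequences.computeIfAbsent(threadName, k -> new ArrayList<String>()).add(path);
    }

    /** Returns a copy of the paths recorded so far for the given thread name. */
    public synchronized List<String> sequenceFor(String threadName) {
        List<String> sequence = sequences.get(threadName);
        return sequence == null
                ? Collections.<String>emptyList()
                : new ArrayList<String>(sequence);
    }
}
```

With Sling, the key would then be something like the HTTP request line, so 
all paths read while serving one request end up in one sequence. Thread names 
are reused by thread pools, so a real implementation would also need to decide 
when a sequence ends.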

Implementing this story will require intercepting the MongoDB/RDB requests made 
by the DocumentStore and preparing an algorithm analysing and predicting the 
future calls. The attached patch contains a proposed interface which may be 
used to join these two parts.
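
To illustrate how the two parts could be joined, here is a rough sketch. It is 
not the interface from the attached patch: PrefetchPredictor, 
PrefetchingReader and all method names are hypothetical, and the backend is 
reduced to a plain function standing in for a MongoDB/RDB read.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical contract between the interceptor and the prediction algorithm. */
interface PrefetchPredictor {
    /** Called by the interceptor for every path requested from the backend. */
    void requested(String threadName, String path);

    /** Paths the algorithm expects the given thread to request next. */
    List<String> predict(String threadName);
}

/** Hypothetical interceptor wrapping a backend lookup. */
final class PrefetchingReader {
    private final Function<String, String> backend;
    private final PrefetchPredictor predictor;
    /** Paths selected for prefetching; kept in a list only for illustration. */
    final List<String> prefetched = new ArrayList<>();

    PrefetchingReader(Function<String, String> backend, PrefetchPredictor predictor) {
        this.backend = backend;
        this.predictor = predictor;
    }

    /** Reports the request to the predictor, notes its predictions, then reads. */
    String read(String path) {
        String threadName = Thread.currentThread().getName();
        predictor.requested(threadName, path);
        // A real implementation would load these into the cache, ideally in one batch.
        prefetched.addAll(predictor.predict(threadName));
        return backend.apply(path);
    }
}
```

The point of the split is that the interceptor knows nothing about how 
predictions are made, so the algorithm can be replaced without touching the 
MongoDB/RDB code.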

We can start with a simple algorithm that tries to exactly match the current 
requests against the already recorded sequences and, if that is not enough, 
look for a more sophisticated mechanism.
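
One possible reading of the simple exact-match approach, as a sketch only: 
ExactMatchPredictor is a hypothetical name, and "match" is assumed here to 
mean that the current requests are an exact prefix of a recorded sequence.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical predictor: exact prefix match against recorded sequences. */
final class ExactMatchPredictor {
    private final List<List<String>> recorded = new ArrayList<>();

    /** Stores a completed request sequence, e.g. all paths read by one HTTP request. */
    public void learn(List<String> sequence) {
        recorded.add(new ArrayList<String>(sequence));
    }

    /**
     * Returns the paths that followed the given requests in the first recorded
     * sequence starting with exactly those requests, or an empty list if no
     * recorded sequence matches.
     */
    public List<String> predict(List<String> current) {
        if (current.isEmpty()) {
            return Collections.emptyList();
        }
        for (List<String> sequence : recorded) {
            if (sequence.size() > current.size()
                    && sequence.subList(0, current.size()).equals(current)) {
                return new ArrayList<String>(
                        sequence.subList(current.size(), sequence.size()));
            }
        }
        return Collections.emptyList();
    }
}
```

For the example above, after learning the four /content/site paths, a thread 
that has requested the first two would get item1 and item2 as prefetch 
candidates. The linear scan is only for readability; a trie keyed by path 
would avoid comparing every recorded sequence.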

Resources:
* [Intelligent web caching using machine learning 
methods|http://www.nnw.cz/doi/2011/NNW.2011.21.025.pdf]
* [Hidden Markov Model|https://en.wikipedia.org/wiki/Hidden_Markov_model]

  was:
The idea is that we can analyse the series of requests made by the 
DocumentStore, e.g.:

/content/site/jcr:content
/content/site/jcr:content/left-column
/content/site/jcr:content/left-column/item1
/content/site/jcr:content/left-column/item2

to predict the future requests and prefetch them. This way we can limit the 
number of required requests, the connection latency, etc.

In order to group the requests together, we can use the thread name as a common 
property. For instance, if Oak is used with Sling, then a single HTTP request 
is usually served by a single thread, and its name contains the HTTP request 
line.

Implementing this story will require intercepting the MongoDB/RDB requests made 
by the DocumentStore and preparing an algorithm analysing and predicting the 
future calls. The attached patch contains a proposed interface which may be 
used to join these two parts.

We can start with a simple algorithm that tries to exactly match the current 
requests against the already recorded sequences and, if that is not enough, 
look for a more sophisticated mechanism.

Resources:
* [Intelligent web caching using machine learning 
methods|http://www.nnw.cz/doi/2011/NNW.2011.21.025.pdf|
* [Hidden Markov Model|https://en.wikipedia.org/wiki/Hidden_Markov_model]


> Prefetching the DocumentStore cache using machine learning
> ----------------------------------------------------------
>
>                 Key: OAK-6571
>                 URL: https://issues.apache.org/jira/browse/OAK-6571
>             Project: Jackrabbit Oak
>          Issue Type: Story
>          Components: cache, documentmk
>            Reporter: Tomek Rękawek
>             Fix For: 1.8
>
>
> The idea is that we can analyse the series of requests made by the 
> DocumentStore, e.g.:
> /content/site/jcr:content
> /content/site/jcr:content/left-column
> /content/site/jcr:content/left-column/item1
> /content/site/jcr:content/left-column/item2
> to predict the future requests and prefetch them. This way we can limit the 
> number of required requests, the connection latency, etc.
> In order to group the requests together, we can use the thread name as a 
> common property. For instance, if Oak is used with Sling, then a single HTTP 
> request is usually served by a single thread, and its name contains the HTTP 
> request line.
> Implementing this story will require intercepting the MongoDB/RDB requests 
> made by the DocumentStore and preparing an algorithm analysing and predicting 
> the future calls. The attached patch contains a proposed interface which 
> may be used to join these two parts.
> We can start with a simple algorithm that tries to exactly match the current 
> requests against the already recorded sequences and, if that is not enough, 
> look for a more sophisticated mechanism.
> Resources:
> * [Intelligent web caching using machine learning 
> methods|http://www.nnw.cz/doi/2011/NNW.2011.21.025.pdf]
> * [Hidden Markov Model|https://en.wikipedia.org/wiki/Hidden_Markov_model]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
