[ 
https://issues.apache.org/jira/browse/OAK-6571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16136900#comment-16136900
 ] 

Tommaso Teofili commented on OAK-6571:
--------------------------------------

With enough data, I think the RNN should implicitly learn the right weights 
and probabilities for such patterns; on the other hand, we could hint at them 
if we think they are generally meaningful.
In general, though, that might not be the case: imagine a media site with the 
French content under /content/fr/ and the English content under /content/en, 
where users may have different interests, and therefore different access 
patterns, depending on various factors (e.g. local politics, local trending 
topics).
Perhaps we can start without such hand-crafted patterns and decide afterwards 
whether it makes sense to add them, depending on the accuracy of the 
predictions (e.g. if we can't improve it beyond a certain threshold).

> Prefetching the DocumentStore cache using machine learning
> ----------------------------------------------------------
>
>                 Key: OAK-6571
>                 URL: https://issues.apache.org/jira/browse/OAK-6571
>             Project: Jackrabbit Oak
>          Issue Type: Story
>          Components: cache, documentmk
>            Reporter: Tomek Rękawek
>             Fix For: 1.8
>
>         Attachments: OAK-6571-api.patch
>
>
> The idea is that we can analyse the series of requests made by the 
> DocumentStore, e.g.:
> /content/site/jcr:content
> /content/site/jcr:content/left-column
> /content/site/jcr:content/left-column/item1
> /content/site/jcr:content/left-column/item2
> to predict the future requests and prefetch them. This way we can limit the 
> number of required requests, the connection latency, etc.
> In order to group the requests together, we can use the thread name as a 
> common property. For instance, if Oak is used with Sling, then a single HTTP 
> request is usually served by a single thread and its name contains the HTTP 
> request line.
> Implementing this story will require intercepting the MongoDB/RDB requests 
> made by the DocumentStore and preparing an algorithm that analyses them and 
> predicts the future calls. The attached patch [^OAK-6571-api.patch] contains 
> a proposed interface which may be used to join these two parts.
> We can start with a simple algorithm that tries to exactly match the current 
> requests against the already recorded sequences and, if that's not enough, 
> look for a more sophisticated mechanism.
> Resources:
> * [Intelligent web caching using machine learning 
> methods|http://www.nnw.cz/doi/2011/NNW.2011.21.025.pdf]
> * [Hidden Markov Model|https://en.wikipedia.org/wiki/Hidden_Markov_model]
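
A minimal sketch of the "simple exact match" starting point described above, grouping requests by thread name. It is written in Python only for brevity (Oak itself is Java) and is not the interface from OAK-6571-api.patch: the class name, method names and the context_size parameter are all hypothetical.

```python
from collections import Counter, defaultdict


class ExactMatchPredictor:
    """Hypothetical baseline, not part of Oak: remembers which path followed
    each recent window of paths and predicts the most frequent successor."""

    def __init__(self, context_size=2):
        # Number of preceding paths that must match exactly (assumed >= 1).
        self.context_size = context_size
        # Tuple of preceding paths -> Counter of the paths that followed it.
        self.successors = defaultdict(Counter)
        # Thread name -> most recent paths requested by that thread.
        self.recent = defaultdict(list)

    def record(self, thread_name, path):
        """Learn from one observed DocumentStore request."""
        history = self.recent[thread_name]
        if len(history) >= self.context_size:
            context = tuple(history[-self.context_size:])
            self.successors[context][path] += 1
        history.append(path)
        # Keep only the window needed for the next lookup.
        del history[:-self.context_size]

    def predict(self, thread_name):
        """Return the path to prefetch next, or None if nothing matches."""
        history = self.recent.get(thread_name, [])
        context = tuple(history[-self.context_size:])
        counts = self.successors.get(context)
        if not counts:
            return None
        return counts.most_common(1)[0][0]


if __name__ == "__main__":
    paths = [
        "/content/site/jcr:content",
        "/content/site/jcr:content/left-column",
        "/content/site/jcr:content/left-column/item1",
        "/content/site/jcr:content/left-column/item2",
    ]
    predictor = ExactMatchPredictor()
    for p in paths:
        predictor.record("thread-1", p)
    # A second thread starts the same sequence.
    predictor.record("thread-2", paths[0])
    predictor.record("thread-2", paths[1])
    print(predictor.predict("thread-2"))
```

Sequences are learned across all threads but matched per thread, so a request that repeats an earlier sequence can be predicted from a different thread. A miss returns None, which is the point where a more sophisticated model would take over.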



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
