[ 
https://issues.apache.org/jira/browse/OAK-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15996362#comment-15996362
 ] 

Thomas Mueller commented on OAK-5192:
-------------------------------------

As discussed with [~teofili] today:

* We could re-enable compression (which also affects write/read 
performance). This was disabled in OAK-1737.

* One idea is to store the updated index in the repo only once every minute 
instead of once every 5 seconds. To easily measure how this affects space 
usage, we might be able to use NRT indexing (we might need to modify how NRT 
indexes store temporary files), and then reduce the async indexing frequency 
to once every minute instead of once every 5 seconds. The final solution might 
require more work, for example merging the NRT indexes instead of indexing 
from scratch. Not sure how NRT indexes work right now.
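
To put a rough number on the second idea, here is a minimal back-of-the-envelope sketch. It is not Oak code: the class and method names are made up for illustration, and it assumes the index is persisted at most once per async cycle, so the counts are upper bounds (idle cycles write nothing). It only counts persist operations. How many bytes each persist adds depends on Lucene segment merging and would still have to be measured.

```java
// Hypothetical helper, not part of Oak: counts how often the async index
// could be persisted to the repository for a given indexing interval.
class IndexPersistEstimate {

    // Upper bound on persists in periodSeconds, assuming at most one
    // persist per async indexing cycle of intervalSeconds.
    public static long persistsPer(long periodSeconds, long intervalSeconds) {
        if (intervalSeconds <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        return periodSeconds / intervalSeconds;
    }

    public static void main(String[] args) {
        long hour = 3600;
        long every5s = persistsPer(hour, 5);   // current 5 second interval
        long every60s = persistsPer(hour, 60); // proposed 1 minute interval
        System.out.println("5s interval:  " + every5s + " persists/hour");
        System.out.println("60s interval: " + every60s + " persists/hour");
        System.out.println("reduction factor: " + (every5s / every60s));
    }
}
```

Under these assumptions the number of times the index is written drops from at most 720 to at most 60 per hour, a factor of 12.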

> Reduce Lucene related growth of repository size
> -----------------------------------------------
>
>                 Key: OAK-5192
>                 URL: https://issues.apache.org/jira/browse/OAK-5192
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene, segment-tar
>            Reporter: Michael Dürig
>            Assignee: Tommaso Teofili
>              Labels: perfomance, scalability
>             Fix For: 1.8, 1.7.3
>
>         Attachments: added-bytes-zoom.png
>
>
> I observed Lucene indexing contributing to up to 99% of repository growth. 
> While the size of the index itself is well inside reasonable bounds, the 
> overall turnover of data being written and removed again can be as much as 
> 99%. 
> In the case of the TarMK this negatively impacts overall system performance 
> due to a fast-growing number of tar files / segments, bad locality of 
> reference, cache misses/thrashing when looking up segments, and vastly 
> prolonged garbage collection cycles.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
