[
https://issues.apache.org/jira/browse/OAK-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15996447#comment-15996447
]
Michael Dürig commented on OAK-5192:
------------------------------------
In our standard longevity test (sites) I analysed how much content is added
where. The tables show the percentage of content added per top-level path
within 6000 commits of that test. Here, content added is the number of
bytes written to property values in the segment store (excluding those that
go to the data store).
||Path||Bytes||Percentage||
|oak:index|114632179|70.55699686|
|jcr:system|28013436|17.24248752|
|var|12642363|7.781472657|
|content/geometrixx-media|4734858|2.914341889|
|:async|1543728|0.950176579|
|content/usergenerated|557260|0.34299786|
|home|343662|0.211526631|
|total|162467486|100|
||Path||Bytes||Percentage||
|oak:index/lucene|48047720|41.91468785|
|oak:index/cqPageLucene|32912350|28.71126614|
|oak:index/ntBaseLucene|17330234|15.11812316|
|oak:index/versionStoreIndex|7982297|6.963399867|
|oak:index/uuid|6804629|5.936054832|
|oak:index/damAssetLucene|1029240|0.897863069|
|oak:index/slingResourceType|259572|0.226439035|
|oak:index/authorizables|102744|0.089629283|
|oak:index/slingeventJob|72571|0.063307704|
|oak:index/reference|25310|0.022079315|
|oak:index/cqProjectLucene|23672|0.020650397|
|oak:index/cqMobileAppLucene|18872|0.016463091|
|oak:index/cqTemplate|13404|0.011693052|
|oak:index/nodetype|5588|0.004874722|
|oak:index/counter|3976|0.003468485|
|total|114632179|100|
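The percentage column can be re-derived from the byte counts. The following is a minimal sketch in plain Scala (no script-oak required) that uses the numbers from the first table above; the names bytesByPath, total and percentages are chosen here for illustration and are not part of the original analysis.

```scala
// Byte counts per top-level path, copied from the first table above.
val bytesByPath: Seq[(String, Long)] = Seq(
  "oak:index" -> 114632179L,
  "jcr:system" -> 28013436L,
  "var" -> 12642363L,
  "content/geometrixx-media" -> 4734858L,
  ":async" -> 1543728L,
  "content/usergenerated" -> 557260L,
  "home" -> 343662L
)

// Total number of bytes added across all listed paths.
val total: Long = bytesByPath.map(_._2).sum

// Share of the total per path, in percent.
val percentages: Seq[(String, Double)] =
  bytesByPath.map { case (path, bytes) => path -> 100.0 * bytes / total }

// Print one line per path: name followed by its percentage.
percentages.foreach { case (path, pct) => println(f"$path%-26s $pct%.6f") }
```

The same calculation applies to the second table, with the oak:index byte count as the denominator.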
For posterity, this is the [script-oak|https://github.com/mduerig/script-oak]
script used to extract the above data:
{code}
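// Ammonite magic import: resolves script-oak via Ivy
import $ivy.`michid:script-oak:latest.integration`
import michid.script.oak._
import michid.script.oak.nodestore.Changes._
import michid.script.oak.nodestore.Changes
import michid.script.oak.nodestore.Projection
// Navigate from the store's top node to its "root" child
implicit val fs = fileStoreAnalyser()
val sroot = fs.getNode()
val root = sroot.getChildNode("root")
// Names of the top-level nodes, each prefixed with "root/"
val nodes = (root.analyse).nodes.map(_.name).map("root/" + _)
// One projection per path and the changes seen through each projection
val ps = nodes.map(Projection.apply)
val cs = ps.map(fs.changes)
// Turnover for each change
val ts = cs.map(_.map(c => Changes.turnOver(c._1)))
// Added content per path, summed and sorted in descending order
val tpn = (nodes zip ts).toMap
val added = tpn.mapValues(_.map(_.addedContent))
val addedContent = added.mapValues(_.sum)
val addedSize = addedContent.toList.sortBy(-_._2)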
{code}
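The aggregation in the last four lines of the script can be tried out without a repository. The sketch below mirrors those steps on plain Scala collections. Turnover is a hypothetical stand-in that models only the addedContent field, addedSizeByPath is a helper name introduced here, and the sample values are made up; none of this reproduces script-oak's actual types.

```scala
// Hypothetical stand-in for the turnover values produced by script-oak;
// only the addedContent field used in the script is modelled.
final case class Turnover(addedContent: Long)

// Sums the added bytes per path and sorts the paths by descending total,
// mirroring the zip / toMap / sum / sortBy steps of the script.
def addedSizeByPath(nodes: Seq[String],
                    turnovers: Seq[Seq[Turnover]]): List[(String, Long)] = {
  val turnoverPerNode = (nodes zip turnovers).toMap
  val addedContent = turnoverPerNode.map { case (path, ts) =>
    path -> ts.map(_.addedContent).sum
  }
  addedContent.toList.sortBy { case (_, bytes) => -bytes }
}

// Made-up sample data, for illustration only.
val sampleNodes = Seq("root/a", "root/b", "root/c")
val sampleTurnovers = Seq(
  Seq(Turnover(10L), Turnover(5L)),
  Seq(Turnover(100L)),
  Seq.empty[Turnover]
)
val sampleResult = addedSizeByPath(sampleNodes, sampleTurnovers)
```

A plain map over the pairs is used instead of mapValues so that the sketch behaves the same on Scala 2.12 and later versions, where mapValues is deprecated and returns a lazy view.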
> Reduce Lucene related growth of repository size
> -----------------------------------------------
>
> Key: OAK-5192
> URL: https://issues.apache.org/jira/browse/OAK-5192
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: lucene, segment-tar
> Reporter: Michael Dürig
> Assignee: Tommaso Teofili
> Labels: perfomance, scalability
> Fix For: 1.8, 1.7.3
>
> Attachments: added-bytes-zoom.png
>
>
> I observed Lucene indexing contributing to up to 99% of repository growth.
> While the size of the index itself is well inside reasonable bounds, the
> overall turnover of data being written and removed again can be as much as
> 99%.
> In the case of the TarMK this negatively impacts overall system performance
> due to fast growing number of tar files / segments, bad locality of
> reference, cache misses/thrashing when looking up segments and vastly
> prolonged garbage collection cycles.