[jira] [Updated] (OAK-5192) Reduce Lucene related growth of repository size

2017-08-25 Thread Tommaso Teofili (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tommaso Teofili updated OAK-5192:
-
Sprint: L10

> Reduce Lucene related growth of repository size
> ---
>
> Key: OAK-5192
> URL: https://issues.apache.org/jira/browse/OAK-5192
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene, segment-tar
>Reporter: Michael Dürig
>Assignee: Tommaso Teofili
>  Labels: perfomance, scalability
> Fix For: 1.8, 1.7.7
>
> Attachments: added-bytes-zoom.png, binSize100.txt, binSize16384.txt, 
> binSizeTotal.txt, diff.txt.zip, nonBinSizeTotal.txt, OAK-5192.0.patch, Screen 
> Shot 2017-07-03 at 16.50.00.png
>
>
> I observed Lucene indexing contributing to up to 99% of repository growth. 
> While the size of the index itself is well inside reasonable bounds, the 
> overall turnover of data being written and removed again can be as much as 
> 99%. 
> In the case of the TarMK this negatively impacts overall system performance
> due to a fast-growing number of tar files / segments, bad locality of
> reference, cache misses/thrashing when looking up segments and vastly
> prolonged garbage collection cycles.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (OAK-5192) Reduce Lucene related growth of repository size

2017-08-24 Thread Tommaso Teofili (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tommaso Teofili updated OAK-5192:
-
Fix Version/s: (was: 1.7.8)
   1.7.7

> Reduce Lucene related growth of repository size
> ---
>
> Key: OAK-5192
> URL: https://issues.apache.org/jira/browse/OAK-5192
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene, segment-tar
>Reporter: Michael Dürig
>Assignee: Tommaso Teofili
>  Labels: perfomance, scalability
> Fix For: 1.8, 1.7.7
>
> Attachments: added-bytes-zoom.png, binSize100.txt, binSize16384.txt, 
> binSizeTotal.txt, diff.txt.zip, nonBinSizeTotal.txt, OAK-5192.0.patch, Screen 
> Shot 2017-07-03 at 16.50.00.png


[jira] [Updated] (OAK-5192) Reduce Lucene related growth of repository size

2017-07-03 Thread Tommaso Teofili (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tommaso Teofili updated OAK-5192:
-
Attachment: Screen Shot 2017-07-03 at 16.50.00.png

I've added a multiplier parameter so that the _add, reindex, add/delete_
workflow can be executed multiple times, to show how the index grows over time
under different configurations.
See the attached graph.

> Reduce Lucene related growth of repository size
> ---
>
> Key: OAK-5192
> URL: https://issues.apache.org/jira/browse/OAK-5192
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene, segment-tar
>Reporter: Michael Dürig
>Assignee: Tommaso Teofili
>  Labels: perfomance, scalability
> Fix For: 1.8, 1.7.8
>
> Attachments: added-bytes-zoom.png, binSize100.txt, binSize16384.txt, 
> binSizeTotal.txt, diff.txt.zip, nonBinSizeTotal.txt, OAK-5192.0.patch, Screen 
> Shot 2017-07-03 at 16.50.00.png


[jira] [Updated] (OAK-5192) Reduce Lucene related growth of repository size

2017-06-29 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-5192:
---
Fix Version/s: (was: 1.7.3)
   1.7.8

> Reduce Lucene related growth of repository size
> ---
>
> Key: OAK-5192
> URL: https://issues.apache.org/jira/browse/OAK-5192
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene, segment-tar
>Reporter: Michael Dürig
>Assignee: Tommaso Teofili
>  Labels: perfomance, scalability
> Fix For: 1.8, 1.7.8
>
> Attachments: added-bytes-zoom.png, binSize100.txt, binSize16384.txt, 
> binSizeTotal.txt, diff.txt.zip, nonBinSizeTotal.txt, OAK-5192.0.patch


[jira] [Updated] (OAK-5192) Reduce Lucene related growth of repository size

2017-06-09 Thread Tommaso Teofili (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tommaso Teofili updated OAK-5192:
-
Attachment: OAK-5192.0.patch

Attaching a unit test to check how different Lucene configurations affect
segment store size.

> Reduce Lucene related growth of repository size
> ---
>
> Key: OAK-5192
> URL: https://issues.apache.org/jira/browse/OAK-5192
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene, segment-tar
>Reporter: Michael Dürig
>Assignee: Tommaso Teofili
>  Labels: perfomance, scalability
> Fix For: 1.8, 1.7.3
>
> Attachments: added-bytes-zoom.png, binSize100.txt, binSize16384.txt, 
> binSizeTotal.txt, diff.txt.zip, nonBinSizeTotal.txt, OAK-5192.0.patch


[jira] [Updated] (OAK-5192) Reduce Lucene related growth of repository size

2017-05-11 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-5192:
---
Attachment: nonBinSizeTotal.txt
binSizeTotal.txt
binSize16384.txt
binSize100.txt

> Reduce Lucene related growth of repository size
> ---
>
> Key: OAK-5192
> URL: https://issues.apache.org/jira/browse/OAK-5192
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene, segment-tar
>Reporter: Michael Dürig
>Assignee: Tommaso Teofili
>  Labels: perfomance, scalability
> Fix For: 1.8, 1.7.3
>
> Attachments: added-bytes-zoom.png, binSize100.txt, binSize16384.txt, 
> binSizeTotal.txt, diff.txt.zip, nonBinSizeTotal.txt


[jira] [Updated] (OAK-5192) Reduce Lucene related growth of repository size

2017-05-09 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-5192:
---
Attachment: (was: diff.txt.zip)

> Reduce Lucene related growth of repository size
> ---
>
> Key: OAK-5192
> URL: https://issues.apache.org/jira/browse/OAK-5192
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene, segment-tar
>Reporter: Michael Dürig
>Assignee: Tommaso Teofili
>  Labels: perfomance, scalability
> Fix For: 1.8, 1.7.3
>
> Attachments: added-bytes-zoom.png, diff.txt.zip


[jira] [Updated] (OAK-5192) Reduce Lucene related growth of repository size

2017-05-09 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-5192:
---
Attachment: diff.txt.zip

> Reduce Lucene related growth of repository size
> ---
>
> Key: OAK-5192
> URL: https://issues.apache.org/jira/browse/OAK-5192
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene, segment-tar
>Reporter: Michael Dürig
>Assignee: Tommaso Teofili
>  Labels: perfomance, scalability
> Fix For: 1.8, 1.7.3
>
> Attachments: added-bytes-zoom.png, diff.txt.zip


[jira] [Updated] (OAK-5192) Reduce Lucene related growth of repository size

2017-05-04 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-5192:
---
Attachment: diff.txt.zip

> Reduce Lucene related growth of repository size
> ---
>
> Key: OAK-5192
> URL: https://issues.apache.org/jira/browse/OAK-5192
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene, segment-tar
>Reporter: Michael Dürig
>Assignee: Tommaso Teofili
>  Labels: perfomance, scalability
> Fix For: 1.8, 1.7.3
>
> Attachments: added-bytes-zoom.png, diff.txt.zip


[jira] [Updated] (OAK-5192) Reduce Lucene related growth of repository size

2017-05-04 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-5192:
---
Attachment: (was: diff.txt.zip)

> Reduce Lucene related growth of repository size
> ---
>
> Key: OAK-5192
> URL: https://issues.apache.org/jira/browse/OAK-5192
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene, segment-tar
>Reporter: Michael Dürig
>Assignee: Tommaso Teofili
>  Labels: perfomance, scalability
> Fix For: 1.8, 1.7.3
>
> Attachments: added-bytes-zoom.png


[jira] [Updated] (OAK-5192) Reduce Lucene related growth of repository size

2017-05-04 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-5192:
---
Attachment: diff.txt.zip

Attached a diff ([^diff.txt.zip]) across 1000 subsequent commits of the sites 
longevity test limited to content at {{root/oak:index/lucene}}. Each diff 
consists of the time stamp of the respective revision followed by a list of 
changes. There are fewer than 999 diffs as I left out the empty ones.

These changes account for 7415993 bytes added in total and 7427450 bytes
removed in total to/from the segment store (that is, binaries that went to the
blob store are excluded from these numbers).
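
As a worked check of these two figures, the following small Java sketch computes the net change and the total churn. The class and method names are illustrative only and are not part of Oak.

```java
// Worked arithmetic on the byte counts quoted above.
// SegmentStoreTurnover and its methods are hypothetical names, not Oak APIs.
final class SegmentStoreTurnover {

    // Net change in bytes: added minus removed (negative means the content shrank).
    static long netBytes(long added, long removed) {
        return added - removed;
    }

    // Total bytes that were either written to or removed from the segment store.
    static long churnBytes(long added, long removed) {
        return added + removed;
    }

    public static void main(String[] args) {
        long added = 7_415_993L;   // bytes added, as quoted above
        long removed = 7_427_450L; // bytes removed, as quoted above

        System.out.println("net change: " + netBytes(added, removed) + " bytes");   // -11457
        System.out.println("churn:      " + churnBytes(added, removed) + " bytes"); // 14843443
    }
}
```

The net change is about 11 KB while roughly 14.8 MB were written or removed, which is the turnover pattern this issue is about.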

[~chetanm], is this what you needed?

> Reduce Lucene related growth of repository size
> ---
>
> Key: OAK-5192
> URL: https://issues.apache.org/jira/browse/OAK-5192
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene, segment-tar
>Reporter: Michael Dürig
>Assignee: Tommaso Teofili
>  Labels: perfomance, scalability
> Fix For: 1.8, 1.7.3
>
> Attachments: added-bytes-zoom.png, diff.txt.zip


[jira] [Updated] (OAK-5192) Reduce Lucene related growth of repository size

2017-05-03 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-5192:
---
Fix Version/s: 1.7.3

> Reduce Lucene related growth of repository size
> ---
>
> Key: OAK-5192
> URL: https://issues.apache.org/jira/browse/OAK-5192
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene, segment-tar
>Reporter: Michael Dürig
>  Labels: perfomance, scalability
> Fix For: 1.8, 1.7.3
>
> Attachments: added-bytes-zoom.png


[jira] [Updated] (OAK-5192) Reduce Lucene related growth of repository size

2017-01-17 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-5192:
---
Fix Version/s: 1.8

> Reduce Lucene related growth of repository size
> ---
>
> Key: OAK-5192
> URL: https://issues.apache.org/jira/browse/OAK-5192
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene, segment-tar
>Reporter: Michael Dürig
>  Labels: perfomance, scalability
> Fix For: 1.8
>
> Attachments: added-bytes-zoom.png


[jira] [Updated] (OAK-5192) Reduce Lucene related growth of repository size

2016-12-22 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-5192:
---
Labels: perfomance scalability  (was: perfomance)

> Reduce Lucene related growth of repository size
> ---
>
> Key: OAK-5192
> URL: https://issues.apache.org/jira/browse/OAK-5192
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene, segment-tar
>Reporter: Michael Dürig
>  Labels: perfomance, scalability
> Attachments: added-bytes-zoom.png


[jira] [Updated] (OAK-5192) Reduce Lucene related growth of repository size

2016-11-30 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dürig updated OAK-5192:
---
Attachment: added-bytes-zoom.png

The following plots show added bytes over time in content (upper plot) and
added bytes over time in index (lower plot). The index is three orders of
magnitude above regular content in terms of number of bytes added.

!added-bytes-zoom.png|width=500!

The pattern with the spike every 40s in the writes to the index is caused by
Lucene's merging. Switching from {{SerialMergeScheduler}} to
{{NoMergeScheduler}} flattens the curve out and also reduces the total amount
of data written by a factor of 13.
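
To illustrate why periodic merges inflate the number of bytes written, here is a toy model in Java. It is not Lucene's merge policy and does not reproduce the factor measured above. {{MergeWriteModel}}, its methods and all parameter values are made up for illustration. The model assumes every commit flushes one fixed-size segment and that each merge rewrites all live index data. In Lucene itself the scheduler is chosen on {{IndexWriterConfig}} via {{setMergeScheduler}}; the sketch does not call Lucene.

```java
// Toy model of write amplification caused by periodic merges.
// Hypothetical names and numbers; this is NOT how Lucene decides what to merge.
final class MergeWriteModel {

    // Bytes written when each commit flushes one segment and nothing is merged.
    static long bytesWithoutMerge(int commits, long segmentBytes) {
        return commits * segmentBytes;
    }

    // Bytes written when, after every mergeEvery-th commit, all live
    // segments are rewritten into a single new segment.
    static long bytesWithMerge(int commits, long segmentBytes, int mergeEvery) {
        long written = 0;
        long live = 0;
        for (int c = 1; c <= commits; c++) {
            written += segmentBytes; // flush of the new segment
            live += segmentBytes;
            if (c % mergeEvery == 0) {
                written += live;     // the merge rewrites all live index data
            }
        }
        return written;
    }

    public static void main(String[] args) {
        int commits = 100;        // assumed number of commits
        long segmentBytes = 1000; // assumed size of each flushed segment
        int mergeEvery = 10;      // assumed merge interval, in commits

        System.out.println("no merge:   " + bytesWithoutMerge(commits, segmentBytes));
        System.out.println("with merge: " + bytesWithMerge(commits, segmentBytes, mergeEvery));
    }
}
```

In this model the merged variant writes in bursts at every merge interval, while the unmerged variant writes a constant amount per commit. That matches the shape of the spike pattern described above, not its magnitude.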



> Reduce Lucene related growth of repository size
> ---
>
> Key: OAK-5192
> URL: https://issues.apache.org/jira/browse/OAK-5192
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene, segment-tar
>Reporter: Michael Dürig
>  Labels: perfomance
> Attachments: added-bytes-zoom.png