[jira] [Updated] (OAK-6500) NRTIndex leaks file handles due to unclosed IndexReader

2017-08-10 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-6500:
-
   Labels:   (was: candidate_oak_1_6)
Fix Version/s: 1.6.4

Merged to 1.6 with 1804664

> NRTIndex leaks file handles due to unclosed IndexReader
> ---
>
> Key: OAK-6500
> URL: https://issues.apache.org/jira/browse/OAK-6500
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.0
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>Priority: Critical
> Fix For: 1.8, 1.7.6, 1.6.4
>
> Attachments: OAK-6500-approach-a-v1.patch, 
> OAK-6500-ref-count-v1.patch, OAK-6500-v1.patch
>
>
> On some setups under stress it has been seen that NRTIndex leaks file handles 
> over time. 
> Checking with lsof indicates that more than 3 nrt folders per index are being 
> used. However per design there can be max 3 and after system is not in use 
> max 1 should be present.
> {noformat}
> $ lsof -p 9550 | grep '/nrt' | gawk 'match($0, 
> /.*crx-quickstart\/repository\/index\/(.*?)\/\_.*$/, m) { print m[1]; }' | 
> sort | uniq
> cqPageLucene-1501065263331/nrt1501065335930
> cqPageLucene-1501065263331/nrt1501065374667
> cqPageLucene-1501065263331/nrt1501065392492
> cqPageLucene-1501065263331/nrt1501065440955
> cqPageLucene-1501065263331/nrt1501065473286
> cqPageLucene-1501065263331/nrt1501065507345
> slingeventJob-1501065263330/nrt1501065356975
> slingeventJob-1501065263330/nrt1501065373229
> slingeventJob-1501065263330/nrt1501065394142
> slingeventJob-1501065263330/nrt1501065440953
> slingeventJob-1501065263330/nrt1501065473282
> slingeventJob-1501065263330/nrt1501065507342
> versionStoreIndex-1501065263332/nrt1501065335925
> versionStoreIndex-1501065263332/nrt1501065366781
> versionStoreIndex-1501065263332/nrt1501065392490
> versionStoreIndex-1501065263332/nrt1501065441232
> versionStoreIndex-1501065263332/nrt1501065473285
> {noformat}
> Further actually checking index folder indicates that those folder are 
> actually deleted. So some where the file handle is still referring them.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (OAK-6500) NRTIndex leaks file handles due to unclosed IndexReader

2017-08-04 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-6500:
-
Fix Version/s: (was: 1.6.4)

> NRTIndex leaks file handles due to unclosed IndexReader
> ---
>
> Key: OAK-6500
> URL: https://issues.apache.org/jira/browse/OAK-6500
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.0
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>Priority: Critical
>  Labels: candidate_oak_1_6
> Fix For: 1.8, 1.7.6
>
> Attachments: OAK-6500-approach-a-v1.patch, 
> OAK-6500-ref-count-v1.patch, OAK-6500-v1.patch
>
>
> On some setups under stress it has been seen that NRTIndex leaks file handles 
> over time. 
> Checking with lsof indicates that more than 3 nrt folders per index are being 
> used. However per design there can be max 3 and after system is not in use 
> max 1 should be present.
> {noformat}
> $ lsof -p 9550 | grep '/nrt' | gawk 'match($0, 
> /.*crx-quickstart\/repository\/index\/(.*?)\/\_.*$/, m) { print m[1]; }' | 
> sort | uniq
> cqPageLucene-1501065263331/nrt1501065335930
> cqPageLucene-1501065263331/nrt1501065374667
> cqPageLucene-1501065263331/nrt1501065392492
> cqPageLucene-1501065263331/nrt1501065440955
> cqPageLucene-1501065263331/nrt1501065473286
> cqPageLucene-1501065263331/nrt1501065507345
> slingeventJob-1501065263330/nrt1501065356975
> slingeventJob-1501065263330/nrt1501065373229
> slingeventJob-1501065263330/nrt1501065394142
> slingeventJob-1501065263330/nrt1501065440953
> slingeventJob-1501065263330/nrt1501065473282
> slingeventJob-1501065263330/nrt1501065507342
> versionStoreIndex-1501065263332/nrt1501065335925
> versionStoreIndex-1501065263332/nrt1501065366781
> versionStoreIndex-1501065263332/nrt1501065392490
> versionStoreIndex-1501065263332/nrt1501065441232
> versionStoreIndex-1501065263332/nrt1501065473285
> {noformat}
> Further actually checking index folder indicates that those folder are 
> actually deleted. So some where the file handle is still referring them.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (OAK-6500) NRTIndex leaks file handles due to unclosed IndexReader

2017-08-03 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-6500:
-
Labels: candidate_oak_1_6  (was: )

> NRTIndex leaks file handles due to unclosed IndexReader
> ---
>
> Key: OAK-6500
> URL: https://issues.apache.org/jira/browse/OAK-6500
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.0
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>Priority: Critical
>  Labels: candidate_oak_1_6
> Fix For: 1.8, 1.7.6, 1.6.4
>
> Attachments: OAK-6500-approach-a-v1.patch, 
> OAK-6500-ref-count-v1.patch, OAK-6500-v1.patch
>
>
> On some setups under stress it has been seen that NRTIndex leaks file handles 
> over time. 
> Checking with lsof indicates that more than 3 nrt folders per index are being 
> used. However per design there can be max 3 and after system is not in use 
> max 1 should be present.
> {noformat}
> $ lsof -p 9550 | grep '/nrt' | gawk 'match($0, 
> /.*crx-quickstart\/repository\/index\/(.*?)\/\_.*$/, m) { print m[1]; }' | 
> sort | uniq
> cqPageLucene-1501065263331/nrt1501065335930
> cqPageLucene-1501065263331/nrt1501065374667
> cqPageLucene-1501065263331/nrt1501065392492
> cqPageLucene-1501065263331/nrt1501065440955
> cqPageLucene-1501065263331/nrt1501065473286
> cqPageLucene-1501065263331/nrt1501065507345
> slingeventJob-1501065263330/nrt1501065356975
> slingeventJob-1501065263330/nrt1501065373229
> slingeventJob-1501065263330/nrt1501065394142
> slingeventJob-1501065263330/nrt1501065440953
> slingeventJob-1501065263330/nrt1501065473282
> slingeventJob-1501065263330/nrt1501065507342
> versionStoreIndex-1501065263332/nrt1501065335925
> versionStoreIndex-1501065263332/nrt1501065366781
> versionStoreIndex-1501065263332/nrt1501065392490
> versionStoreIndex-1501065263332/nrt1501065441232
> versionStoreIndex-1501065263332/nrt1501065473285
> {noformat}
> Further actually checking index folder indicates that those folder are 
> actually deleted. So some where the file handle is still referring them.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (OAK-6500) NRTIndex leaks file handles due to unclosed IndexReader

2017-08-01 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-6500:
-
Attachment: OAK-6500-ref-count-v1.patch

[patch|^OAK-6500-ref-count-v1.patch] using ref counting approach i.e. #B above

Here we refactor existing {{IndexNode}} class to 2 
* IndexNode - Interface which captures the "api" of existing IndexNode. This 
ensures that code which made use of IndexNode does not change. Note that all 
client code use {{IndexTracker}} to obtain {{IndexNode}} instances
* IndexNodeManager - This class now has the logic present in earlier IndexNode 
class and is used by {{IndexTracker}} to obtain {{IndexNode}} instances and 
pass them to client code

h3. Ref Counting

This refactoring was required as we now need to maintain some state on per 
acquire call. IndexNodeManager and NRTIndex now rely on ref counting support in 
built in 
[IndexReader|https://lucene.apache.org/core/4_7_1/core/org/apache/lucene/index/IndexReader.html#incRef()]
 to ensure that reader instance gets closes when all references have been 
released. The counting is managed in following way
# At time of creation of IndexReader the count = (0 -> 1)
#* Normal reader instances created once when IndexNodeManager is created have 
the count 1
#* IndexReader for NRTIndex are also 1 when created in NRTIndex
# IndexSearcher creation increments the count to 2 - IndexNodeManager creates 
new IndexSearcher and passes them either MultiReader or normal IndexReader. At 
that time (IndexNodeManager#createReader) the count is cremented once
# Upon each acquire of IndexNode the count is further incremented (2->3)
# Then upon each release of IndexNode the count is decremented by 1 (3 -> 2). 
This compliments #3
# Then for each refresh when a new IndexSearcher is created the count is again 
decremented (2 -> 1). This complements #2
# Then upon close
#* Normal reader instance are closed in IndexNodeManager#close. The close call 
decrements the count to 0 (1 -> 0)
#* IndexReader for NRTIndex - The decrement happens in NRTIndex#createReader 
call where if a new reader is created then decRef is called for previous reader

This approach ensures that reader ref count should eventually reach 0 and thus 
get closed.

h3. Asserting RefCount

In addtion NRTIndex exposes a flag to assert that reader ref counts are zero by 
the time NRTIndex instance is closed. This assert can be enabled via system 
property {{oak.lucene.assertAllResourcesClosed}}. The test setups enable this 
flag to check that all resources are getting cleaned up

[~catholicon] - This patch changes a core component IndexNode. Due to rename 
the diff is tricky. So would be helpful if you can review it once

[~teofili] - Would also be helpful if you can review the patch wrt Lucene api 
usage





> NRTIndex leaks file handles due to unclosed IndexReader
> ---
>
> Key: OAK-6500
> URL: https://issues.apache.org/jira/browse/OAK-6500
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.0
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>Priority: Critical
> Fix For: 1.8, 1.7.6, 1.6.4
>
> Attachments: OAK-6500-approach-a-v1.patch, 
> OAK-6500-ref-count-v1.patch, OAK-6500-v1.patch
>
>
> On some setups under stress it has been seen that NRTIndex leaks file handles 
> over time. 
> Checking with lsof indicates that more than 3 nrt folders per index are being 
> used. However per design there can be max 3 and after system is not in use 
> max 1 should be present.
> {noformat}
> $ lsof -p 9550 | grep '/nrt' | gawk 'match($0, 
> /.*crx-quickstart\/repository\/index\/(.*?)\/\_.*$/, m) { print m[1]; }' | 
> sort | uniq
> cqPageLucene-1501065263331/nrt1501065335930
> cqPageLucene-1501065263331/nrt1501065374667
> cqPageLucene-1501065263331/nrt1501065392492
> cqPageLucene-1501065263331/nrt1501065440955
> cqPageLucene-1501065263331/nrt1501065473286
> cqPageLucene-1501065263331/nrt1501065507345
> slingeventJob-1501065263330/nrt1501065356975
> slingeventJob-1501065263330/nrt1501065373229
> slingeventJob-1501065263330/nrt1501065394142
> slingeventJob-1501065263330/nrt1501065440953
> slingeventJob-1501065263330/nrt1501065473282
> slingeventJob-1501065263330/nrt1501065507342
> versionStoreIndex-1501065263332/nrt1501065335925
> versionStoreIndex-1501065263332/nrt1501065366781
> versionStoreIndex-1501065263332/nrt1501065392490
> versionStoreIndex-1501065263332/nrt1501065441232
> versionStoreIndex-1501065263332/nrt1501065473285
> {noformat}
> Further actually checking index folder indicates that those folder are 
> actually deleted. So some where the file handle is still referring them.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (OAK-6500) NRTIndex leaks file handles due to unclosed IndexReader

2017-07-31 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-6500:
--
Fix Version/s: (was: 1.7.5)
   1.7.6

> NRTIndex leaks file handles due to unclosed IndexReader
> ---
>
> Key: OAK-6500
> URL: https://issues.apache.org/jira/browse/OAK-6500
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.0
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>Priority: Critical
> Fix For: 1.8, 1.7.6, 1.6.4
>
> Attachments: OAK-6500-approach-a-v1.patch, OAK-6500-v1.patch
>
>
> On some setups under stress it has been seen that NRTIndex leaks file handles 
> over time. 
> Checking with lsof indicates that more than 3 nrt folders per index are being 
> used. However per design there can be max 3 and after system is not in use 
> max 1 should be present.
> {noformat}
> $ lsof -p 9550 | grep '/nrt' | gawk 'match($0, 
> /.*crx-quickstart\/repository\/index\/(.*?)\/\_.*$/, m) { print m[1]; }' | 
> sort | uniq
> cqPageLucene-1501065263331/nrt1501065335930
> cqPageLucene-1501065263331/nrt1501065374667
> cqPageLucene-1501065263331/nrt1501065392492
> cqPageLucene-1501065263331/nrt1501065440955
> cqPageLucene-1501065263331/nrt1501065473286
> cqPageLucene-1501065263331/nrt1501065507345
> slingeventJob-1501065263330/nrt1501065356975
> slingeventJob-1501065263330/nrt1501065373229
> slingeventJob-1501065263330/nrt1501065394142
> slingeventJob-1501065263330/nrt1501065440953
> slingeventJob-1501065263330/nrt1501065473282
> slingeventJob-1501065263330/nrt1501065507342
> versionStoreIndex-1501065263332/nrt1501065335925
> versionStoreIndex-1501065263332/nrt1501065366781
> versionStoreIndex-1501065263332/nrt1501065392490
> versionStoreIndex-1501065263332/nrt1501065441232
> versionStoreIndex-1501065263332/nrt1501065473285
> {noformat}
> Further actually checking index folder indicates that those folder are 
> actually deleted. So some where the file handle is still referring them.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (OAK-6500) NRTIndex leaks file handles due to unclosed IndexReader

2017-07-28 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-6500:
-
Attachment: OAK-6500-approach-a-v1.patch

[updated patch|^OAK-6500-approach-a-v1.patch] based on approach a. With this 
the open file count comes closer to the start count

{noformat}
Open file count - At start 180
Open file count - Post creation of 100 nodes at /content/a is 297
Open file count - Post async run at /content/a is 300
Open file count - Post creation of 100 nodes at /content/b is 419
Open file count - Post async run at /content/b is 421
Open file count - Post creation of 100 nodes at /content/c is 539
Open file count - Post async run at /content/c is 422
Open file count - Post creation of 1 nodes at /content/d is 423
Open file count - Post async run at /content/d is 301
Open file count - Post creation of 1 nodes at /content/e is 302
Open file count - Post async run at /content/e is 181
Open file count - At end 181
{noformat}

Here the NRTIndex keep a memory of opened IndexReader and closes them upon 
close of NRTIndex.

Downside here is that if for a sync lucene index if say 1000 commits happen 
before AsyncIndexUpdate cycle is triggered then those 1000 reader instances 
would be held in memory

Would now work on #B which would rely on ref counting

> NRTIndex leaks file handles due to unclosed IndexReader
> ---
>
> Key: OAK-6500
> URL: https://issues.apache.org/jira/browse/OAK-6500
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.0
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>Priority: Critical
> Fix For: 1.8, 1.7.5, 1.6.4
>
> Attachments: OAK-6500-approach-a-v1.patch, OAK-6500-v1.patch
>
>
> On some setups under stress it has been seen that NRTIndex leaks file handles 
> over time. 
> Checking with lsof indicates that more than 3 nrt folders per index are being 
> used. However per design there can be max 3 and after system is not in use 
> max 1 should be present.
> {noformat}
> $ lsof -p 9550 | grep '/nrt' | gawk 'match($0, 
> /.*crx-quickstart\/repository\/index\/(.*?)\/\_.*$/, m) { print m[1]; }' | 
> sort | uniq
> cqPageLucene-1501065263331/nrt1501065335930
> cqPageLucene-1501065263331/nrt1501065374667
> cqPageLucene-1501065263331/nrt1501065392492
> cqPageLucene-1501065263331/nrt1501065440955
> cqPageLucene-1501065263331/nrt1501065473286
> cqPageLucene-1501065263331/nrt1501065507345
> slingeventJob-1501065263330/nrt1501065356975
> slingeventJob-1501065263330/nrt1501065373229
> slingeventJob-1501065263330/nrt1501065394142
> slingeventJob-1501065263330/nrt1501065440953
> slingeventJob-1501065263330/nrt1501065473282
> slingeventJob-1501065263330/nrt1501065507342
> versionStoreIndex-1501065263332/nrt1501065335925
> versionStoreIndex-1501065263332/nrt1501065366781
> versionStoreIndex-1501065263332/nrt1501065392490
> versionStoreIndex-1501065263332/nrt1501065441232
> versionStoreIndex-1501065263332/nrt1501065473285
> {noformat}
> Further actually checking index folder indicates that those folder are 
> actually deleted. So some where the file handle is still referring them.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (OAK-6500) NRTIndex leaks file handles due to unclosed IndexReader

2017-07-27 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-6500:
-
Attachment: OAK-6500-v1.patch

> NRTIndex leaks file handles due to unclosed IndexReader
> ---
>
> Key: OAK-6500
> URL: https://issues.apache.org/jira/browse/OAK-6500
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.0
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>Priority: Critical
> Fix For: 1.8, 1.7.5, 1.6.4
>
> Attachments: OAK-6500-v1.patch
>
>
> On some setups under stress it has been seen that NRTIndex leaks file handles 
> over time. 
> Checking with lsof indicates that more than 3 nrt folders per index are being 
> used. However per design there can be max 3 and after system is not in use 
> max 1 should be present.
> {noformat}
> $ lsof -p 9550 | grep '/nrt' | gawk 'match($0, 
> /.*crx-quickstart\/repository\/index\/(.*?)\/\_.*$/, m) { print m[1]; }' | 
> sort | uniq
> cqPageLucene-1501065263331/nrt1501065335930
> cqPageLucene-1501065263331/nrt1501065374667
> cqPageLucene-1501065263331/nrt1501065392492
> cqPageLucene-1501065263331/nrt1501065440955
> cqPageLucene-1501065263331/nrt1501065473286
> cqPageLucene-1501065263331/nrt1501065507345
> slingeventJob-1501065263330/nrt1501065356975
> slingeventJob-1501065263330/nrt1501065373229
> slingeventJob-1501065263330/nrt1501065394142
> slingeventJob-1501065263330/nrt1501065440953
> slingeventJob-1501065263330/nrt1501065473282
> slingeventJob-1501065263330/nrt1501065507342
> versionStoreIndex-1501065263332/nrt1501065335925
> versionStoreIndex-1501065263332/nrt1501065366781
> versionStoreIndex-1501065263332/nrt1501065392490
> versionStoreIndex-1501065263332/nrt1501065441232
> versionStoreIndex-1501065263332/nrt1501065473285
> {noformat}
> Further actually checking index folder indicates that those folder are 
> actually deleted. So some where the file handle is still referring them.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)