[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup

2019-10-16 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952844#comment-16952844
 ] 

Thomas Mueller commented on OAK-7947:
-

Let's hold off on the backport until we have analyzed the issue and looked at 
alternatives.

> Lazy loading of Lucene index files startup
> --
>
> Key: OAK-7947
> URL: https://issues.apache.org/jira/browse/OAK-7947
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene, query
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Major
> Fix For: 1.12.0
>
> Attachments: OAK-7947.patch, OAK-7947_v2.patch, OAK-7947_v3.patch, 
> OAK-7947_v4.patch, OAK-7947_v5.patch, lucene-index-open-access.zip
>
>
> Right now, all Lucene index binaries are loaded on startup (I think when the 
> first query is run, to do cost calculation). This is a performance problem if 
> the index files are large, and need to be downloaded from the data store.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup

2019-10-16 Thread Julian Reschke (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952589#comment-16952589
 ] 

Julian Reschke commented on OAK-7947:
-

If we wanted to backport this to 1.10, we could:

1) back out the changes for OAK-8437 and OAK-8046
2) revert r1851052
3) merge r1851022 and r1852007
4) backport OAK-8046 and OAK-8437 again




[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup

2019-04-30 Thread Julian Reschke (JIRA)


[ 
https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830210#comment-16830210
 ] 

Julian Reschke commented on OAK-7947:
-

trunk: (1.12.0) [r1852007|http://svn.apache.org/r1852007] 
[r1851022|http://svn.apache.org/r1851022] 
[r1850826|http://svn.apache.org/r1850826] 
[r1850231|http://svn.apache.org/r1850231] 
[r1850229|http://svn.apache.org/r1850229] 
[r1850163|http://svn.apache.org/r1850163] 
[r1849465|http://svn.apache.org/r1849465]
1.10: (1.10.0) [r1851052|http://svn.apache.org/r1851052] (1.10.0) 
[r1850826|http://svn.apache.org/r1850826] 
[r1850231|http://svn.apache.org/r1850231] 
[r1850229|http://svn.apache.org/r1850229] 
[r1850163|http://svn.apache.org/r1850163] 
[r1849465|http://svn.apache.org/r1849465]



[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup

2019-02-05 Thread Thomas Mueller (JIRA)


[ 
https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760701#comment-16760701
 ] 

Thomas Mueller commented on OAK-7947:
-

> Would it be feasible to add a way for users to trigger downloading of indexes?

[~mduerig] yes, we can add something like this. One idea is to load all indexes 
except those marked as deprecated, or we can add a new flag (e.g. "lazyLoad"). 
I suggest we wait with this until lazy loading has been shown to work as 
expected. (If we add such a feature very early on, there is a risk that lazy 
loading isn't well tested on real problems, as it would rarely be exercised. 
I'm not suggesting we skip unit tests, but here it's a bit hard to come up 
with unit tests that match reality.)
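
The selection rule floated in this comment could be sketched roughly as follows. 
This is only an illustration: the PreloadSelector and IndexDef types are 
hypothetical (they do not exist in Oak), the "lazyLoad" flag is just the 
proposed name, and combining both ideas (skip deprecated indexes and skip 
indexes flagged lazyLoad) is an assumption, since the comment offers them as 
alternatives.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: neither this class nor IndexDef exists in Oak. */
final class PreloadSelector {

    /** Hypothetical, simplified view of an index definition. */
    static final class IndexDef {
        final String path;
        final boolean deprecated; // assumed to mirror a "deprecated" marker on the definition
        final boolean lazyLoad;   // assumed new flag, as proposed in the comment

        IndexDef(String path, boolean deprecated, boolean lazyLoad) {
            this.path = path;
            this.deprecated = deprecated;
            this.lazyLoad = lazyLoad;
        }
    }

    /** Returns the paths of indexes that are neither deprecated nor flagged lazyLoad. */
    static List<String> pathsToPreload(List<IndexDef> defs) {
        List<String> result = new ArrayList<>();
        for (IndexDef def : defs) {
            if (!def.deprecated && !def.lazyLoad) {
                result.add(def.path);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<IndexDef> defs = List.of(
                new IndexDef("/oak:index/lucene", false, false),
                new IndexDef("/oak:index/oldLucene", true, false),
                new IndexDef("/oak:index/rarelyUsed", false, true));
        System.out.println(pathsToPreload(defs)); // prints [/oak:index/lucene]
    }
}
```

Everything not returned by pathsToPreload would stay lazily loaded and be 
downloaded only on first use.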



[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup

2019-02-05 Thread JIRA


[ 
https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760709#comment-16760709
 ] 

Michael Dürig commented on OAK-7947:


{quote}I suggest we wait with this
{quote}
Ack. I'll bring it up again if and once required.



[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup

2019-01-24 Thread JIRA


[ 
https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750952#comment-16750952
 ] 

Michael Dürig commented on OAK-7947:


Would it be feasible to add a way for users to trigger downloading of indexes? 
This could be used to e.g. start downloading indexes in the background or for 
pre-warming instances before switching them live.

Arguably this is a topic for a separate issue, so let's follow up in one if 
this is feasible at all. If not, let's forget it.



[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup

2019-01-24 Thread Tommaso Teofili (JIRA)


[ 
https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750947#comment-16750947
 ] 

Tommaso Teofili commented on OAK-7947:
--

+1, thanks Thomas, I think it sounds like the most reasonable compromise for 
the current situation.



[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup

2019-01-24 Thread Thomas Mueller (JIRA)


[ 
https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750944#comment-16750944
 ] 

Thomas Mueller commented on OAK-7947:
-

http://svn.apache.org/r1852007 (trunk)
includes the LuceneIndexMBeanImpl patch above (so, index update doesn't 
download the indexes to get stats, except if the system property is set).



[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup

2019-01-24 Thread Thomas Mueller (JIRA)


[ 
https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750890#comment-16750890
 ] 

Thomas Mueller commented on OAK-7947:
-

The following addition doesn't download the indexes (only updates the stats for 
the indexes that are already downloaded, that is, only for those that are shown 
in the JMX bean table).

Maybe we could have some "middle ground", that is, by default download the 
indexes during the index update cycle, but only those that aren't deprecated. 
That way, an index update alone doesn't cause large deprecated indexes to be 
downloaded. For non-deprecated indexes, I think it's actually good to download 
them quite early on, and the index update mechanism sounds like a good 
mechanism for that.

{noformat}
--- src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndexMBeanImpl.java  (revision 1851902)
+++ src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndexMBeanImpl.java  (working copy)
@@ -93,6 +93,8 @@
 
 public class LuceneIndexMBeanImpl extends AnnotatedStandardMBean implements LuceneIndexMBean {
 
+    private static final boolean LOAD_INDEX_FOR_STATS = Boolean.parseBoolean(System.getProperty("oak.lucene.LoadIndexForStats", "false"));
+
     private final Logger log = LoggerFactory.getLogger(getClass());
     private final IndexTracker indexTracker;
     private final NodeStore nodeStore;
@@ -381,11 +383,21 @@
 
     @Override
     public String getSize(String indexPath) throws IOException {
+        if (!LOAD_INDEX_FOR_STATS) {
+            if (!indexTracker.getIndexNodePaths().contains(indexPath)) {
+                return "-1";
+            }
+        }
         return String.valueOf(getIndexStats(indexPath).indexSize);
     }
 
     @Override
     public String getDocCount(String indexPath) throws IOException {
+        if (!LOAD_INDEX_FOR_STATS) {
+            if (!indexTracker.getIndexNodePaths().contains(indexPath)) {
+                return "-1";
+            }
+        }
         return String.valueOf(getIndexStats(indexPath).numDocs);
     }
{noformat}
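
The guard in the patch can be tried out in isolation with a minimal sketch 
like the one below. It is not the Oak class: IndexStatsGuard is a hypothetical 
stand-in, the set of open index paths plays the role of 
indexTracker.getIndexNodePaths(), and the sizeLoader function plays the role 
of getIndexStats(indexPath).indexSize (the call that would trigger the 
download). Only the system property name and the "-1" return value are taken 
from the patch.

```java
import java.util.Set;
import java.util.function.ToLongFunction;

/** Hypothetical stand-in for the guarded stats lookup; not the actual Oak MBean. */
final class IndexStatsGuard {
    private final boolean loadIndexForStats;
    private final Set<String> openIndexPaths;        // stands in for indexTracker.getIndexNodePaths()
    private final ToLongFunction<String> sizeLoader; // stands in for getIndexStats(path).indexSize

    IndexStatsGuard(boolean loadIndexForStats,
                    Set<String> openIndexPaths,
                    ToLongFunction<String> sizeLoader) {
        this.loadIndexForStats = loadIndexForStats;
        this.openIndexPaths = openIndexPaths;
        this.sizeLoader = sizeLoader;
    }

    /** Returns "-1" for indexes that are not open yet, unless loading for stats is enabled. */
    String getSize(String indexPath) {
        if (!loadIndexForStats && !openIndexPaths.contains(indexPath)) {
            return "-1"; // skip the lookup, so nothing is downloaded just for stats
        }
        return String.valueOf(sizeLoader.applyAsLong(indexPath));
    }

    public static void main(String[] args) {
        // Same property name and default as in the patch.
        boolean flag = Boolean.parseBoolean(
                System.getProperty("oak.lucene.LoadIndexForStats", "false"));
        IndexStatsGuard guard = new IndexStatsGuard(
                flag, Set.of("/oak:index/lucene"), path -> 1024L);
        System.out.println(guard.getSize("/oak:index/lucene"));
        System.out.println(guard.getSize("/oak:index/oldLucene"));
    }
}
```

With the property left at its default, the second lookup returns "-1" without 
calling the loader; running with -Doak.lucene.LoadIndexForStats=true restores 
the previous behavior of loading every index to compute its stats.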



[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup

2019-01-24 Thread Thomas Mueller (JIRA)


[ 
https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750871#comment-16750871
 ] 

Thomas Mueller commented on OAK-7947:
-

[~teofili] [~catholicon] maybe it's alright if the async index update downloads 
all the index files (even if the index wasn't updated or used so far), what do 
you think?

What about adding a system property so this behavior (basically OAK-7893) can 
be disabled? If I do that and set the system property, then at startup only the 
indexes that I would expect are downloaded.



[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup

2019-01-24 Thread Thomas Mueller (JIRA)


[ 
https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750849#comment-16750849
 ] 

Thomas Mueller commented on OAK-7947:
-

Index copying takes place here:
{noformat}
at org.apache.jackrabbit.oak.plugins.index.lucene.directory.CopyOnReadDirectory.<init>(CopyOnReadDirectory.java:83) [org.apache.jackrabbit.oak-lucene:1.12.0.SNAPSHOT]
at org.apache.jackrabbit.oak.plugins.index.lucene.IndexCopier.wrapForRead(IndexCopier.java:124) [org.apache.jackrabbit.oak-lucene:1.12.0.SNAPSHOT]
at org.apache.jackrabbit.oak.plugins.index.lucene.reader.DefaultIndexReaderFactory.createReader(DefaultIndexReaderFactory.java:97) [org.apache.jackrabbit.oak-lucene:1.12.0.SNAPSHOT]
at org.apache.jackrabbit.oak.plugins.index.lucene.reader.DefaultIndexReaderFactory.createReader(DefaultIndexReaderFactory.java:85) [org.apache.jackrabbit.oak-lucene:1.12.0.SNAPSHOT]
at org.apache.jackrabbit.oak.plugins.index.lucene.reader.DefaultIndexReaderFactory.createMountedReaders(DefaultIndexReaderFactory.java:67) [org.apache.jackrabbit.oak-lucene:1.12.0.SNAPSHOT]
at org.apache.jackrabbit.oak.plugins.index.lucene.reader.DefaultIndexReaderFactory.createReaders(DefaultIndexReaderFactory.java:60) [org.apache.jackrabbit.oak-lucene:1.12.0.SNAPSHOT]
at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexNodeManager.open(LuceneIndexNodeManager.java:72) [org.apache.jackrabbit.oak-lucene:1.12.0.SNAPSHOT]
at org.apache.jackrabbit.oak.plugins.index.lucene.IndexTracker.findIndexNode(IndexTracker.java:243) [org.apache.jackrabbit.oak-lucene:1.12.0.SNAPSHOT]
at org.apache.jackrabbit.oak.plugins.index.lucene.IndexTracker.acquireIndexNode(IndexTracker.java:212) [org.apache.jackrabbit.oak-lucene:1.12.0.SNAPSHOT]
at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexMBeanImpl.getIndexStats(LuceneIndexMBeanImpl.java:143) [org.apache.jackrabbit.oak-lucene:1.12.0.SNAPSHOT]
at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexMBeanImpl.getDocCount(LuceneIndexMBeanImpl.java:389) [org.apache.jackrabbit.oak-lucene:1.12.0.SNAPSHOT]
at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexStatsUpdateCallback.done(LuceneIndexStatsUpdateCallback.java:64) [org.apache.jackrabbit.oak-lucene:1.12.0.SNAPSHOT]
at org.apache.jackrabbit.oak.plugins.index.search.CompositePropertyUpdateCallback.done(CompositePropertyUpdateCallback.java:53) [org.apache.jackrabbit.oak-lucene:1.12.0.SNAPSHOT]
at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexEditor.leave(LuceneIndexEditor.java:157) [org.apache.jackrabbit.oak-lucene:1.12.0.SNAPSHOT]
at org.apache.jackrabbit.oak.plugins.index.IndexUpdate.leave(IndexUpdate.java:397) [org.apache.jackrabbit.oak-core:1.10.0.SNAPSHOT]
at org.apache.jackrabbit.oak.spi.commit.VisibleEditor.leave(VisibleEditor.java:59) [org.apache.jackrabbit.oak-store-spi:1.9.10.R1845889]
at org.apache.jackrabbit.oak.spi.commit.EditorDiff.process(EditorDiff.java:55) [org.apache.jackrabbit.oak-store-spi:1.9.10.R1845889]
at org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate.updateIndex(AsyncIndexUpdate.java:728) [org.apache.jackrabbit.oak-core:1.10.0.SNAPSHOT]
at org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate.runWhenPermitted(AsyncIndexUpdate.java:573) [org.apache.jackrabbit.oak-core:1.10.0.SNAPSHOT]
at org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate.run(AsyncIndexUpdate.java:432) [org.apache.jackrabbit.oak-core:1.10.0.SNAPSHOT]
at org.apache.sling.commons.scheduler.impl.QuartzJobExecutor.execute(QuartzJobExecutor.java:347) [org.apache.sling.commons.scheduler:2.7.2]
at org.quartz.core.JobRunShell.run(JobRunShell.java:202) [org.apache.sling.commons.scheduler:2.7.2]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{noformat}

It looks like this is relatively new code, due to OAK-7893. Calculating index 
statistics right now causes indexes to be downloaded. This happens for every 
Lucene index, at every index update (whether or not a specific index was 
changed).




[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup

2019-01-23 Thread Thomas Mueller (JIRA)


[ 
https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750816#comment-16750816
 ] 

Thomas Mueller commented on OAK-7947:
-

New patch OAK-7947_v5.patch passes the tests... But unfortunately it doesn't 
seem to lazily load the indexes I would expect. This needs further analysis.



[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup

2019-01-14 Thread Thomas Mueller (JIRA)


[ 
https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742827#comment-16742827
 ] 

Thomas Mueller commented on OAK-7947:
-

Reverted by [~teofili] on Friday, 2019-01-11, in
http://svn.apache.org/r1851022 (trunk)
http://svn.apache.org/r1851052 (1.10 branch)



[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup

2019-01-11 Thread Julian Reschke (JIRA)


[ 
https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16740296#comment-16740296
 ] 

Julian Reschke commented on OAK-7947:
-

trunk: [r1851022|http://svn.apache.org/r1851022] 
[r1850826|http://svn.apache.org/r1850826] 
[r1850231|http://svn.apache.org/r1850231] 
[r1850229|http://svn.apache.org/r1850229] 
[r1850163|http://svn.apache.org/r1850163] 
[r1849465|http://svn.apache.org/r1849465]
1.10: [r1850826|http://svn.apache.org/r1850826] 
[r1850231|http://svn.apache.org/r1850231] 
[r1850229|http://svn.apache.org/r1850229] 
[r1850163|http://svn.apache.org/r1850163] 
[r1849465|http://svn.apache.org/r1849465]




[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup

2019-01-09 Thread Thomas Mueller (JIRA)


[ 
https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737972#comment-16737972
 ] 

Thomas Mueller commented on OAK-7947:
-

http://svn.apache.org/r1850826 (bugfix)



[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup

2019-01-03 Thread Thomas Mueller (JIRA)


[ 
https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16733008#comment-16733008
 ] 

Thomas Mueller commented on OAK-7947:
-

http://svn.apache.org/r1850231 (trunk; related changes)



[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup

2019-01-03 Thread Thomas Mueller (JIRA)


[ 
https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732988#comment-16732988
 ] 

Thomas Mueller commented on OAK-7947:
-

http://svn.apache.org/r1850229 (trunk)



[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup

2019-01-03 Thread Thomas Mueller (JIRA)


[ 
https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732950#comment-16732950
 ] 

Thomas Mueller commented on OAK-7947:
-

[~catholicon] could you review OAK-7947_v4.patch please? I added a feature flag 
to disable lazy loading. If you don't have time to review right now, not a 
problem (I think it's fine to commit it before the review).




[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup

2019-01-03 Thread Thomas Mueller (JIRA)


[ 
https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732926#comment-16732926
 ] 

Thomas Mueller commented on OAK-7947:
-

It looks like tracker.acquireIndexNode not only acquires the index node, but 
also puts it into the index tracker "indices" map.

To make the LucenePropertyIndexTest.reindexWithCOWWithoutIndexPath test pass, 
there are two options:
* either change the query so the nodetype index can't be used, for example to 
"select * from [mix:title] where [jcr:title] = 'x'", or
* don't use LazyLuceneIndexNode (or call getIndexNode() in its constructor)

To make the SynchronousPropertyIndexTest tests pass:

* either (in the tests) call runAsyncIndex() after creating the index 
definitions, or
* don't use LazyLuceneIndexNode (or call getIndexNode() in its constructor)

So the patch does change behavior: the index will only be available after the 
indexing cycle has run. I think that's acceptable, so changing the tests is 
fine.
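
The difference between the two options (a lazy node versus calling 
getIndexNode() in the constructor) can be shown with a small sketch. LazyNode 
below is a hypothetical, simplified holder and not the LazyLuceneIndexNode 
from the patch; the opener supplier stands in for whatever actually opens 
(and possibly downloads) the index.

```java
import java.util.function.Supplier;

/** Hypothetical lazy holder; the real LazyLuceneIndexNode in the patch may differ. */
final class LazyNode<T> {
    private final Supplier<T> opener; // stands in for the expensive "open the index" step
    private T node;
    private boolean loaded;

    /** @param eager if true, open immediately (like calling getIndexNode() in the constructor) */
    LazyNode(Supplier<T> opener, boolean eager) {
        this.opener = opener;
        if (eager) {
            get();
        }
    }

    /** Opens the node on first access and reuses it afterwards. */
    synchronized T get() {
        if (!loaded) {
            node = opener.get();
            loaded = true;
        }
        return node;
    }

    synchronized boolean isLoaded() {
        return loaded;
    }

    public static void main(String[] args) {
        LazyNode<String> lazy = new LazyNode<>(() -> "index node", false);
        System.out.println(lazy.isLoaded()); // prints false
        lazy.get();
        System.out.println(lazy.isLoaded()); // prints true
    }
}
```

With eager set to true the holder behaves like the non-lazy variant: the 
opener runs during construction, so the cost is paid up front rather than on 
first use.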




[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup

2019-01-03 Thread Thomas Mueller (JIRA)


[ 
https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732870#comment-16732870
 ] 

Thomas Mueller commented on OAK-7947:
-

Thanks [~reschke]! And I thought I ran the tests...



[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup

2019-01-02 Thread Julian Reschke (JIRA)


[ 
https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731936#comment-16731936
 ] 

Julian Reschke commented on OAK-7947:
-

trunk: [r1850163|http://svn.apache.org/r1850163] 
[r1849465|http://svn.apache.org/r1849465]



[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup

2019-01-02 Thread Julian Reschke (JIRA)


[ 
https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731912#comment-16731912
 ] 

Julian Reschke commented on OAK-7947:
-

Test failure:

 {noformat}
[ERROR] Failures:
[ERROR]   LucenePropertyIndexTest.reindexWithCOWWithoutIndexPath:2497
[ERROR]   SynchronousPropertyIndexTest.nodeTypeIndexing:445
Expected: a string containing "/oak:index/foo"
 but: was "[oak:TestSuperType] as [oak:TestSuperType] /* nodeType Filter(query=explain select * from [oak:TestSuperType], path=*) */"
[ERROR]   SynchronousPropertyIndexTest.nodeType_mixins:465
Expected: a string containing "/oak:index/foo"
 but: was "[oak:TestMixA] as [oak:TestMixA] /* nodeType Filter(query=explain select * from [oak:TestMixA], path=*) */"
[ERROR]   SynchronousPropertyIndexTest.nonRootIndex:369->AbstractQueryTest.assertQuery:288->AbstractQueryTest.assertQuery:310->AbstractQueryTest.assertQuery:316->AbstractQueryTest.assertResult:323 Expected path /content/a not found, got []
[ERROR]   SynchronousPropertyIndexTest.nonUniqueIndex:271->AbstractQueryTest.assertQuery:288->AbstractQueryTest.assertQuery:310->AbstractQueryTest.assertQuery:316->AbstractQueryTest.assertResult:323 Expected path /a not found, got []
[ERROR]   SynchronousPropertyIndexTest.queryPlan:330
Expected: a string containing "sync:(foo[jcr:content/foo] bar)"
 but: was "[nt:base] as [nt:base] /* no-index
  where [nt:base].[jcr:content/foo] = 'bar' */"
[ERROR]   SynchronousPropertyIndexTest.relativePropertyTransform:349->AbstractQueryTest.assertQuery:288->AbstractQueryTest.assertQuery:310->AbstractQueryTest.assertQuery:316->AbstractQueryTest.assertResult:323 Expected path /a not found, got []
[INFO]
[ERROR] Tests run: 858, Failures: 7, Errors: 0, Skipped: 19

 {noformat}

Reverting change for now.



[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup

2018-12-21 Thread Thomas Mueller (JIRA)


[ 
https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726702#comment-16726702
 ] 

Thomas Mueller commented on OAK-7947:
-

The minimal set of changes (just IndexTracker and LucenePropertyIndex) is 
committed:
http://svn.apache.org/r1849465
so those should make it into Oak 1.9.14.
Other changes to follow next year.



[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup

2018-12-19 Thread Vikas Saurabh (JIRA)


[ 
https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725500#comment-16725500
 ] 

Vikas Saurabh commented on OAK-7947:


[~tmueller], v3 looks fine to me. Adding a few comments though:
* I think we should also get points 3 and 4 of v2 in (logging the directory 
name in OakStreamingIndexFile, and avoiding getNumDocs when an entry count is 
set)
* {quote}// already released{quote} Should we log a warning here? I don't think 
multiple release calls are expected.
* {quote}// ...I don't think this is ever called concurrently{quote} I agree 
that methods on this would not be called concurrently. So, can we simplify the 
locking and just add {{synchronized}} to the method itself?
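
The last two points can be sketched as follows. This is only an illustration, not code from the patch: the class name {{ReleasableHandle}} is hypothetical, and warnings are collected in a list instead of going to a logger so that the snippet stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical handle that may only be released once (not an Oak class). */
final class ReleasableHandle {
    private boolean released;
    // Stand-in for a logger: collects the warnings that would be logged.
    private final List<String> warnings = new ArrayList<>();

    /**
     * Releases the handle. The whole method is synchronized, so no separate
     * lock object is needed. Returns false if it was already released.
     */
    synchronized boolean release() {
        if (released) {
            // already released: unexpected, so record a warning
            warnings.add("release() called more than once");
            return false;
        }
        released = true;
        return true;
    }

    /** Returns a copy of the warnings recorded so far. */
    synchronized List<String> warnings() {
        return new ArrayList<>(warnings);
    }
}
```

A second call to {{release()}} then leaves the state untouched and only records the warning.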



[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup

2018-12-19 Thread Thomas Mueller (JIRA)


[ 
https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725114#comment-16725114
 ] 

Thomas Mueller commented on OAK-7947:
-

[OAK-7947_v3.patch|https://issues.apache.org/jira/secure/attachment/12952368/OAK-7947_v3.patch]
 contains only the really required changes. [~catholicon] could you please 
review it? I will then commit it.



[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup

2018-12-19 Thread Thomas Mueller (JIRA)


[ 
https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725071#comment-16725071
 ] 

Thomas Mueller commented on OAK-7947:
-

I will now fix the TODOs in the patch, mainly add synchronization, and will 
then verify that code coverage is fine. Not sure if it's easy to add a unit 
test; an integration test is probably simpler.



[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup

2018-12-19 Thread Thomas Mueller (JIRA)


[ 
https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725067#comment-16725067
 ] 

Thomas Mueller commented on OAK-7947:
-

Thanks [~catholicon]! I attached a new patch, 
[OAK-7947_v2.patch|https://issues.apache.org/jira/secure/attachment/12952362/OAK-7947_v2.patch]
 that contains the changes that are needed and make sense:
* IndexTracker.java: now checks that there is a child node ":index-definition". 
So for a new index, it should now not return the definition.
* LucenePropertyIndex: returns LazyLuceneIndexNode instead of LuceneIndexNode. 
This is needed, otherwise acquireIndexNode() is called even if getIndexNode 
isn't called. (And acquireIndexNode downloads the index binaries.)
* OakStreamingIndexFile: A simple change to log the directory name as well. 
(Not strictly needed, but very useful).
* FulltextIndexPlanner: Only call getNumDocs() if the index definition doesn't 
contain a property "entryCount". (Not strictly needed, but should reduce reads).
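
As an illustration of the FulltextIndexPlanner point, here is a minimal sketch of "only ask the index when no entry count is configured". The class and method names are hypothetical and this is not the planner's actual code; the real estimate may combine the values differently.

```java
import java.util.OptionalLong;
import java.util.function.LongSupplier;

/** Hypothetical helper; not part of Oak. */
final class EntryCountEstimator {
    private EntryCountEstimator() {
    }

    /**
     * Returns the configured "entryCount" if present. Only otherwise is the
     * supplier invoked, which stands in for getNumDocs() and is the call that
     * would force the index files to be opened.
     */
    static long estimate(OptionalLong configuredEntryCount,
                         LongSupplier numDocsFromIndex) {
        if (configuredEntryCount.isPresent()) {
            return configuredEntryCount.getAsLong();
        }
        return numDocsFromIndex.getAsLong();
    }
}
```

With an entry count set, the supplier is never called, so planning does not touch the index binaries.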



[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup

2018-12-19 Thread Vikas Saurabh (JIRA)


[ 
https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16724965#comment-16724965
 ] 

Vikas Saurabh commented on OAK-7947:


Attached a zip -  [^lucene-index-open-access.zip] which contains:
* logging-directory.patch - a patch that adds a logging {{Directory}} 
implementation
* open-close-dir-calls.txt - all calls that the patch logged for an 11G 
damAssetLucene index (same one I listed above)
* open-dir-calls.txt - all calls to simply open the index
* close-dir-calls.txt - calls to close the index

A few things that were quite interesting:
* *All* index files were read, although most incurred only a few reads
* seeks were only incurred on {{.tim}}, {{.tip}} and {{.cfs}} files; the 
{{.cfs}} files tended to be in the 100MB range
* seeks in {{.tim}} and {{.cfs}} went backwards too, so they could require 
opening the input stream multiple times
* only a few reads occur even after a seek

(there could be other useful patterns to find as well)
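
To make the "backward seek" observation concrete, here is a small sketch of the kind of per-file bookkeeping such a logging wrapper could do. It is not the attached logging-directory.patch; the class {{AccessStats}} and its methods are hypothetical, and it assumes that on a forward-only stream each backward seek means the stream has to be reopened.

```java
/** Hypothetical per-file access counter; not taken from the attached patch. */
final class AccessStats {
    private long position;
    private int reads;
    private int seeks;
    private int backwardSeeks;

    /** Records a sequential read of the given length. */
    void read(int length) {
        reads++;
        position += length;
    }

    /** Records a seek; a target before the current position is "backward". */
    void seek(long target) {
        seeks++;
        if (target < position) {
            backwardSeeks++;
        }
        position = target;
    }

    int reads() {
        return reads;
    }

    int seeks() {
        return seeks;
    }

    /** Under the assumption above, the number of stream reopens needed. */
    int backwardSeeks() {
        return backwardSeeks;
    }
}
```

One such counter per file name would give numbers comparable to the lists in the zip.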



[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup

2018-12-11 Thread Thomas Mueller (JIRA)


[ 
https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716524#comment-16716524
 ] 

Thomas Mueller commented on OAK-7947:
-

> The changes in ... getIndexDefinition ... not from stored index definition

Yes, I know, this is a bug in the patch. I will fix that.

> the patch you had attached seems quite risky to me

Yes. I didn't plan to apply the patch, it's just the starting point. There are 
bugs, todos, and some parts are probably not needed.

Next, I will try to find out which parts are not needed.

> let index open happen as it happens today but copy required files right away 
> (synchronously) and schedule rest of the files for later.

I'm afraid I would need some help for this. I tried disabling copy-on-read, but 
then the files are opened from the datastore, which has some additional 
problems: files are opened multiple times. So I came to the conclusion it's 
best not to open the files until they are really needed to run queries, or 
needed to do detailed cost estimation (if the index might be used). So there 
are 3 stages (AFAIK):

* Stage 1: just the index definition is needed to see if the properties are 
indexed.
* Stage 2: numDocs are needed to do cost estimation.
* Stage 3: index is used for a query.

Obviously, for stage 3, the index files are needed. For stage 1, right now the 
index files are opened. I think it's sufficient to delay opening the files 
there, and just use the index definition. For stage 2, I think (not sure yet) 
that this is actually rare enough and it's OK to open all index files. If it 
turns out this is _not_ that rare, then we can store the numDocs in the index 
definition from time to time (in theory we could do that for every index 
update). Then store the time of the numDocs update. And when the numDocs are 
needed, then either they are read from the index definition (let's say if they 
are younger than 1 hour or so), or else open the index files.
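
A rough sketch of that last idea (use the stored numDocs while it is recent enough, otherwise open the index). Everything here is hypothetical: the class name, the in-memory fields standing in for properties on the index definition, and the configurable maximum age.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.LongSupplier;

/** Hypothetical holder for a numDocs value stored with its update time. */
final class NumDocsCache {
    private final Duration maxAge;
    private long storedNumDocs;
    private Instant storedAt; // null until a value has been stored

    NumDocsCache(Duration maxAge) {
        this.maxAge = maxAge;
    }

    /** Called on (some) index updates to record numDocs and the time. */
    synchronized void store(long numDocs, Instant now) {
        storedNumDocs = numDocs;
        storedAt = now;
    }

    /**
     * Returns the stored value if it is younger than maxAge. Otherwise calls
     * the supplier, which stands in for opening the index files.
     */
    synchronized long numDocs(Instant now, LongSupplier openIndexAndCount) {
        if (storedAt != null
                && Duration.between(storedAt, now).compareTo(maxAge) < 0) {
            return storedNumDocs;
        }
        return openIndexAndCount.getAsLong();
    }
}
```

With {{new NumDocsCache(Duration.ofHours(1))}}, a value stored ten minutes ago is returned directly, while a two-hour-old value falls back to the index.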





[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup

2018-12-10 Thread Vikas Saurabh (JIRA)


[ 
https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16714647#comment-16714647
 ] 

Vikas Saurabh commented on OAK-7947:


[~tmueller], the patch you had attached seems quite risky to me (as it touches 
quite a lot of places) and would only solve "avoid opening index as long as 
possible wrt index definitions". If the index definition can potentially answer 
the query and we want to open the index to, say, get num docs or num docs per 
field, then we would still copy in all index files. I have at least one comment 
on the patch, which I'll note at the end.

Maybe we could try a different approach: let index open happen as it happens 
today, but copy the required files right away (synchronously) and schedule the 
rest of the files for later. Here's a snip of a size-sorted list of files from 
an 11G {{damAssetLucene}} index that [~chibulcu] had provided me from an AEM 
instance:
{noformat}
$ ls -lsSh
total 11G
4.5G -rw-r--r-- 1 vsaurabh vsaurabh 4.5G Nov 23 12:20 _101z.fdt
4.5G -rw-r--r-- 1 vsaurabh vsaurabh 4.5G Nov 23 13:43 _1zt4.fdt
580M -rw-r--r-- 1 vsaurabh vsaurabh 580M Nov 23 12:20 _101z.pos
579M -rw-r--r-- 1 vsaurabh vsaurabh 579M Nov 23 13:43 _1zt4.pos
177M -rw-r--r-- 1 vsaurabh vsaurabh 177M Nov 23 13:44 _20z0.cfs
106M -rw-r--r-- 1 vsaurabh vsaurabh 106M Nov 23 12:20 _1x4o.cfs
 65M -rw-r--r-- 1 vsaurabh vsaurabh  65M Nov 23 13:44 _20bb.cfs
 29M -rw-r--r-- 1 vsaurabh vsaurabh  29M Nov 23 13:44 _217z.cfs
 16M -rw-r--r-- 1 vsaurabh vsaurabh  16M Nov 23 12:10 _101z.doc
 16M -rw-r--r-- 1 vsaurabh vsaurabh  16M Nov 23 12:20 _1zt4.doc
6.7M -rw-r--r-- 1 vsaurabh vsaurabh 6.7M Nov 23 13:44 _21ef.cfs
6.5M -rw-r--r-- 1 vsaurabh vsaurabh 6.5M Nov 23 13:44 _216f.cfs
6.3M -rw-r--r-- 1 vsaurabh vsaurabh 6.3M Nov 23 12:20 _101z.tim
5.9M -rw-r--r-- 1 vsaurabh vsaurabh 5.9M Nov 23 13:43 _1zt4.tim
5.9M -rw-r--r-- 1 vsaurabh vsaurabh 5.9M Nov 23 13:44 _21cy.cfs
4.4M -rw-r--r-- 1 vsaurabh vsaurabh 4.4M Nov 23 13:44 _21ab.cfs
3.8M -rw-r--r-- 1 vsaurabh vsaurabh 3.8M Nov 23 13:44 _21e4.cfs
3.7M -rw-r--r-- 1 vsaurabh vsaurabh 3.7M Nov 23 13:44 _21du.cfs
3.0M -rw-r--r-- 1 vsaurabh vsaurabh 3.0M Nov 23 13:44 _21dk.cfs
2.6M -rw-r--r-- 1 vsaurabh vsaurabh 2.6M Nov 23 13:44 _21f1.cfs
648K -rw-r--r-- 1 vsaurabh vsaurabh 647K Nov 23 12:10 _101z.dvd
424K -rw-r--r-- 1 vsaurabh vsaurabh 421K Nov 23 12:20 _1zt4.dvd
380K -rw-r--r-- 1 vsaurabh vsaurabh 378K Nov 23 12:20 _101z.fdx
372K -rw-r--r-- 1 vsaurabh vsaurabh 369K Nov 23 13:43 _1zt4.fdx
120K -rw-r--r-- 1 vsaurabh vsaurabh 120K Nov 23 13:44 _21f7.cfs
120K -rw-r--r-- 1 vsaurabh vsaurabh 120K Nov 23 13:44 _21f4.cfs


{noformat}

Looking at 
https://lucene.apache.org/core/4_7_1/core/org/apache/lucene/codecs/lucene46/package-summary.html,
 {{fdt}} files are stored field data and {{pos}} is positional data for indexed 
terms. Neither should need to be loaded just for cost evaluation afaict (we 
should probably try to confirm this btw). These 2 form the biggest chunk of the 
files, so just avoiding copying them over merely to open an index could save us 
a lot of time on the first index open. Additionally, I think this approach is 
much less risky.
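
The split described above could look roughly like this. It is a sketch only: {{CopyPlan}} is a hypothetical name, and the choice of {{fdt}} and {{pos}} as the deferred extensions is the unconfirmed assumption from the paragraph above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical split of index files into "copy now" and "copy later". */
final class CopyPlan {
    // Assumption: stored fields (.fdt) and positions (.pos) are not needed
    // just to open the index for cost evaluation.
    private static final Set<String> DEFERRED_EXTENSIONS =
            new HashSet<>(Arrays.asList("fdt", "pos"));

    private final List<String> eager = new ArrayList<>();
    private final List<String> deferred = new ArrayList<>();

    /** Sorts each file name into the eager or the deferred list. */
    static CopyPlan of(List<String> fileNames) {
        CopyPlan plan = new CopyPlan();
        for (String name : fileNames) {
            int dot = name.lastIndexOf('.');
            String ext = dot < 0 ? "" : name.substring(dot + 1);
            if (DEFERRED_EXTENSIONS.contains(ext)) {
                plan.deferred.add(name);
            } else {
                plan.eager.add(name);
            }
        }
        return plan;
    }

    /** Files to copy synchronously before opening the index. */
    List<String> eager() {
        return eager;
    }

    /** Files to schedule for copying later. */
    List<String> deferred() {
        return deferred;
    }
}
```

For the listing above, this would defer the two 4.5G {{.fdt}} files and the two {{.pos}} files, and copy everything else up front.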

_patch review_
The changes in
{noformat}
public LuceneIndexDefinition getIndexDefinition(String indexPath){
{noformat}
when the index isn't in the index map provide a definition that is visible in 
the tree, rather than the stored index definition. This would change the 
behavior of the planner, which would start to use index definitions that 
haven't been indexed yet.

Afaics, the other changes are essentially doing lazy init and won't affect 
behavior, but they do make it a little brittle to keep avoiding index open (an 
unrelated part of the code might start calling some part that would in turn 
happily open the index).



[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup

2018-12-07 Thread Chetan Mehrotra (JIRA)


[ 
https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16712595#comment-16712595
 ] 

Chetan Mehrotra commented on OAK-7947:
--

[~tmueller] One reason for doing eager loading was to avoid contention in 
queries hitting at the very start. To make it lazy, what we can do is store the 
data points required for index planning in the index data node itself in the 
repository. So stuff like numDocs, field count etc. can be recorded in the repo 
upon index close.

Then at least for the index planning phase we need not open the IndexWriter at 
all.



[jira] [Commented] (OAK-7947) Lazy loading of Lucene index files startup

2018-12-06 Thread Thomas Mueller (JIRA)


[ 
https://issues.apache.org/jira/browse/OAK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16711507#comment-16711507
 ] 

Thomas Mueller commented on OAK-7947:
-

The attached patch solves the issue. It contains various changes, possibly some of 
them are not needed, and some might be incorrect / problematic. This is 
work-in-progress. Still it would be nice to get some feedback from those who 
are more familiar with this code, for example [~catholicon] [~teofili] 
[~chetanm]. Changes I did:

* IndexTracker.getIndexDefinition constructs the node and returns it if the 
index isn't in the indices map yet. I don't know why it returned null before, 
it seems wrong to me.
* LuceneIndexNodeManager always opened the index, I don't know why. 
SearcherHolder now doesn't always do that. I basically make SearcherHolder open 
the index lazily.
* LucenePropertyIndex acquireIndexNode is called when planning, and that method 
opens the index files. I don't know why. I created a class LazyLuceneIndexNode 
that wraps LuceneIndexNode and creates it lazily.
* OakStreamingIndexFile now logs the directory name as well, not just the file 
name.
* DefaultIndexReader now opens the directory (DirectoryReader.open) lazily; 
only when calling getReader.
* FulltextIndexPlanner.estimatedEntryCount now only calls getNumDocs when 
really needed (that is, only if "entryCount" isn't set in the index 
definition). That should avoid having to open the index if we know the 
entryCount is high.
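
The lazy-wrapper idea behind LazyLuceneIndexNode can be sketched in a few lines. This is not the code from the patch: {{IndexNode}} and {{LazyIndexNode}} are simplified, hypothetical stand-ins, and the supplier represents whatever actually opens (and possibly downloads) the index files.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for an opened index; not the Oak interface. */
interface IndexNode {
    int numDocs();
}

/** Defers the expensive open until a method that needs the index is called. */
final class LazyIndexNode implements IndexNode {
    private final Supplier<IndexNode> opener;
    private IndexNode delegate; // guarded by "this"

    LazyIndexNode(Supplier<IndexNode> opener) {
        this.opener = opener;
    }

    /** Opens the underlying index on first use, then reuses it. */
    private synchronized IndexNode delegate() {
        if (delegate == null) {
            delegate = opener.get();
        }
        return delegate;
    }

    /** True once the underlying index has actually been opened. */
    synchronized boolean isOpened() {
        return delegate != null;
    }

    @Override
    public int numDocs() {
        return delegate().numDocs();
    }
}
```

Creating the wrapper during planning costs nothing; the opener only runs if something like {{numDocs()}} is actually called.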
