Hello,

It is a DocumentNodeStore-based instance. We don't extract text from binary files; we only index metadata stored on nodes.

Regards.

On Wed, Jun 7, 2017 at 7:04 AM, Chetan Mehrotra <[email protected]> wrote:

> > I'm not sure how to minimize the impact of performing a re-index (or new
> > index creation), that will take 48 hours (using oak 1.4). I mean, I don't
> > want to block updates to other indexes.
>
> Is this a SegmentNodeStore based setup or DocumentNodeStore based?
>
> The reindexing log would have some stats around time spent in indexing
> and time spent in text extraction. Can you check which part takes the
> most time? If it's text extraction, then you can reduce the time spent
> there by using Pre-Extraction support [1]. This allows extracting text
> beforehand and then using it at the time of actual indexing.
>
> Changing the "indexing lane" should help, but it is tricky to get right
> and is something we are currently improving in OAK-6246 and OAK-5553.
>
> > indexes won't be updated. On the other hand, it seems that using the
> > *reindex-async* flag (see OAK-1456
> > <https://issues.apache.org/jira/browse/OAK-1456>) could do the trick. I
>
> This mode is useful for property indexes, as in the end it removes the
> async flag and makes the index synchronous, which would cause issues
> for a Lucene-based index.
>
> Chetan Mehrotra
> [1] https://jackrabbit.apache.org/oak/docs/query/lucene.html#text-extraction
>
>
> On Tue, Jun 6, 2017 at 9:02 PM, Alvaro Cabrerizo <[email protected]>
> wrote:
> > Hello,
> >
> > I'm not sure how to minimize the impact of performing a re-index (or new
> > index creation), that will take 48 hours (using oak 1.4). I mean, I don't
> > want to block updates to other indexes.
> >
> > First, we have set the value of async to *fulltext-async* for the new
> > index. I guess that, at least, all the indexes managed by the *async* lane
> > <http://jackrabbit.apache.org/oak/docs/query/indexing.html#indexing-lane>
> > will not be affected (please confirm if I'm right). Then we try to
> > minimize the impact on the fulltext-async lane. According to OAK-5553
> > <https://issues.apache.org/jira/browse/OAK-5553> there isn't much we can
> > do while the indexing process is active for the new index, as the rest of
> > the indexes won't be updated. On the other hand, it seems that using the
> > *reindex-async* flag (see OAK-1456
> > <https://issues.apache.org/jira/browse/OAK-1456>) could do the trick. I
> > mean, will setting reindex-async=true on the new index allow other indexes
> > (in the same lane) to be updated while it is being populated? If that is
> > true, we could create the index with that flag and then remove it.
> >
> > Regards.
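
For reference, the index definition discussed in this thread might look roughly like the sketch below. This is only an illustration under assumptions: the node name metadataLucene is hypothetical, and type = "lucene" and compatVersion = 2 are assumed rather than stated in the thread. Only the async value (fulltext-async) comes from the messages above.

```
# Hypothetical index node; the name is made up for illustration.
/oak:index/metadataLucene
  - jcr:primaryType = "oak:QueryIndexDefinition"
  - type = "lucene"            # assumed: the thread refers to a Lucene-based index
  - compatVersion = 2          # assumed
  - async = "fulltext-async"   # the lane chosen in the original message
  - reindex = true             # triggers the (re)index on the next run of that lane
```

The reindex-async property is left out on purpose, since the reply above notes that this mode is meant for property indexes and ends by making the index synchronous.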
