[jira] [Commented] (OAK-4043) Oak run checkpoints needs to account for multiple index lanes
[ https://issues.apache.org/jira/browse/OAK-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15727860#comment-15727860 ] Chetan Mehrotra commented on OAK-4043: -- Opened OAK-5230 to track this > Oak run checkpoints needs to account for multiple index lanes > - > > Key: OAK-4043 > URL: https://issues.apache.org/jira/browse/OAK-4043 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: run >Reporter: Alex Parvulescu >Assignee: Alex Parvulescu >Priority: Critical > Fix For: 1.6, 1.4.8, 1.5.11 > > Attachments: OAK-4043.patch > > > Oak run {{checkpoints rm-unreferenced}} [0] currently is hardcoded on a > single checkpoint reference (the default one). Now is it possible to add > multiple lanes (OAK-2749), which we already did in AEM, but the checkpoint > tool is blissfully unaware of this and it might trigger a full reindex > following offline compaction. > fyi [~edivad], [~chetanm] > [0] https://github.com/apache/jackrabbit-oak/tree/trunk/oak-run#checkpoints -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4043) Oak run checkpoints needs to account for multiple index lanes
[ https://issues.apache.org/jira/browse/OAK-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15502913#comment-15502913 ] Alex Parvulescu commented on OAK-4043: -- bq. Alex Parvulescu In addition to this we should add a check in Oak#withAsyncIndexing(java.lang.String, long) to enforce this convention around checkpoint name i.e. name ending with async I think that's a different concern and needs to be tracked in a dedicated issue > Oak run checkpoints needs to account for multiple index lanes > - > > Key: OAK-4043 > URL: https://issues.apache.org/jira/browse/OAK-4043 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: run >Reporter: Alex Parvulescu >Assignee: Alex Parvulescu >Priority: Critical > Labels: candidate_oak_1_4 > Fix For: 1.6, 1.5.11 > > Attachments: OAK-4043.patch > > > Oak run {{checkpoints rm-unreferenced}} [0] currently is hardcoded on a > single checkpoint reference (the default one). Now is it possible to add > multiple lanes (OAK-2749), which we already did in AEM, but the checkpoint > tool is blissfully unaware of this and it might trigger a full reindex > following offline compaction. > fyi [~edivad], [~chetanm] > [0] https://github.com/apache/jackrabbit-oak/tree/trunk/oak-run#checkpoints -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4043) Oak run checkpoints needs to account for multiple index lanes
[ https://issues.apache.org/jira/browse/OAK-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15502908#comment-15502908 ] Alex Parvulescu commented on OAK-4043: -- bq. The method in the CheckpointsHelper could be simplified using Utils.min(Revision, Revision) thanks for the suggestion! did not know about this method :) > Oak run checkpoints needs to account for multiple index lanes > - > > Key: OAK-4043 > URL: https://issues.apache.org/jira/browse/OAK-4043 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: run >Reporter: Alex Parvulescu >Assignee: Alex Parvulescu >Priority: Critical > Labels: candidate_oak_1_4 > Fix For: 1.6, 1.5.11 > > Attachments: OAK-4043.patch > > > Oak run {{checkpoints rm-unreferenced}} [0] currently is hardcoded on a > single checkpoint reference (the default one). Now is it possible to add > multiple lanes (OAK-2749), which we already did in AEM, but the checkpoint > tool is blissfully unaware of this and it might trigger a full reindex > following offline compaction. > fyi [~edivad], [~chetanm] > [0] https://github.com/apache/jackrabbit-oak/tree/trunk/oak-run#checkpoints -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4043) Oak run checkpoints needs to account for multiple index lanes
[ https://issues.apache.org/jira/browse/OAK-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15502757#comment-15502757 ] Marcel Reutegger commented on OAK-4043: --- Patch looks good to me. The method in the CheckpointsHelper could be simplified using Utils.min(Revision, Revision), but the method was only introduced recently (1.4). That is, the current patch is easier to backport to 1.2 and earlier if necessary. > Oak run checkpoints needs to account for multiple index lanes > - > > Key: OAK-4043 > URL: https://issues.apache.org/jira/browse/OAK-4043 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: core, run >Reporter: Alex Parvulescu >Assignee: Alex Parvulescu >Priority: Critical > Labels: candidate_oak_1_4 > Fix For: 1.6, 1.5.11 > > Attachments: OAK-4043.patch > > > Oak run {{checkpoints rm-unreferenced}} [0] currently is hardcoded on a > single checkpoint reference (the default one). Now is it possible to add > multiple lanes, which we already did in AEM, but the checkpoint tool is > blissfully unaware of this and it might trigger a full reindex following > offline compaction. > fyi [~edivad], [~chetanm] > [0] https://github.com/apache/jackrabbit-oak/tree/trunk/oak-run#checkpoints -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4043) Oak run checkpoints needs to account for multiple index lanes
[ https://issues.apache.org/jira/browse/OAK-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15502478#comment-15502478 ] Chetan Mehrotra commented on OAK-4043: -- Patch look fine. Without this patch {{rm-unreferenced}} {noformat} $java -jar oak-run-1.6-SNAPSHOT.jar checkpoints --segment=true /path/to/segmentstore rm-unreferenced Apache Jackrabbit Oak 1.6-SNAPSHOT Checkpoints /path/to/segmentstore Referenced checkpoint from /:async@async is bcda394b-9a01-45e2-a326-bab1618ac60e Removed 1 checkpoints in 112ms. {noformat} With this fix {noformat} $ java -jar oak-run-1.6-SNAPSHOT.jar checkpoints --segment=true /path/to/segmentstore rm-unreferenced Apache Jackrabbit Oak 1.6-SNAPSHOT Checkpoints /path/to/segmentstore Referenced checkpoint from /:async@async is bcda394b-9a01-45e2-a326-bab1618ac60e Referenced checkpoint from /:async@fulltext-async is 46dce134-fe32-4929-84bf-abbd0ed46d9d Removed 0 checkpoints in 65ms. {noformat} [~mreutegg] Just to be double safe here can you also review the changes around checkpoint handling for DocumentNodeStore [~alex.parvulescu] In addition to this we should add a check in {{Oak#withAsyncIndexing(java.lang.String, long)}} to enforce this convention around checkpoint name i.e. name ending with {{async}} > Oak run checkpoints needs to account for multiple index lanes > - > > Key: OAK-4043 > URL: https://issues.apache.org/jira/browse/OAK-4043 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: core, run >Reporter: Alex Parvulescu >Assignee: Alex Parvulescu >Priority: Critical > Labels: candidate_oak_1_4 > Fix For: 1.6, 1.5.11 > > Attachments: OAK-4043.patch > > > Oak run {{checkpoints rm-unreferenced}} [0] currently is hardcoded on a > single checkpoint reference (the default one). Now is it possible to add > multiple lanes, which we already did in AEM, but the checkpoint tool is > blissfully unaware of this and it might trigger a full reindex following > offline compaction. > fyi [~edivad], [~chetanm] > [0] https://github.com/apache/jackrabbit-oak/tree/trunk/oak-run#checkpoints -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4043) Oak run checkpoints needs to account for multiple index lanes
[ https://issues.apache.org/jira/browse/OAK-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15495519#comment-15495519 ] Chetan Mehrotra commented on OAK-4043: -- I would prefer to avoid any change in existing names. Probably we can follow the convention of such names having prefix of {{async}} (possibly add a check in Oak also to enforce that) and then figure out the list of checkpoints based on that. Or you can possibly consider any string property in {{:async}} to be a possible checkpoint to determine match! > Oak run checkpoints needs to account for multiple index lanes > - > > Key: OAK-4043 > URL: https://issues.apache.org/jira/browse/OAK-4043 > Project: Jackrabbit Oak > Issue Type: Bug > Components: core, run >Reporter: Alex Parvulescu >Assignee: Davide Giannella >Priority: Critical > Labels: candidate_oak_1_4 > Fix For: 1.6 > > > Oak run {{checkpoints rm-unreferenced}} [0] currently is hardcoded on a > single checkpoint reference (the default one). Now is it possible to add > multiple lanes, which we already did in AEM, but the checkpoint tool is > blissfully unaware of this and it might trigger a full reindex following > offline compaction. > This needs fixing before the big 1.4 release, so I'm marking it as a blocker. > fyi [~edivad], [~chetanm] > [0] https://github.com/apache/jackrabbit-oak/tree/trunk/oak-run#checkpoints -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4043) Oak run checkpoints needs to account for multiple index lanes
[ https://issues.apache.org/jira/browse/OAK-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15230039#comment-15230039 ] Davide Giannella commented on OAK-4043: --- [~alex.parvulescu] I was eventually looking into this. I have some doubts though. tell me what you think and if I'm wrong with anything. The trick would be to return, for example, as {{Set}} from the [Checkpoints.getReferenceCheckpoint()|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-run/src/main/java/org/apache/jackrabbit/oak/checkpoint/Checkpoints.java#L213] and then use {{Set.contains()}} (or any other similar approach) when looking up whether the current analysed checkpoint is referenced or not ([Checkpoints.removeUnreferenced()|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-run/src/main/java/org/apache/jackrabbit/oak/checkpoint/Checkpoints.java#L129]). Unfortunately the name of the property for the configured lanes is free. So you could put in anything and we don't have traces, as far as I saw, on what are the configured {{name}}s. Those {{name}}s should than be used instead of the hardcoded {{async}} in the [getReferencedCheckpoints()|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-run/src/main/java/org/apache/jackrabbit/oak/checkpoint/Checkpoints.java#L215]. A possible solution would be to namespace the property that holds the checkpoint reference under {{/:async}}. So we would end up with something like: {{/:async@namespace-lane1, /:async@namespace-lane2, ...}}. In this way we would be able to loop through all the namespaced properties. I have serious concerns on previous deployments though. Wouldn't this trigger a full reindex if we go in this direction? We could mitigate this by adding a sort-of "upgrade logic" when retrieving the new checkpoint reference. Something like the following in AsyncIndexUpdate. Should try to generalise the code in the {{else}} rather than copy-pasting; but it's for showing the idea. {noformat} diff --git a/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/index/AsyncIndexUpdate.java b/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/index/AsyncIndexUpdate.java index b6eac68..b40e571 100644 --- a/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/index/AsyncIndexUpdate.java +++ b/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/index/AsyncIndexUpdate.java @@ -102,6 +102,8 @@ public class AsyncIndexUpdate implements Runnable, Closeable { * taking over a running job */ private static final long DEFAULT_ASYNC_TIMEOUT; + +private static final String CP_PROP_NAMESPACE = "cpreference-"; static { int value = 15; @@ -387,7 +389,7 @@ public class AsyncIndexUpdate implements Runnable, Closeable { // find the last indexed state, and check if there are recent changes NodeState before; -String beforeCheckpoint = async.getString(name); +String beforeCheckpoint = async.getString(CP_PROP_NAMESPACE + name); if (beforeCheckpoint != null) { NodeState state = store.retrieve(beforeCheckpoint); if (state == null) { @@ -406,8 +408,28 @@ public class AsyncIndexUpdate implements Runnable, Closeable { before = state; } } else { -log.info("[{}] Initial index update", name); -before = MISSING_NODE; +beforeCheckpoint = async.getString(name); +if (beforeCheckpoint == null) { +log.info("[{}] Initial index update", name); +before = MISSING_NODE; +} else { +NodeState state = store.retrieve(beforeCheckpoint); +if (state == null) { +log.warn( +"[{}] Failed to retrieve previously indexed checkpoint {}; re-running the initial index update", +name, beforeCheckpoint); +beforeCheckpoint = null; +before = MISSING_NODE; +} else if (noVisibleChanges(state, root)) { +log.debug( +"[{}] No changes since last checkpoint; skipping the index update", +name); +postAsyncRunStatsStatus(indexStats); +return; +} else { +before = state; +} +} } // there are some recent changes, so let's create a new checkpoint @@ -511,7 +533,7 @@ public class AsyncIndexUpdate implements Runnable, Closeable { throw exception; } -builder.child(ASYNC).setProperty(name, afterCheckpoint); +builder.child(ASYNC).setProperty(CP_PROP_NAMESPACE + name, afterCheckpoint); builder.child(ASYNC).setProperty(PropertyStates.createProperty(lastIndexedTo, a
[jira] [Commented] (OAK-4043) Oak run checkpoints needs to account for multiple index lanes
[ https://issues.apache.org/jira/browse/OAK-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15162718#comment-15162718 ] Davide Giannella commented on OAK-4043: --- 1.3.17 will be the last unstable cut that will then be the same as 1.4.0 and branch. Added such revision. > Oak run checkpoints needs to account for multiple index lanes > - > > Key: OAK-4043 > URL: https://issues.apache.org/jira/browse/OAK-4043 > Project: Jackrabbit Oak > Issue Type: Bug > Components: core, run >Reporter: Alex Parvulescu >Priority: Blocker > Fix For: 1.4, 1.3.17 > > > Oak run {{checkpoints rm-unreferenced}} [0] currently is hardcoded on a > single checkpoint reference (the default one). Now is it possible to add > multiple lanes, which we already did in AEM, but the checkpoint tool is > blissfully unaware of this and it might trigger a full reindex following > offline compaction. > This needs fixing before the big 1.4 release, so I'm marking it as a blocker. > fyi [~edivad], [~chetanm] > [0] https://github.com/apache/jackrabbit-oak/tree/trunk/oak-run#checkpoints -- This message was sent by Atlassian JIRA (v6.3.4#6332)