[jira] [Commented] (OAK-4043) Oak run checkpoints needs to account for multiple index lanes

2016-12-06 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727860#comment-15727860
 ] 

Chetan Mehrotra commented on OAK-4043:
--

Opened OAK-5230 to track this

> Oak run checkpoints needs to account for multiple index lanes
> -
>
> Key: OAK-4043
> URL: https://issues.apache.org/jira/browse/OAK-4043
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: run
>Reporter: Alex Parvulescu
>Assignee: Alex Parvulescu
>Priority: Critical
> Fix For: 1.6, 1.4.8, 1.5.11
>
> Attachments: OAK-4043.patch
>
>
> Oak run {{checkpoints rm-unreferenced}} [0] currently is hardcoded on a 
> single checkpoint reference (the default one). Now is it possible to add 
> multiple lanes (OAK-2749), which we already did in AEM, but the checkpoint 
> tool is blissfully unaware of this and it might trigger a full reindex 
> following offline compaction.
> fyi [~edivad], [~chetanm]
> [0] https://github.com/apache/jackrabbit-oak/tree/trunk/oak-run#checkpoints



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4043) Oak run checkpoints needs to account for multiple index lanes

2016-09-19 Thread Alex Parvulescu (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15502913#comment-15502913
 ] 

Alex Parvulescu commented on OAK-4043:
--

bq. Alex Parvulescu In addition to this we should add a check in 
Oak#withAsyncIndexing(java.lang.String, long) to enforce this convention around 
checkpoint name i.e. name ending with async
I think that's a different concern and needs to be tracked in a dedicated issue

> Oak run checkpoints needs to account for multiple index lanes
> -
>
> Key: OAK-4043
> URL: https://issues.apache.org/jira/browse/OAK-4043
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: run
>Reporter: Alex Parvulescu
>Assignee: Alex Parvulescu
>Priority: Critical
>  Labels: candidate_oak_1_4
> Fix For: 1.6, 1.5.11
>
> Attachments: OAK-4043.patch
>
>
> Oak run {{checkpoints rm-unreferenced}} [0] currently is hardcoded on a 
> single checkpoint reference (the default one). Now is it possible to add 
> multiple lanes (OAK-2749), which we already did in AEM, but the checkpoint 
> tool is blissfully unaware of this and it might trigger a full reindex 
> following offline compaction.
> fyi [~edivad], [~chetanm]
> [0] https://github.com/apache/jackrabbit-oak/tree/trunk/oak-run#checkpoints



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4043) Oak run checkpoints needs to account for multiple index lanes

2016-09-19 Thread Alex Parvulescu (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15502908#comment-15502908
 ] 

Alex Parvulescu commented on OAK-4043:
--

bq. The method in the CheckpointsHelper could be simplified using 
Utils.min(Revision, Revision)
thanks for the suggestion! did not know about this method :)

> Oak run checkpoints needs to account for multiple index lanes
> -
>
> Key: OAK-4043
> URL: https://issues.apache.org/jira/browse/OAK-4043
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: run
>Reporter: Alex Parvulescu
>Assignee: Alex Parvulescu
>Priority: Critical
>  Labels: candidate_oak_1_4
> Fix For: 1.6, 1.5.11
>
> Attachments: OAK-4043.patch
>
>
> Oak run {{checkpoints rm-unreferenced}} [0] currently is hardcoded on a 
> single checkpoint reference (the default one). Now is it possible to add 
> multiple lanes (OAK-2749), which we already did in AEM, but the checkpoint 
> tool is blissfully unaware of this and it might trigger a full reindex 
> following offline compaction.
> fyi [~edivad], [~chetanm]
> [0] https://github.com/apache/jackrabbit-oak/tree/trunk/oak-run#checkpoints



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4043) Oak run checkpoints needs to account for multiple index lanes

2016-09-19 Thread Marcel Reutegger (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15502757#comment-15502757
 ] 

Marcel Reutegger commented on OAK-4043:
---

Patch looks good to me. The method in the CheckpointsHelper could be simplified 
using Utils.min(Revision, Revision), but the method was only introduced 
recently (1.4). That is, the current patch is easier to backport to 1.2 and 
earlier if necessary.

> Oak run checkpoints needs to account for multiple index lanes
> -
>
> Key: OAK-4043
> URL: https://issues.apache.org/jira/browse/OAK-4043
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: core, run
>Reporter: Alex Parvulescu
>Assignee: Alex Parvulescu
>Priority: Critical
>  Labels: candidate_oak_1_4
> Fix For: 1.6, 1.5.11
>
> Attachments: OAK-4043.patch
>
>
> Oak run {{checkpoints rm-unreferenced}} [0] currently is hardcoded on a 
> single checkpoint reference (the default one). Now is it possible to add 
> multiple lanes, which we already did in AEM, but the checkpoint tool is 
> blissfully unaware of this and it might trigger a full reindex following 
> offline compaction.
> fyi [~edivad], [~chetanm]
> [0] https://github.com/apache/jackrabbit-oak/tree/trunk/oak-run#checkpoints



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4043) Oak run checkpoints needs to account for multiple index lanes

2016-09-19 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15502478#comment-15502478
 ] 

Chetan Mehrotra commented on OAK-4043:
--

Patch look fine. Without this patch {{rm-unreferenced}}

{noformat}
$java -jar oak-run-1.6-SNAPSHOT.jar checkpoints --segment=true  
/path/to/segmentstore rm-unreferenced
Apache Jackrabbit Oak 1.6-SNAPSHOT
Checkpoints /path/to/segmentstore
Referenced checkpoint from /:async@async is bcda394b-9a01-45e2-a326-bab1618ac60e
Removed 1 checkpoints in 112ms.
{noformat}

With this fix
{noformat}
$ java -jar oak-run-1.6-SNAPSHOT.jar checkpoints --segment=true  
/path/to/segmentstore rm-unreferenced
Apache Jackrabbit Oak 1.6-SNAPSHOT
Checkpoints /path/to/segmentstore
Referenced checkpoint from /:async@async is bcda394b-9a01-45e2-a326-bab1618ac60e
Referenced checkpoint from /:async@fulltext-async is 
46dce134-fe32-4929-84bf-abbd0ed46d9d
Removed 0 checkpoints in 65ms.
{noformat}

[~mreutegg] Just to be double safe here can you also review the changes around 
checkpoint handling for DocumentNodeStore

[~alex.parvulescu] In addition to this we should add a check in 
{{Oak#withAsyncIndexing(java.lang.String, long)}} to enforce this convention 
around checkpoint name i.e. name ending with {{async}}

> Oak run checkpoints needs to account for multiple index lanes
> -
>
> Key: OAK-4043
> URL: https://issues.apache.org/jira/browse/OAK-4043
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: core, run
>Reporter: Alex Parvulescu
>Assignee: Alex Parvulescu
>Priority: Critical
>  Labels: candidate_oak_1_4
> Fix For: 1.6, 1.5.11
>
> Attachments: OAK-4043.patch
>
>
> Oak run {{checkpoints rm-unreferenced}} [0] currently is hardcoded on a 
> single checkpoint reference (the default one). Now is it possible to add 
> multiple lanes, which we already did in AEM, but the checkpoint tool is 
> blissfully unaware of this and it might trigger a full reindex following 
> offline compaction.
> fyi [~edivad], [~chetanm]
> [0] https://github.com/apache/jackrabbit-oak/tree/trunk/oak-run#checkpoints



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4043) Oak run checkpoints needs to account for multiple index lanes

2016-09-16 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15495519#comment-15495519
 ] 

Chetan Mehrotra commented on OAK-4043:
--

I would prefer to avoid any change in existing names. Probably we can follow 
the convention of such names having prefix of {{async}} (possibly add a check 
in Oak also to enforce that) and then figure out the list of checkpoints based 
on that. 

Or you can possibly consider any string property in {{:async}} to be a possible 
checkpoint to determine match!



> Oak run checkpoints needs to account for multiple index lanes
> -
>
> Key: OAK-4043
> URL: https://issues.apache.org/jira/browse/OAK-4043
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core, run
>Reporter: Alex Parvulescu
>Assignee: Davide Giannella
>Priority: Critical
>  Labels: candidate_oak_1_4
> Fix For: 1.6
>
>
> Oak run {{checkpoints rm-unreferenced}} [0] currently is hardcoded on a 
> single checkpoint reference (the default one). Now is it possible to add 
> multiple lanes, which we already did in AEM, but the checkpoint tool is 
> blissfully unaware of this and it might trigger a full reindex following 
> offline compaction.
> This needs fixing before the big 1.4 release, so I'm marking it as a blocker.
> fyi [~edivad], [~chetanm]
> [0] https://github.com/apache/jackrabbit-oak/tree/trunk/oak-run#checkpoints



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4043) Oak run checkpoints needs to account for multiple index lanes

2016-04-07 Thread Davide Giannella (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230039#comment-15230039
 ] 

Davide Giannella commented on OAK-4043:
---

[~alex.parvulescu]

I was eventually looking into this. I have some doubts though. tell me
what you think and if I'm wrong with anything.

The trick would be to return, for example, as {{Set}} from the
[Checkpoints.getReferenceCheckpoint()|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-run/src/main/java/org/apache/jackrabbit/oak/checkpoint/Checkpoints.java#L213]
and then use {{Set.contains()}} (or any other similar approach) when
looking up whether the current analysed checkpoint is referenced or
not
([Checkpoints.removeUnreferenced()|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-run/src/main/java/org/apache/jackrabbit/oak/checkpoint/Checkpoints.java#L129]).

Unfortunately the name of the property for the configured lanes is
free. So you could put in anything and we don't have traces, as far as
I saw, on what are the configured {{name}}s. Those {{name}}s should
than be used instead of the hardcoded {{async}} in the
[getReferencedCheckpoints()|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-run/src/main/java/org/apache/jackrabbit/oak/checkpoint/Checkpoints.java#L215].

A possible solution would be to namespace the property that holds the
checkpoint reference under {{/:async}}. So we would end up with
something like: {{/:async@namespace-lane1, /:async@namespace-lane2, ...}}.

In this way we would be able to loop through all the namespaced
properties.

I have serious concerns on previous deployments though. Wouldn't this
trigger a full reindex if we go in this direction? We could mitigate
this by adding a sort-of "upgrade logic" when retrieving the new
checkpoint reference. Something like the following in
AsyncIndexUpdate. Should try to generalise the code in the {{else}}
rather than copy-pasting; but it's for showing the idea.

{noformat}
diff --git 
a/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/index/AsyncIndexUpdate.java
 
b/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/index/AsyncIndexUpdate.java
index b6eac68..b40e571 100644
--- 
a/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/index/AsyncIndexUpdate.java
+++ 
b/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/index/AsyncIndexUpdate.java
@@ -102,6 +102,8 @@ public class AsyncIndexUpdate implements Runnable, 
Closeable {
  * taking over a running job
  */
 private static final long DEFAULT_ASYNC_TIMEOUT;
+
+private static final String CP_PROP_NAMESPACE = "cpreference-";
 
 static {
 int value = 15;
@@ -387,7 +389,7 @@ public class AsyncIndexUpdate implements Runnable, 
Closeable {
 
 // find the last indexed state, and check if there are recent changes
 NodeState before;
-String beforeCheckpoint = async.getString(name);
+String beforeCheckpoint = async.getString(CP_PROP_NAMESPACE + name);
 if (beforeCheckpoint != null) {
 NodeState state = store.retrieve(beforeCheckpoint);
 if (state == null) {
@@ -406,8 +408,28 @@ public class AsyncIndexUpdate implements Runnable, 
Closeable {
 before = state;
 }
 } else {
-log.info("[{}] Initial index update", name);
-before = MISSING_NODE;
+beforeCheckpoint = async.getString(name);
+if (beforeCheckpoint == null) {
+log.info("[{}] Initial index update", name);
+before = MISSING_NODE;
+} else {
+NodeState state = store.retrieve(beforeCheckpoint);
+if (state == null) {
+log.warn(
+"[{}] Failed to retrieve previously indexed 
checkpoint {}; re-running the initial index update",
+name, beforeCheckpoint);
+beforeCheckpoint = null;
+before = MISSING_NODE;
+} else if (noVisibleChanges(state, root)) {
+log.debug(
+"[{}] No changes since last checkpoint; skipping 
the index update",
+name);
+postAsyncRunStatsStatus(indexStats);
+return;
+} else {
+before = state;
+}
+}
 }
 
 // there are some recent changes, so let's create a new checkpoint
@@ -511,7 +533,7 @@ public class AsyncIndexUpdate implements Runnable, 
Closeable {
 throw exception;
 }
 
-builder.child(ASYNC).setProperty(name, afterCheckpoint);
+builder.child(ASYNC).setProperty(CP_PROP_NAMESPACE + name, 
afterCheckpoint);
 
builder.child(ASYNC).setProperty(PropertyStates.createProperty(lastIndexedTo, 
afterTime, 

[jira] [Commented] (OAK-4043) Oak run checkpoints needs to account for multiple index lanes

2016-02-24 Thread Davide Giannella (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15162718#comment-15162718
 ] 

Davide Giannella commented on OAK-4043:
---

1.3.17 will be the last unstable cut that will then be the same as 1.4.0 and 
branch. Added such revision.

> Oak run checkpoints needs to account for multiple index lanes
> -
>
> Key: OAK-4043
> URL: https://issues.apache.org/jira/browse/OAK-4043
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core, run
>Reporter: Alex Parvulescu
>Priority: Blocker
> Fix For: 1.4, 1.3.17
>
>
> Oak run {{checkpoints rm-unreferenced}} [0] currently is hardcoded on a 
> single checkpoint reference (the default one). Now is it possible to add 
> multiple lanes, which we already did in AEM, but the checkpoint tool is 
> blissfully unaware of this and it might trigger a full reindex following 
> offline compaction.
> This needs fixing before the big 1.4 release, so I'm marking it as a blocker.
> fyi [~edivad], [~chetanm]
> [0] https://github.com/apache/jackrabbit-oak/tree/trunk/oak-run#checkpoints



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)