[jira] [Commented] (OAK-8063) The cold standby client doesn't correctly handle backward references
[ https://issues.apache.org/jira/browse/OAK-8063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785596#comment-16785596 ] Andrei Dulceanu commented on OAK-8063: -- Backported to 1.10 at r1854920. > The cold standby client doesn't correctly handle backward references > > > Key: OAK-8063 > URL: https://issues.apache.org/jira/browse/OAK-8063 > Project: Jackrabbit Oak > Issue Type: Bug > Components: segment-tar, tarmk-standby >Affects Versions: 1.6.0 >Reporter: Andrei Dulceanu >Assignee: Andrei Dulceanu >Priority: Major > Labels: cold-standby > Fix For: 1.12, 1.11.0, 1.8.12, 1.10.2 > > Attachments: OAK-8063-02.patch, OAK-8063-03.patch, OAK-8063.patch > > > The logic from {{StandbyClientSyncExecution#copySegmentHierarchyFromPrimary}} > has a flaw when it comes to "backward references". Suppose we have the > following data segment graph to be transferred from primary: S1, which > references \{S2, S3} and S3 which references S2. Then, the correct transfer > order should be S2, S3 and S1. > Going through the current logic employed by the method, here's what happens: > {noformat} > Step 0: batch={S1} > Step 1: visited={S1}, data={S1}, batch={S2, S3}, queued={S2, S3} > Step 2: visited={S1, S2}, data={S2, S1}, batch={S3}, queued={S2, S3} > Step 3: visited={S1, S2, S3}, data={S3, S2, S1}, batch={}, queued={S2, > S3}.{noformat} > Therefore, at the end of the loop, the order of the segments to be > transferred will be S3, S2, S1, which might trigger a > {{SegmentNotFoundException}} when S3 is further processed, because S2 is > missing on standby (see OAK-8006). > /cc [~frm] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (OAK-8063) The cold standby client doesn't correctly handle backward references
[ https://issues.apache.org/jira/browse/OAK-8063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784424#comment-16784424 ] Andrei Dulceanu commented on OAK-8063: -- Backported to 1.8 at r1854850. > The cold standby client doesn't correctly handle backward references > > > Key: OAK-8063 > URL: https://issues.apache.org/jira/browse/OAK-8063 > Project: Jackrabbit Oak > Issue Type: Bug > Components: segment-tar, tarmk-standby >Affects Versions: 1.6.0 >Reporter: Andrei Dulceanu >Assignee: Andrei Dulceanu >Priority: Major > Labels: cold-standby > Fix For: 1.12, 1.11.0, 1.8.12, 1.10.2 > > Attachments: OAK-8063-02.patch, OAK-8063-03.patch, OAK-8063.patch > > > The logic from {{StandbyClientSyncExecution#copySegmentHierarchyFromPrimary}} > has a flaw when it comes to "backward references". Suppose we have the > following data segment graph to be transferred from primary: S1, which > references \{S2, S3} and S3 which references S2. Then, the correct transfer > order should be S2, S3 and S1. > Going through the current logic employed by the method, here's what happens: > {noformat} > Step 0: batch={S1} > Step 1: visited={S1}, data={S1}, batch={S2, S3}, queued={S2, S3} > Step 2: visited={S1, S2}, data={S2, S1}, batch={S3}, queued={S2, S3} > Step 3: visited={S1, S2, S3}, data={S3, S2, S1}, batch={}, queued={S2, > S3}.{noformat} > Therefore, at the end of the loop, the order of the segments to be > transferred will be S3, S2, S1, which might trigger a > {{SegmentNotFoundException}} when S3 is further processed, because S2 is > missing on standby (see OAK-8006). > /cc [~frm] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (OAK-8063) The cold standby client doesn't correctly handle backward references
[ https://issues.apache.org/jira/browse/OAK-8063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783423#comment-16783423 ] Andrei Dulceanu commented on OAK-8063: -- [~frm], thanks for reviewing! I fixed this in trunk using [^OAK-8063-03.patch] at r1854772. > The cold standby client doesn't correctly handle backward references > > > Key: OAK-8063 > URL: https://issues.apache.org/jira/browse/OAK-8063 > Project: Jackrabbit Oak > Issue Type: Bug > Components: segment-tar, tarmk-standby >Affects Versions: 1.6.0 >Reporter: Andrei Dulceanu >Assignee: Andrei Dulceanu >Priority: Major > Labels: cold-standby > Fix For: 1.12, 1.11.0, 1.8.12, 1.10.2 > > Attachments: OAK-8063-02.patch, OAK-8063-03.patch, OAK-8063.patch > > > The logic from {{StandbyClientSyncExecution#copySegmentHierarchyFromPrimary}} > has a flaw when it comes to "backward references". Suppose we have the > following data segment graph to be transferred from primary: S1, which > references \{S2, S3} and S3 which references S2. Then, the correct transfer > order should be S2, S3 and S1. > Going through the current logic employed by the method, here's what happens: > {noformat} > Step 0: batch={S1} > Step 1: visited={S1}, data={S1}, batch={S2, S3}, queued={S2, S3} > Step 2: visited={S1, S2}, data={S2, S1}, batch={S3}, queued={S2, S3} > Step 3: visited={S1, S2, S3}, data={S3, S2, S1}, batch={}, queued={S2, > S3}.{noformat} > Therefore, at the end of the loop, the order of the segments to be > transferred will be S3, S2, S1, which might trigger a > {{SegmentNotFoundException}} when S3 is further processed, because S2 is > missing on standby (see OAK-8006). > /cc [~frm] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (OAK-8063) The cold standby client doesn't correctly handle backward references
[ https://issues.apache.org/jira/browse/OAK-8063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783356#comment-16783356 ] Francesco Mari commented on OAK-8063: - [~dulceanu], I like your patch very much since it greatly simplifies the transfer logic. I propose a third version of the patch with a minor change just to avoid repeating the same check in two different places and to consistently log the root segment ID where the traversal starts. > The cold standby client doesn't correctly handle backward references > > > Key: OAK-8063 > URL: https://issues.apache.org/jira/browse/OAK-8063 > Project: Jackrabbit Oak > Issue Type: Bug > Components: segment-tar, tarmk-standby >Affects Versions: 1.6.0 >Reporter: Andrei Dulceanu >Assignee: Andrei Dulceanu >Priority: Major > Labels: cold-standby > Fix For: 1.12, 1.11.0, 1.8.12, 1.10.2 > > Attachments: OAK-8063-02.patch, OAK-8063-03.patch, OAK-8063.patch > > > The logic from {{StandbyClientSyncExecution#copySegmentHierarchyFromPrimary}} > has a flaw when it comes to "backward references". Suppose we have the > following data segment graph to be transferred from primary: S1, which > references \{S2, S3} and S3 which references S2. Then, the correct transfer > order should be S2, S3 and S1. > Going through the current logic employed by the method, here's what happens: > {noformat} > Step 0: batch={S1} > Step 1: visited={S1}, data={S1}, batch={S2, S3}, queued={S2, S3} > Step 2: visited={S1, S2}, data={S2, S1}, batch={S3}, queued={S2, S3} > Step 3: visited={S1, S2, S3}, data={S3, S2, S1}, batch={}, queued={S2, > S3}.{noformat} > Therefore, at the end of the loop, the order of the segments to be > transferred will be S3, S2, S1, which might trigger a > {{SegmentNotFoundException}} when S3 is further processed, because S2 is > missing on standby (see OAK-8006). > /cc [~frm] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (OAK-8063) The cold standby client doesn't correctly handle backward references
[ https://issues.apache.org/jira/browse/OAK-8063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780439#comment-16780439 ] Andrei Dulceanu commented on OAK-8063: -- In my previous patch I somehow misinterpreted the edge direction between segments and their references. Therefore, by using a stack for storing the final transfer order, I actually achieved the opposite: a segment transferred before all its references. I came back to this and replaced the stack with a linked list to fix this behaviour. Moreover, I added an explanation and the same logging we had in the original code. [~frm], could you please take a look at the second version of the patch? > The cold standby client doesn't correctly handle backward references > > > Key: OAK-8063 > URL: https://issues.apache.org/jira/browse/OAK-8063 > Project: Jackrabbit Oak > Issue Type: Bug > Components: segment-tar, tarmk-standby >Affects Versions: 1.6.0 >Reporter: Andrei Dulceanu >Assignee: Andrei Dulceanu >Priority: Major > Labels: cold-standby > Fix For: 1.12, 1.8.12, 1.10.2 > > Attachments: OAK-8063-02.patch, OAK-8063.patch > > > The logic from {{StandbyClientSyncExecution#copySegmentHierarchyFromPrimary}} > has a flaw when it comes to "backward references". Suppose we have the > following data segment graph to be transferred from primary: S1, which > references \{S2, S3} and S3 which references S2. Then, the correct transfer > order should be S2, S3 and S1. > Going through the current logic employed by the method, here's what happens: > {noformat} > Step 0: batch={S1} > Step 1: visited={S1}, data={S1}, batch={S2, S3}, queued={S2, S3} > Step 2: visited={S1, S2}, data={S2, S1}, batch={S3}, queued={S2, S3} > Step 3: visited={S1, S2, S3}, data={S3, S2, S1}, batch={}, queued={S2, > S3}.{noformat} > Therefore, at the end of the loop, the order of the segments to be > transferred will be S3, S2, S1, which might trigger a > {{SegmentNotFoundException}} when S3 is further processed, because S2 is > missing on standby (see OAK-8006). > /cc [~frm] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (OAK-8063) The cold standby client doesn't correctly handle backward references
[ https://issues.apache.org/jira/browse/OAK-8063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16775268#comment-16775268 ] Michael Dürig commented on OAK-8063: {quote}BTW, could really be the case that the segment graph to be transferred from primary contains a cycle? {quote} Unfortunately yes. Fortunately those loops should be rather small. Loop can happen when writing in parallel through multiple {{SegmentBufferWriter}} instances. > The cold standby client doesn't correctly handle backward references > > > Key: OAK-8063 > URL: https://issues.apache.org/jira/browse/OAK-8063 > Project: Jackrabbit Oak > Issue Type: Bug > Components: segment-tar, tarmk-standby >Affects Versions: 1.6.0 >Reporter: Andrei Dulceanu >Assignee: Andrei Dulceanu >Priority: Major > Labels: cold-standby > Fix For: 1.12, 1.10.1, 1.8.12 > > Attachments: OAK-8063.patch > > > The logic from {{StandbyClientSyncExecution#copySegmentHierarchyFromPrimary}} > has a flaw when it comes to "backward references". Suppose we have the > following data segment graph to be transferred from primary: S1, which > references \{S2, S3} and S3 which references S2. Then, the correct transfer > order should be S2, S3 and S1. > Going through the current logic employed by the method, here's what happens: > {noformat} > Step 0: batch={S1} > Step 1: visited={S1}, data={S1}, batch={S2, S3}, queued={S2, S3} > Step 2: visited={S1, S2}, data={S2, S1}, batch={S3}, queued={S2, S3} > Step 3: visited={S1, S2, S3}, data={S3, S2, S1}, batch={}, queued={S2, > S3}.{noformat} > Therefore, at the end of the loop, the order of the segments to be > transferred will be S3, S2, S1, which might trigger a > {{SegmentNotFoundException}} when S3 is further processed, because S2 is > missing on standby (see OAK-8006). > /cc [~frm] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (OAK-8063) The cold standby client doesn't correctly handle backward references
[ https://issues.apache.org/jira/browse/OAK-8063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16775105#comment-16775105 ] Andrei Dulceanu commented on OAK-8063: -- [~frm], in [^OAK-8063.patch] I propose a new, simplified implementation for {{StandbyClientSyncExecution#copySegmentHierarchyFromPrimary}}. It follows the classical topological order algorithm with a twist: by eagerly marking a node visited, it breaks possible loops if the graph contains a cycle. BTW, could really be the case that the segment graph to be transferred from primary contains a cycle? My understanding is that segments only refer previous written segments, without any forward references, thus avoiding cycles. I also kept the optimisation about segments already transferred (i.e. local), but removed the one for the "diamond problem" which could not be possible in the current implementation. Could you please take a look at the patch? > The cold standby client doesn't correctly handle backward references > > > Key: OAK-8063 > URL: https://issues.apache.org/jira/browse/OAK-8063 > Project: Jackrabbit Oak > Issue Type: Bug > Components: segment-tar, tarmk-standby >Affects Versions: 1.6.0 >Reporter: Andrei Dulceanu >Assignee: Andrei Dulceanu >Priority: Major > Labels: cold-standby > Fix For: 1.12, 1.10.1, 1.8.12 > > Attachments: OAK-8063.patch > > > The logic from {{StandbyClientSyncExecution#copySegmentHierarchyFromPrimary}} > has a flaw when it comes to "backward references". Suppose we have the > following data segment graph to be transferred from primary: S1, which > references \{S2, S3} and S3 which references S2. Then, the correct transfer > order should be S2, S3 and S1. > Going through the current logic employed by the method, here's what happens: > {noformat} > Step 0: batch={S1} > Step 1: visited={S1}, data={S1}, batch={S2, S3}, queued={S2, S3} > Step 2: visited={S1, S2}, data={S2, S1}, batch={S3}, queued={S2, S3} > Step 3: visited={S1, S2, S3}, data={S3, S2, S1}, batch={}, queued={S2, > S3}.{noformat} > Therefore, at the end of the loop, the order of the segments to be > transferred will be S3, S2, S1, which might trigger a > {{SegmentNotFoundException}} when S3 is further processed, because S2 is > missing on standby (see OAK-8006). > /cc [~frm] -- This message was sent by Atlassian JIRA (v7.6.3#76005)