[jira] [Updated] (HDDS-4336) ContainerInfo does not persist BCSID leading to failed replicas reports

[ https://issues.apache.org/jira/browse/HDDS-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephen O'Donnell updated HDDS-4336:
    Fix Version/s: 1.1.0
    Resolution: Fixed
    Status: Resolved (was: Patch Available)

> Key: HDDS-4336
> URL: https://issues.apache.org/jira/browse/HDDS-4336
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: SCM
> Affects Versions: 1.1.0
> Reporter: Stephen O'Donnell
> Assignee: Stephen O'Donnell
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.1.0
>
> If you create a container and then close it, the BCSID is synced on the
> datanodes, and the value is then updated in SCM by setting the "sequenceID"
> field on the ContainerInfo object for the container.
>
> If you later restart just SCM, the sequenceID becomes zero, and subsequent
> container reports for the replica fail with a stack trace like:
> {code}
> Exception in thread "EventQueue-ContainerReportForContainerReportHandler" java.lang.AssertionError
>     at org.apache.hadoop.hdds.scm.container.ContainerInfo.updateSequenceId(ContainerInfo.java:176)
>     at org.apache.hadoop.hdds.scm.container.AbstractContainerReportHandler.updateContainerStats(AbstractContainerReportHandler.java:108)
>     at org.apache.hadoop.hdds.scm.container.AbstractContainerReportHandler.processContainerReplica(AbstractContainerReportHandler.java:83)
>     at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.processContainerReplicas(ContainerReportHandler.java:162)
>     at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:130)
>     at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:50)
>     at org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:81)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
> The assertion here is what fails, as it does not allow the sequenceID to be
> changed on a CLOSED container:
> {code}
> public void updateSequenceId(long sequenceID) {
>   assert (isOpen() || state == HddsProtos.LifeCycleState.QUASI_CLOSED);
>   sequenceId = max(sequenceID, sequenceId);
> }
> {code}
> The issue seems to be caused by the serialisation and deserialisation of the
> ContainerInfo object to protobuf, as sequenceId is never persisted or
> restored. However, I am also confused about how this ever worked, as this is
> a pretty significant problem.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
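The suspected failure mode above can be illustrated with a minimal, self-contained sketch. These are NOT the real Ozone classes: the nested `ContainerInfoProto`, `toProtoBuggy`, `toProtoFixed`, and `fromProto` names are hypothetical stand-ins for the real protobuf conversion, and the assertion is modeled as an explicit `AssertionError` so it fires without the `-ea` JVM flag. The sketch assumes the report handler only invokes `updateSequenceId` when the stored and reported BCSIDs differ, which is why persisting `sequenceId` across the restart avoids the crash.

```java
// Simplified model of the HDDS-4336 round-trip bug; names mirror the report
// but this is not the actual Ozone code.
public class SequenceIdRoundTrip {
  enum LifeCycleState { OPEN, QUASI_CLOSED, CLOSED }

  // Stand-in for the serialized (protobuf) form of ContainerInfo. A protobuf
  // scalar that is never set deserializes to its default, 0.
  static class ContainerInfoProto {
    long containerId;
    LifeCycleState state;
    long sequenceId;
  }

  static class ContainerInfo {
    final long containerId;
    final LifeCycleState state;
    long sequenceId;

    ContainerInfo(long id, LifeCycleState state, long sequenceId) {
      this.containerId = id;
      this.state = state;
      this.sequenceId = sequenceId;
    }

    boolean isOpen() { return state == LifeCycleState.OPEN; }

    // Mirrors the assertion quoted in the report: only OPEN or QUASI_CLOSED
    // containers may have their BCSID advanced.
    void updateSequenceId(long sequenceID) {
      if (!(isOpen() || state == LifeCycleState.QUASI_CLOSED)) {
        throw new AssertionError("sequenceId update on " + state);
      }
      sequenceId = Math.max(sequenceID, sequenceId);
    }

    // Buggy round-trip: sequenceId is never written out, so it comes back 0.
    ContainerInfoProto toProtoBuggy() {
      ContainerInfoProto p = new ContainerInfoProto();
      p.containerId = containerId;
      p.state = state;
      // p.sequenceId deliberately left unset -> default 0
      return p;
    }

    // Fixed round-trip: sequenceId survives serialization.
    ContainerInfoProto toProtoFixed() {
      ContainerInfoProto p = toProtoBuggy();
      p.sequenceId = sequenceId;
      return p;
    }

    static ContainerInfo fromProto(ContainerInfoProto p) {
      return new ContainerInfo(p.containerId, p.state, p.sequenceId);
    }
  }

  public static void main(String[] args) {
    // A CLOSED container whose BCSID (42) was synced from the datanodes.
    ContainerInfo closed = new ContainerInfo(1L, LifeCycleState.CLOSED, 42L);

    // An SCM restart is, in effect, a serialize/deserialize round trip.
    ContainerInfo buggy = ContainerInfo.fromProto(closed.toProtoBuggy());
    ContainerInfo fixed = ContainerInfo.fromProto(closed.toProtoFixed());
    System.out.println("after buggy restart: sequenceId=" + buggy.sequenceId);
    System.out.println("after fixed restart: sequenceId=" + fixed.sequenceId);

    // A replica report arrives carrying BCSID 42. With the fix, SCM already
    // holds 42, so no update is needed. With the bug, SCM holds 0, the
    // mismatch triggers an update on a CLOSED container, and the assertion
    // fires, matching the stack trace in the report.
    if (buggy.sequenceId != 42L) {
      try {
        buggy.updateSequenceId(42L);
      } catch (AssertionError e) {
        System.out.println("assertion fired: " + e.getMessage());
      }
    }
  }
}
```

Running it prints `sequenceId=0` after the buggy restart, `sequenceId=42` after the fixed one, and then the modeled assertion failure for the buggy path.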
[jira] [Updated] (HDDS-4336) ContainerInfo does not persist BCSID leading to failed replicas reports

[ https://issues.apache.org/jira/browse/HDDS-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nanda kumar updated HDDS-4336:
    Status: Patch Available (was: Open)
[jira] [Updated] (HDDS-4336) ContainerInfo does not persist BCSID leading to failed replicas reports

[ https://issues.apache.org/jira/browse/HDDS-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDDS-4336:
    Labels: pull-request-available (was: )
[jira] [Updated] (HDDS-4336) ContainerInfo does not persist BCSID leading to failed replicas reports

[ https://issues.apache.org/jira/browse/HDDS-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephen O'Donnell updated HDDS-4336:
    Description: edited; "The assertion here is what is failing" reworded to "The assertion here is failing" (remainder unchanged)
[jira] [Updated] (HDDS-4336) ContainerInfo does not persist BCSID leading to failed replicas reports

[ https://issues.apache.org/jira/browse/HDDS-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephen O'Donnell updated HDDS-4336:
    Description: edited; "the sequenceID becomes null" corrected to "the sequenceID becomes zero" (remainder unchanged)