[jira] [Commented] (HDDS-629) Make ApplyTransaction calls in ContainerStateMachine idempotent
[ https://issues.apache.org/jira/browse/HDDS-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16649823#comment-16649823 ] Shashikant Banerjee commented on HDDS-629: -- Thanks [~jnp], for the review. Patch v6 addresses the review comments as well as the one checkstyle issue reported. The test failures are not related to the patch. > Make ApplyTransaction calls in ContainerStateMachine idempotent > --- > > Key: HDDS-629 > URL: https://issues.apache.org/jira/browse/HDDS-629 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Reporter: Shashikant Banerjee > Assignee: Shashikant Banerjee > Priority: Major > Attachments: HDDS-629.000.patch, HDDS-629.001.patch, > HDDS-629.002.patch, HDDS-629.003.patch, HDDS-629.004.patch, > HDDS-629.005.patch, HDDS-629.006.patch > > > When a Datanode restarts, it may lead to a case where it can reapply > already applied transactions when it joins the pipeline again. For this > requirement, all ApplyTransaction calls in Ratis need to be made idempotent. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
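The restart/replay scenario above can be sketched in miniature. Everything below is hypothetical (the real ContainerStateMachine tracks applied indexes through Ratis snapshots and container metadata, not a plain map); it only illustrates the idempotency idea of skipping a transaction whose log index has already been applied:

```java
import java.util.concurrent.ConcurrentHashMap;

public class IdempotentApplySketch {
    // containerId -> highest Ratis log index already applied (illustrative state only)
    private final ConcurrentHashMap<Long, Long> lastAppliedIndex = new ConcurrentHashMap<>();

    /** Returns true if the transaction was applied, false if it was a replayed duplicate. */
    public boolean applyTransaction(long containerId, long logIndex) {
        Long applied = lastAppliedIndex.get(containerId);
        if (applied != null && logIndex <= applied) {
            return false; // already applied before the restart; replay becomes a no-op
        }
        // ... the actual container mutation would happen here ...
        lastAppliedIndex.put(containerId, logIndex);
        return true;
    }

    public static void main(String[] args) {
        IdempotentApplySketch sm = new IdempotentApplySketch();
        assert sm.applyTransaction(1L, 5L);   // first application succeeds
        assert !sm.applyTransaction(1L, 5L);  // replay of the same index is skipped
        assert sm.applyTransaction(1L, 6L);   // the next index applies normally
        System.out.println("ok");
    }
}
```

With a check like this, a datanode that rejoins the pipeline and receives already-applied log entries simply skips them instead of mutating container state twice.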
[jira] [Updated] (HDDS-629) Make ApplyTransaction calls in ContainerStateMachine idempotent
[ https://issues.apache.org/jira/browse/HDDS-629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-629: - Attachment: HDDS-629.006.patch
[jira] [Updated] (HDDS-629) Make ApplyTransaction calls in ContainerStateMachine idempotent
[ https://issues.apache.org/jira/browse/HDDS-629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-629: - Attachment: HDDS-629.005.patch
[jira] [Commented] (HDDS-629) Make ApplyTransaction calls in ContainerStateMachine idempotent
[ https://issues.apache.org/jira/browse/HDDS-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16649719#comment-16649719 ] Shashikant Banerjee commented on HDDS-629: -- Thanks [~jnp], for the review. Patch v5 addresses the review comments. The test failures reported here are not related to the patch.
[jira] [Updated] (HDDS-629) Make ApplyTransaction calls in ContainerStateMachine idempotent
[ https://issues.apache.org/jira/browse/HDDS-629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-629: - Attachment: (was: HDDS-629.006.patch)
[jira] [Updated] (HDDS-629) Make ApplyTransaction calls in ContainerStateMachine idempotent
[ https://issues.apache.org/jira/browse/HDDS-629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-629: - Attachment: HDDS-629.006.patch
[jira] [Updated] (HDDS-629) Make ApplyTransaction calls in ContainerStateMachine idempotent
[ https://issues.apache.org/jira/browse/HDDS-629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-629: - Status: Open (was: Patch Available)
[jira] [Updated] (HDDS-629) Make ApplyTransaction calls in ContainerStateMachine idempotent
[ https://issues.apache.org/jira/browse/HDDS-629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-629: - Status: Patch Available (was: Open)
[jira] [Commented] (HDDS-667) Fix TestOzoneFileInterfaces
[ https://issues.apache.org/jira/browse/HDDS-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16651488#comment-16651488 ] Shashikant Banerjee commented on HDDS-667: -- Thanks [~msingh], for the patch. The patch looks good to me. +1 > Fix TestOzoneFileInterfaces > --- > > Key: HDDS-667 > URL: https://issues.apache.org/jira/browse/HDDS-667 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Filesystem >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Major > Attachments: HDDS-667.001.patch > > > The test is failing with the following exception. > This test is failing after e13a38f4bc358666e64687636cf7b025bce83b46 (HDDS-629) > {code} > [INFO] --- > [INFO] T E S T S > [INFO] --- > [INFO] Running org.apache.hadoop.fs.ozone.TestOzoneFileInterfaces > [ERROR] Tests run: 8, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: > 54.718 s <<< FAILURE! - in org.apache.hadoop.fs.ozone.TestOzoneFileInterfaces > [ERROR] > testOzFsReadWrite[1](org.apache.hadoop.fs.ozone.TestOzoneFileInterfaces) > Time elapsed: 7.1 s <<< ERROR! > org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: > Unable to find the block. 
> at > org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:429) > at > org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.getBlock(ContainerProtocolCalls.java:103) > at > org.apache.hadoop.ozone.client.io.ChunkGroupInputStream.getFromOmKeyInfo(ChunkGroupInputStream.java:290) > at > org.apache.hadoop.ozone.client.rpc.RpcClient.getKey(RpcClient.java:493) > at > org.apache.hadoop.ozone.client.OzoneBucket.readKey(OzoneBucket.java:272) > at > org.apache.hadoop.fs.ozone.OzoneFileSystem.open(OzoneFileSystem.java:173) > at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:899) > at > org.apache.hadoop.fs.ozone.TestOzoneFileInterfaces.testOzFsReadWrite(TestOzoneFileInterfaces.java:175) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) > at 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) > at org.junit.runners.ParentRunner.run(ParentRunner.java:309) > at org.junit.runners.Suite.runChild(Suite.java:127) > at org.junit.runners.Suite.runChild(Suite.java:26) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) > at org.junit.runners.ParentRunner.run(ParentRunner.java:309) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379) > at >
[jira] [Commented] (HDDS-676) Enable Read from open Containers via Standalone Protocol
[ https://issues.apache.org/jira/browse/HDDS-676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16656483#comment-16656483 ] Shashikant Banerjee commented on HDDS-676: -- Patch v4 addresses the javadoc, checkstyle and failed test cases. > Enable Read from open Containers via Standalone Protocol > > > Key: HDDS-676 > URL: https://issues.apache.org/jira/browse/HDDS-676 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Reporter: Shashikant Banerjee > Assignee: Shashikant Banerjee > Priority: Major > Attachments: HDDS-676.001.patch, HDDS-676.002.patch, > HDDS-676.003.patch, HDDS-676.004.patch > > > With BlockCommitSequenceId getting updated per block commit on open > containers in OM as well as the datanode, Ozone Client reads can go through the Standalone > protocol, not necessarily requiring Ratis. The client should verify the BCSID of > the container which has the data block, which should always be greater than > or equal to the BCSID of the block to be read, and the existing block BCSID > should exactly match that of the block to be read. As a part of this, the client > can try to read from a replica with a supplied BCSID and fail over to the next > one in case the block does not exist on one replica.
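The BCSID check the issue description spells out can be written as a small standalone predicate. This is an illustrative sketch, not Ozone's actual code; the class and method names are made up:

```java
public class BcsidCheckSketch {
    /**
     * A read is served only if the container replica has committed at least up
     * to the requested BCSID, and the block's stored BCSID matches the one the
     * client obtained from OM exactly.
     *
     * @param containerBcsid   highest BCSID committed on the container replica
     * @param storedBlockBcsid BCSID recorded for the block in the container db
     * @param requestedBcsid   BCSID the client supplies with the read
     */
    public static boolean canServeRead(long containerBcsid,
                                       long storedBlockBcsid,
                                       long requestedBcsid) {
        return containerBcsid >= requestedBcsid
            && storedBlockBcsid == requestedBcsid;
    }

    public static void main(String[] args) {
        assert canServeRead(10, 7, 7);   // replica caught up, block matches
        assert !canServeRead(5, 7, 7);   // replica lagging: client fails over
        assert !canServeRead(10, 6, 7);  // stale block entry: reject the read
        System.out.println("ok");
    }
}
```

The second failing case is what drives the failover behavior in the description: a lagging replica rejects the read, and the client retries the next replica.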
[jira] [Updated] (HDDS-676) Enable Read from open Containers via Standalone Protocol
[ https://issues.apache.org/jira/browse/HDDS-676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-676: - Attachment: HDDS-676.004.patch
[jira] [Updated] (HDDS-676) Enable Read from open Containers via Standalone Protocol
[ https://issues.apache.org/jira/browse/HDDS-676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-676: - Attachment: HDDS-676.003.patch
[jira] [Commented] (HDDS-676) Enable Read from open Containers via Standalone Protocol
[ https://issues.apache.org/jira/browse/HDDS-676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16656286#comment-16656286 ] Shashikant Banerjee commented on HDDS-676: -- Cleaned up ReadSmallFile command handling related changes in Patch v3.
[jira] [Commented] (HDDS-676) Enable Read from open Containers via Standalone Protocol
[ https://issues.apache.org/jira/browse/HDDS-676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16656270#comment-16656270 ] Shashikant Banerjee commented on HDDS-676: -- Thanks [~jnp], for the review comments. {code:java} What is the reason for changing clientCache key from PipelineId to string? {code} By default, SCM always gives a Ratis pipeline for open containers, and when an XceiverClient instance gets created, it is always placed in the client cache keyed by the pipeline ID. Since a read op should use a Standalone pipeline with the same pipeline ID that SCM provides, the idea is to get an XceiverClientGrpc instance with the same pipeline ID, and hence the same set of datanodes used by the Ratis pipeline. Changing the key in the client cache from the pipelineId to a string combining the pipelineId and the type gives us the flexibility to create two different types of pipeline with the same pipelineID. {code:java} The changes in XceiverClientRatis are only for testing? {code} Yes, for now. {code:java} It is minor but I feel we should not make type mutable in the Pipeline class. We could clone the Pipeline object to change the type. {code} This will be addressed with HDDS-694. {code:java} In ContainerStateMachine changes don't look related to this Jira as they are about put-small-files. If yes, we should put them in a separate jira. {code} Opened HDDS-697 for the same. The rest of the review comments are addressed in the patch along with the checkstyle fixes. The javadoc issues seem to be unrelated.
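The composite cache key discussed above (pipelineId plus pipeline type) can be sketched as follows. The class, enum, and method names are hypothetical; the point is only to show why the combined key keeps a Ratis and a Standalone pipeline with the same id from colliding in the cache:

```java
import java.util.HashMap;
import java.util.Map;

public class ClientCacheKeySketch {
    enum ReplicationType { RATIS, STAND_ALONE }

    // Combine the pipeline id with the replication type into one cache key,
    // so the same pipeline id can back two distinct client instances.
    static String cacheKey(String pipelineId, ReplicationType type) {
        return pipelineId + "-" + type.name();
    }

    public static void main(String[] args) {
        Map<String, Object> clientCache = new HashMap<>();
        clientCache.put(cacheKey("p1", ReplicationType.RATIS), new Object());
        clientCache.put(cacheKey("p1", ReplicationType.STAND_ALONE), new Object());
        // Same pipeline id, yet two distinct cache entries survive.
        assert clientCache.size() == 2;
        System.out.println("ok");
    }
}
```

Had the key stayed the plain pipeline id, the second put would have evicted the first, and a write client and a read client for the same pipeline could not coexist in the cache.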
[jira] [Created] (HDDS-697) update the BCSID for PutSmallFile command
Shashikant Banerjee created HDDS-697: Summary: update the BCSID for PutSmallFile command Key: HDDS-697 URL: https://issues.apache.org/jira/browse/HDDS-697 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Shashikant Banerjee Assignee: Shashikant Banerjee
[jira] [Updated] (HDDS-676) Enable Read from open Containers via Standalone Protocol
[ https://issues.apache.org/jira/browse/HDDS-676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-676: - Attachment: HDDS-676.002.patch > Enable Read from open Containers via Standalone Protocol > > > Key: HDDS-676 > URL: https://issues.apache.org/jira/browse/HDDS-676 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDDS-676.001.patch, HDDS-676.002.patch > > > With BlockCommitSequenceId getting updated per block commit on open > containers in OM as well datanode, Ozone Client reads can through Standalone > protocol not necessarily requiring Ratis. Client should verify the BCSID of > the container which has the data block , which should always be greater than > or equal to the BCSID of the block to be read and the existing block BCSID > should exactly match that of the block to be read. As a part of this, Client > can try to read from a replica with a supplied BCSID and failover to the next > one in case the block does ont exist on one replica. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDDS-705) OS3Exception resource name should be the actual resource name
[ https://issues.apache.org/jira/browse/HDDS-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658749#comment-16658749 ] Shashikant Banerjee edited comment on HDDS-705 at 10/22/18 8:52 AM: I think the patch which is committed to trunk has changes from HDDS-676 as well. This commit needs to get reverted I guess. was (Author: shashikant): I think the patch which is committed to trunk has changes from HDDS-676. This commit needs to get reverted I guess. > OS3Exception resource name should be the actual resource name > - > > Key: HDDS-705 > URL: https://issues.apache.org/jira/browse/HDDS-705 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Reporter: Bharat Viswanadham > Assignee: Bharat Viswanadham > Priority: Major > Attachments: HDDS-705.00.patch > > > [https://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html] > {code:java} > <Error> > <Code>NoSuchKey</Code> > <Message>The resource you requested does not exist</Message> > <Resource>/mybucket/myfoto.jpg</Resource> > <RequestId>4442587FB7D0A2F9</RequestId> > </Error> > {code} > > Right now in the code we print the resource as "bucket" or "key" instead of the > actual resource name. > > Documentation shows the key name with the bucket, but when actually tried against the AWS S3 > endpoint it shows just the key name; found this information using mitmproxy.
[jira] [Commented] (HDDS-705) OS3Exception resource name should be the actual resource name
[ https://issues.apache.org/jira/browse/HDDS-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658749#comment-16658749 ] Shashikant Banerjee commented on HDDS-705: -- I think the patch which is committed to trunk has changes from HDDS-676. This commit needs to get reverted I guess.
[jira] [Commented] (HDDS-697) update and validate the BCSID for PutSmallFile/GetSmallFile command
[ https://issues.apache.org/jira/browse/HDDS-697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1665#comment-1665 ] Shashikant Banerjee commented on HDDS-697: -- Patch v1 depends on HDDS-708. Not submitting it for now. > update and validate the BCSID for PutSmallFile/GetSmallFile command > --- > > Key: HDDS-697 > URL: https://issues.apache.org/jira/browse/HDDS-697 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Reporter: Shashikant Banerjee > Assignee: Shashikant Banerjee > Priority: Major > Attachments: HDDS-697.000.patch > > > Similar to putBlock/getBlock, the putSmallFile transaction in Ratis needs to > update the BCSID in the container db on the datanode. getSmallFile should > validate the bcsId while reading the block, similar to getBlock.
[jira] [Updated] (HDDS-697) update and validate the BCSID for PutSmallFile/GetSmallFile command
[ https://issues.apache.org/jira/browse/HDDS-697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-697: - Description: Similar to putBlock/GetBlock, putSmallFile transaction in Ratis needs to update the BCSID in the container db on datanode. getSmallFile should validate the bcsId while reading the block similar to getBlock. (was: Similar , to putBlock/GetBlock, putSmallFile transaction in Ratis needs to update the BCSID in the container db on datanode. getSmallFile should validate the bcsId while reading the block similar to getBlock.)
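The write-path bookkeeping this description asks for (putSmallFile recording the BCSID just as putBlock does) might look roughly like this in miniature. All names here are invented, and a HashMap stands in for the container db:

```java
import java.util.HashMap;
import java.util.Map;

public class BcsidUpdateSketch {
    // Stand-in for the per-container block table in the container db.
    final Map<String, Long> blockBcsid = new HashMap<>();
    // Highest committed BCSID for the container, advanced on every commit.
    long containerBcsid;

    /** Record a block commit: store the block's BCSID and raise the container watermark. */
    public void commitBlock(String blockKey, long bcsid) {
        blockBcsid.put(blockKey, bcsid);
        containerBcsid = Math.max(containerBcsid, bcsid);
    }

    public static void main(String[] args) {
        BcsidUpdateSketch c = new BcsidUpdateSketch();
        c.commitBlock("blk-1", 4);
        c.commitBlock("blk-2", 9);
        assert c.containerBcsid == 9;       // container watermark follows the latest commit
        assert c.blockBcsid.get("blk-1") == 4; // per-block BCSID is kept for later validation
        System.out.println("ok");
    }
}
```

A getSmallFile/getBlock read would then validate its requested BCSID against exactly these two recorded values, which is the check the issue pairs with this update.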
[jira] [Commented] (HDDS-708) Validate BCSID while reading blocks from containers in datanodes
[ https://issues.apache.org/jira/browse/HDDS-708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659126#comment-16659126 ] Shashikant Banerjee commented on HDDS-708: -- The test failures are not related to the patch. > Validate BCSID while reading blocks from containers in datanodes > > > Key: HDDS-708 > URL: https://issues.apache.org/jira/browse/HDDS-708 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Reporter: Shashikant Banerjee > Assignee: Shashikant Banerjee > Priority: Major > Fix For: 0.3.0 > > Attachments: HDDS-708.000.patch > > > The Ozone client, while making a getBlock call during a read, should fetch > the bcsId for the block from OzoneManager, and the same needs to be validated > in the Datanode.
[jira] [Updated] (HDDS-676) Enable Read from open Containers via Standalone Protocol
[ https://issues.apache.org/jira/browse/HDDS-676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-676: - Attachment: HDDS-676-ozone-0.3.000.patch
[jira] [Created] (HDDS-708) Validate BCSID while reading blocks from containers in datanodes
Shashikant Banerjee created HDDS-708: Summary: Validate BCSID while reading blocks from containers in datanodes Key: HDDS-708 URL: https://issues.apache.org/jira/browse/HDDS-708 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Shashikant Banerjee Assignee: Shashikant Banerjee The Ozone client, while making a getBlock call to read data, should read the bcsId for the block from OzoneManager, and the same needs to be validated in the Datanode. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
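The BCSID rule that runs through these issues (the container's committed BCSID must be at least the block's BCSID supplied by OzoneManager, and the BCSID stored with the block must exactly match the requested one) can be sketched as below. This is an illustrative check, not the actual Ozone BlockManager code; the class and method names are hypothetical.

```java
// Hypothetical sketch of the BCSID validation on the datanode read path;
// names are illustrative, not the real Ozone classes.
class BcsidValidator {
    // containerBcsid: committed BCSID of the container replica.
    // requestedBlockBcsid: bcsId the client fetched from OzoneManager.
    // storedBlockBcsid: bcsId persisted with the block in the container db.
    static boolean isReadValid(long containerBcsid,
                               long requestedBlockBcsid,
                               long storedBlockBcsid) {
        // The container must have committed at least up to the block's BCSID,
        // and the stored block BCSID must match the requested one exactly.
        return containerBcsid >= requestedBlockBcsid
            && storedBlockBcsid == requestedBlockBcsid;
    }
}
```

If this check fails on one replica, the client can fail over and retry the read on the next replica, as described in HDDS-676.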
[jira] [Updated] (HDDS-708) Validate BCSID while reading blocks from containers in datanodes
[ https://issues.apache.org/jira/browse/HDDS-708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-708: - Attachment: HDDS-708.000.patch > Validate BCSID while reading blocks from containers in datanodes > > > Key: HDDS-708 > URL: https://issues.apache.org/jira/browse/HDDS-708 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 0.3.0 > > Attachments: HDDS-708.000.patch > > > Ozone client while making a getBlock call during reading data , should read > the bcsId from OzoneManager for the block and the same needs to be validated > in Datanode. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-676) Enable Read from open Containers via Standalone Protocol
[ https://issues.apache.org/jira/browse/HDDS-676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658831#comment-16658831 ] Shashikant Banerjee commented on HDDS-676: -- Thanks [~anu], for the review comments. I have created a new Jira HDDS-708 which will have the changes required for reading and validating BCSID while reading the block from container db on datanodes. Patch v7 attached has the changes required to enable read from open containers by using standalone grpc client. > Enable Read from open Containers via Standalone Protocol > > > Key: HDDS-676 > URL: https://issues.apache.org/jira/browse/HDDS-676 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDDS-676.001.patch, HDDS-676.002.patch, > HDDS-676.003.patch, HDDS-676.004.patch, HDDS-676.005.patch, > HDDS-676.006.patch, HDDS-676.007.patch > > > With BlockCommitSequenceId getting updated per block commit on open > containers in OM as well datanode, Ozone Client reads can through Standalone > protocol not necessarily requiring Ratis. Client should verify the BCSID of > the container which has the data block , which should always be greater than > or equal to the BCSID of the block to be read and the existing block BCSID > should exactly match that of the block to be read. As a part of this, Client > can try to read from a replica with a supplied BCSID and failover to the next > one in case the block does ont exist on one replica. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-697) update and validate the BCSID for PutSmallFile/GetSmallFile command
[ https://issues.apache.org/jira/browse/HDDS-697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-697: - Attachment: HDDS-697.000.patch > update and validate the BCSID for PutSmallFile/GetSmallFile command > --- > > Key: HDDS-697 > URL: https://issues.apache.org/jira/browse/HDDS-697 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDDS-697.000.patch > > > Similar to putBlock/GetBlock, the putSmallFile transaction in Ratis needs to > update the BCSID in the container db on the datanode. getSmallFile should > validate the bcsId while reading the block, similar to getBlock. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-709) Modify Close Container handling sequence on datanodes
Shashikant Banerjee created HDDS-709: Summary: Modify Close Container handling sequence on datanodes Key: HDDS-709 URL: https://issues.apache.org/jira/browse/HDDS-709 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Datanode Reporter: Shashikant Banerjee Assignee: Shashikant Banerjee With the quasi-closed container state for handling majority node failures, the close container handling sequence in Datanodes needs to change. Once the datanodes receive a close container command from SCM, the open container replicas will individually be marked as being in the CLOSING state. In the CLOSING state, only the transactions coming from the Ratis leader are allowed; all other write transactions will fail. A close container transaction will be queued via Ratis on the leader and replicated to the followers, which makes the replicas transition to the CLOSED/QUASI_CLOSED state. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
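The close sequence described above can be modeled as a small state machine. This is a hedged sketch of the described behavior, not the actual datanode container lifecycle code; the state and method names are assumptions for illustration.

```java
// Illustrative model of the HDDS-709 close sequence (not actual Ozone code):
// SCM's close command moves an OPEN replica to CLOSING; while CLOSING, only
// transactions coming through the Ratis leader are accepted; the close
// transaction replicated via Ratis finally moves the replica to CLOSED
// (or QUASI_CLOSED when no quorum is available).
class ContainerCloseSequence {
    enum State { OPEN, CLOSING, CLOSED, QUASI_CLOSED }

    private State state = State.OPEN;

    // SCM close container command: mark the open replica as CLOSING.
    void onScmCloseCommand() {
        if (state == State.OPEN) {
            state = State.CLOSING;
        }
    }

    // Whether a write transaction is accepted in the current state.
    boolean acceptWrite(boolean fromRatisLeader) {
        if (state == State.OPEN) {
            return true;
        }
        // In CLOSING, only transactions from the Ratis leader pass.
        return state == State.CLOSING && fromRatisLeader;
    }

    // Close transaction applied via Ratis on all replicas.
    void onCloseTransaction(boolean quorumAvailable) {
        state = quorumAvailable ? State.CLOSED : State.QUASI_CLOSED;
    }

    State state() { return state; }
}
```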
[jira] [Updated] (HDDS-708) Validate BCSID while reading blocks from containers in datanodes
[ https://issues.apache.org/jira/browse/HDDS-708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-708: - Status: Patch Available (was: Open) > Validate BCSID while reading blocks from containers in datanodes > > > Key: HDDS-708 > URL: https://issues.apache.org/jira/browse/HDDS-708 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 0.3.0 > > Attachments: HDDS-708.000.patch > > > Ozone client while making a getBlock call during reading data , should read > the bcsId from OzoneManager for the block and the same needs to be validated > in Datanode. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-708) Validate BCSID while reading blocks from containers in datanodes
[ https://issues.apache.org/jira/browse/HDDS-708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658832#comment-16658832 ] Shashikant Banerjee commented on HDDS-708: -- Patch v0 reads the BCSID from OzoneManager while doing a getBlock call to Datanode and it validates the BCSID in BlockManager on Datanode. > Validate BCSID while reading blocks from containers in datanodes > > > Key: HDDS-708 > URL: https://issues.apache.org/jira/browse/HDDS-708 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 0.3.0 > > Attachments: HDDS-708.000.patch > > > Ozone client while making a getBlock call during reading data , should read > the bcsId from OzoneManager for the block and the same needs to be validated > in Datanode. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-708) Validate BCSID while reading blocks from containers in datanodes
[ https://issues.apache.org/jira/browse/HDDS-708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-708: - Fix Version/s: 0.3.0 > Validate BCSID while reading blocks from containers in datanodes > > > Key: HDDS-708 > URL: https://issues.apache.org/jira/browse/HDDS-708 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 0.3.0 > > > Ozone client while making a getBlock call during reading data , should read > the bcsId from OzoneManager for the block and the same needs to be validated > in Datanode. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-676) Enable Read from open Containers via Standalone Protocol
[ https://issues.apache.org/jira/browse/HDDS-676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-676: - Attachment: HDDS-676.007.patch > Enable Read from open Containers via Standalone Protocol > > > Key: HDDS-676 > URL: https://issues.apache.org/jira/browse/HDDS-676 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDDS-676.001.patch, HDDS-676.002.patch, > HDDS-676.003.patch, HDDS-676.004.patch, HDDS-676.005.patch, > HDDS-676.006.patch, HDDS-676.007.patch > > > With BlockCommitSequenceId getting updated per block commit on open > containers in OM as well datanode, Ozone Client reads can through Standalone > protocol not necessarily requiring Ratis. Client should verify the BCSID of > the container which has the data block , which should always be greater than > or equal to the BCSID of the block to be read and the existing block BCSID > should exactly match that of the block to be read. As a part of this, Client > can try to read from a replica with a supplied BCSID and failover to the next > one in case the block does ont exist on one replica. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-676) Enable Read from open Containers via Standalone Protocol
[ https://issues.apache.org/jira/browse/HDDS-676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659639#comment-16659639 ] Shashikant Banerjee commented on HDDS-676: -- Thanks [~anu], for the review comments. {code:java} I agree with this premise; that is we only talk to next data node if we get a failure on the first data node. If that is the case, do we need all this Async framework changes, hash tables etc? {code} If we get a failure or connection issue with one of the datanodes, we fail over to the next datanode. We maintain the state of the active channels for communication in the hash map, so that when we close the client we close all the connections. If we don't maintain the state, we need to close the connections in the active read path as a part of handling the exception. Connection errors can be transient. Also, multiple ozone clients can use the same XceiverClient instance as we maintain a client cache, so immediately closing the connection in case one client op fails may not be good. The HashMap will also be helpful once we get the leader info cached, so that we can use that specific channel to execute first. Regarding the async framework change, there is functionally no change in the code. It has just been split into 2 functions so that while executing the command we execute on a specific channel. > Enable Read from open Containers via Standalone Protocol > > > Key: HDDS-676 > URL: https://issues.apache.org/jira/browse/HDDS-676 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDDS-676.001.patch, HDDS-676.002.patch, > HDDS-676.003.patch, HDDS-676.004.patch, HDDS-676.005.patch, > HDDS-676.006.patch, HDDS-676.007.patch > > > With BlockCommitSequenceId getting updated per block commit on open > containers in OM as well datanode, Ozone Client reads can through Standalone > protocol not necessarily requiring Ratis. 
Client should verify the BCSID of > the container which has the data block , which should always be greater than > or equal to the BCSID of the block to be read and the existing block BCSID > should exactly match that of the block to be read. As a part of this, Client > can try to read from a replica with a supplied BCSID and failover to the next > one in case the block does ont exist on one replica. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
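The design discussed in this comment thread (a map of active per-datanode channels so that closing the client can tear every connection down at once, plus failover to the next replica on a read error) can be sketched as follows. The real XceiverClient would hold gRPC channels; plain strings stand in for them here, and all class and method names are illustrative assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch of the failover idea from the comment above. Strings stand in for
// the per-datanode gRPC channels the real XceiverClient would keep.
class StandaloneReadClient {
    // Active channels tracked per datanode, in pipeline order, so that
    // close() can tear them all down together.
    private final Map<String, String> channels = new LinkedHashMap<>();

    void addDatanode(String datanodeId, String channel) {
        channels.put(datanodeId, channel);
    }

    // Tries each datanode in order, failing over to the next replica when a
    // (possibly transient) error occurs, instead of closing the shared
    // client on the first failure.
    <R> R readWithFailover(Function<String, R> readOp) {
        RuntimeException last = null;
        for (String channel : channels.values()) {
            try {
                return readOp.apply(channel);
            } catch (RuntimeException e) {
                last = e; // remember the error and try the next replica
            }
        }
        throw last != null ? last : new IllegalStateException("no datanodes");
    }

    // Closing the client releases every tracked connection at once.
    void close() {
        channels.clear();
    }
}
```

If leader information were cached, the map would also let the client try the leader's channel first, as suggested in the comment.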
[jira] [Comment Edited] (HDDS-676) Enable Read from open Containers via Standalone Protocol
[ https://issues.apache.org/jira/browse/HDDS-676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659639#comment-16659639 ] Shashikant Banerjee edited comment on HDDS-676 at 10/22/18 8:27 PM: Thanks[~anu], for the review comments. {code:java} I agree with this premise; that is we only talk to next data node if we get a failure on the first data node. If that is the case, do we need all this Async framework changes, hash tables etc? {code} if we get a failure/connection issues with one of a datanode, we failover to the next datanode. If we don't maintain the state of the active channels for communication in the hash map, so that when we close the client we close all the conections. we maintain the state, we need to close the connections in active read path as a part of handling the exception. Connection Errors can be transient. Also, multiple ozone clients can use the same XceiverClient instance as we maintain a client cache, so immediately closing the connection in case one client op fails may not be good. HashMap will also be helpful if we get the leader info cached, so that we will use that specific channel to execute first. Regarding the async framework change, there is functionally no change in the code.It has been just split to 2 functions so while executing the command we execute on a specific channel. was (Author: shashikant): Thanks[~anu], for the review comments. {code:java} I agree with this premise; that is we only talk to next data node if we get a failure on the first data node. If that is the case, do we need all this Async framework changes, hash tables etc? {code} if we get a failure/connection issues with one of a datanode, we failover to the next datanode. If we don't maintain the state of the active channels for communication in the hash map, so that when we close the client we close all the conections. If we don't maintain the state, we need to close the connections in active read path as a part of handling the exception. 
Connection Errors can be transient. Also, multiple ozone clients can use the same XceiverClient instance as we maintain a client cache, so immediately closing the connection in case one client op fails. HashMap will also be helpful if we get the leader info cached, so that we will use that specific channel to execute first. Regarding the async framework change, there is functionally no change in the code.It has been just split to 2 functions so while executing the command we execute on a specific channel. > Enable Read from open Containers via Standalone Protocol > > > Key: HDDS-676 > URL: https://issues.apache.org/jira/browse/HDDS-676 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDDS-676.001.patch, HDDS-676.002.patch, > HDDS-676.003.patch, HDDS-676.004.patch, HDDS-676.005.patch, > HDDS-676.006.patch, HDDS-676.007.patch > > > With BlockCommitSequenceId getting updated per block commit on open > containers in OM as well datanode, Ozone Client reads can through Standalone > protocol not necessarily requiring Ratis. Client should verify the BCSID of > the container which has the data block , which should always be greater than > or equal to the BCSID of the block to be read and the existing block BCSID > should exactly match that of the block to be read. As a part of this, Client > can try to read from a replica with a supplied BCSID and failover to the next > one in case the block does ont exist on one replica. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDDS-676) Enable Read from open Containers via Standalone Protocol
[ https://issues.apache.org/jira/browse/HDDS-676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659639#comment-16659639 ] Shashikant Banerjee edited comment on HDDS-676 at 10/22/18 8:58 PM: Thanks [~anu], for the review comments. {code:java} I agree with this premise; that is we only talk to next data node if we get a failure on the first data node. If that is the case, do we need all this Async framework changes, hash tables etc? {code} If we get a failure or connection issue with one of the datanodes, we fail over to the next datanode. We maintain the state of the active channels for communication in the hash map, so that when we close the client we close all the connections. If we don't maintain the state, we need to close the connections in the active read path as a part of handling the exception. Connection errors can be transient. Also, multiple ozone clients can use the same XceiverClient instance as we maintain a client cache, so immediately closing the connection in case one client op fails may not be good. The HashMap will also be helpful once we get the leader info cached, so that we can use that specific channel to execute first. Regarding the async framework change, there is functionally no change in the code. It has just been split into 2 functions so that while executing the command we execute on a specific channel. was (Author: shashikant): Thanks[~anu], for the review comments. {code:java} I agree with this premise; that is we only talk to next data node if we get a failure on the first data node. If that is the case, do we need all this Async framework changes, hash tables etc? {code} if we get a failure/connection issues with one of a datanode, we failover to the next datanode. If we don't maintain the state of the active channels for communication in the hash map, so that when we close the client we close all the conections. we maintain the state, we need to close the connections in active read path as a part of handling the exception. 
Connection Errors can be transient. Also, multiple ozone clients can use the same XceiverClient instance as we maintain a client cache, so immediately closing the connection in case one client op fails may not be good. HashMap will also be helpful if we get the leader info cached, so that we will use that specific channel to execute first. Regarding the async framework change, there is functionally no change in the code.It has been just split to 2 functions so while executing the command we execute on a specific channel. > Enable Read from open Containers via Standalone Protocol > > > Key: HDDS-676 > URL: https://issues.apache.org/jira/browse/HDDS-676 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDDS-676.001.patch, HDDS-676.002.patch, > HDDS-676.003.patch, HDDS-676.004.patch, HDDS-676.005.patch, > HDDS-676.006.patch, HDDS-676.007.patch > > > With BlockCommitSequenceId getting updated per block commit on open > containers in OM as well datanode, Ozone Client reads can through Standalone > protocol not necessarily requiring Ratis. Client should verify the BCSID of > the container which has the data block , which should always be greater than > or equal to the BCSID of the block to be read and the existing block BCSID > should exactly match that of the block to be read. As a part of this, Client > can try to read from a replica with a supplied BCSID and failover to the next > one in case the block does ont exist on one replica. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-676) Enable Read from open Containers via Standalone Protocol
[ https://issues.apache.org/jira/browse/HDDS-676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659673#comment-16659673 ] Shashikant Banerjee commented on HDDS-676: -- Patch v8 addresses the review comments in *testPutKeyAndGetKey.* > Enable Read from open Containers via Standalone Protocol > > > Key: HDDS-676 > URL: https://issues.apache.org/jira/browse/HDDS-676 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDDS-676.001.patch, HDDS-676.002.patch, > HDDS-676.003.patch, HDDS-676.004.patch, HDDS-676.005.patch, > HDDS-676.006.patch, HDDS-676.007.patch, HDDS-676.008.patch > > > With BlockCommitSequenceId getting updated per block commit on open > containers in OM as well datanode, Ozone Client reads can through Standalone > protocol not necessarily requiring Ratis. Client should verify the BCSID of > the container which has the data block , which should always be greater than > or equal to the BCSID of the block to be read and the existing block BCSID > should exactly match that of the block to be read. As a part of this, Client > can try to read from a replica with a supplied BCSID and failover to the next > one in case the block does ont exist on one replica. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-676) Enable Read from open Containers via Standalone Protocol
[ https://issues.apache.org/jira/browse/HDDS-676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-676: - Attachment: HDDS-676.008.patch > Enable Read from open Containers via Standalone Protocol > > > Key: HDDS-676 > URL: https://issues.apache.org/jira/browse/HDDS-676 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDDS-676.001.patch, HDDS-676.002.patch, > HDDS-676.003.patch, HDDS-676.004.patch, HDDS-676.005.patch, > HDDS-676.006.patch, HDDS-676.007.patch, HDDS-676.008.patch > > > With BlockCommitSequenceId getting updated per block commit on open > containers in OM as well datanode, Ozone Client reads can through Standalone > protocol not necessarily requiring Ratis. Client should verify the BCSID of > the container which has the data block , which should always be greater than > or equal to the BCSID of the block to be read and the existing block BCSID > should exactly match that of the block to be read. As a part of this, Client > can try to read from a replica with a supplied BCSID and failover to the next > one in case the block does ont exist on one replica. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-708) Validate BCSID while reading blocks from containers in datanodes
[ https://issues.apache.org/jira/browse/HDDS-708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-708: - Attachment: HDDS-708.001.patch > Validate BCSID while reading blocks from containers in datanodes > > > Key: HDDS-708 > URL: https://issues.apache.org/jira/browse/HDDS-708 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 0.3.0 > > Attachments: HDDS-708.000.patch, HDDS-708.001.patch > > > Ozone client while making a getBlock call during reading data , should read > the bcsId from OzoneManager for the block and the same needs to be validated > in Datanode. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-708) Validate BCSID while reading blocks from containers in datanodes
[ https://issues.apache.org/jira/browse/HDDS-708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16660157#comment-16660157 ] Shashikant Banerjee commented on HDDS-708: -- Thanks [~msingh], for the review comments. Patch v1 addresses your review comments. > Validate BCSID while reading blocks from containers in datanodes > > > Key: HDDS-708 > URL: https://issues.apache.org/jira/browse/HDDS-708 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 0.3.0 > > Attachments: HDDS-708.000.patch, HDDS-708.001.patch > > > Ozone client while making a getBlock call during reading data , should read > the bcsId from OzoneManager for the block and the same needs to be validated > in Datanode. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-676) Enable Read from open Containers via Standalone Protocol
[ https://issues.apache.org/jira/browse/HDDS-676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16656786#comment-16656786 ] Shashikant Banerjee commented on HDDS-676: -- Test failures and the findbugs warning are not related to the patch. > Enable Read from open Containers via Standalone Protocol > > > Key: HDDS-676 > URL: https://issues.apache.org/jira/browse/HDDS-676 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDDS-676.001.patch, HDDS-676.002.patch, > HDDS-676.003.patch, HDDS-676.004.patch > > > With BlockCommitSequenceId getting updated per block commit on open > containers in OM as well datanode, Ozone Client reads can through Standalone > protocol not necessarily requiring Ratis. Client should verify the BCSID of > the container which has the data block , which should always be greater than > or equal to the BCSID of the block to be read and the existing block BCSID > should exactly match that of the block to be read. As a part of this, Client > can try to read from a replica with a supplied BCSID and failover to the next > one in case the block does ont exist on one replica. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-676) Enable Read from open Containers via Standalone Protocol
[ https://issues.apache.org/jira/browse/HDDS-676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-676: - Attachment: HDDS-676.006.patch > Enable Read from open Containers via Standalone Protocol > > > Key: HDDS-676 > URL: https://issues.apache.org/jira/browse/HDDS-676 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDDS-676.001.patch, HDDS-676.002.patch, > HDDS-676.003.patch, HDDS-676.004.patch, HDDS-676.005.patch, HDDS-676.006.patch > > > With BlockCommitSequenceId getting updated per block commit on open > containers in OM as well datanode, Ozone Client reads can through Standalone > protocol not necessarily requiring Ratis. Client should verify the BCSID of > the container which has the data block , which should always be greater than > or equal to the BCSID of the block to be read and the existing block BCSID > should exactly match that of the block to be read. As a part of this, Client > can try to read from a replica with a supplied BCSID and failover to the next > one in case the block does ont exist on one replica. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-676) Enable Read from open Containers via Standalone Protocol
[ https://issues.apache.org/jira/browse/HDDS-676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16657885#comment-16657885 ] Shashikant Banerjee commented on HDDS-676: -- patch v6 addresses the checkstyle issues. > Enable Read from open Containers via Standalone Protocol > > > Key: HDDS-676 > URL: https://issues.apache.org/jira/browse/HDDS-676 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDDS-676.001.patch, HDDS-676.002.patch, > HDDS-676.003.patch, HDDS-676.004.patch, HDDS-676.005.patch, HDDS-676.006.patch > > > With BlockCommitSequenceId getting updated per block commit on open > containers in OM as well datanode, Ozone Client reads can through Standalone > protocol not necessarily requiring Ratis. Client should verify the BCSID of > the container which has the data block , which should always be greater than > or equal to the BCSID of the block to be read and the existing block BCSID > should exactly match that of the block to be read. As a part of this, Client > can try to read from a replica with a supplied BCSID and failover to the next > one in case the block does ont exist on one replica. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-676) Enable Read from open Containers via Standalone Protocol
[ https://issues.apache.org/jira/browse/HDDS-676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-676: - Attachment: HDDS-676.001.patch > Enable Read from open Containers via Standalone Protocol > > > Key: HDDS-676 > URL: https://issues.apache.org/jira/browse/HDDS-676 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDDS-676.001.patch > > > With BlockCommitSequenceId getting updated per block commit on open > containers in OM as well datanode, Ozone Client reads can through Standalone > protocol not necessarily requiring Ratis. Client should verify the BCSID of > the container which has the data block , which should always be greater than > or equal to the BCSID of the block to be read and the existing block BCSID > should exactly match that of the block to be read. As a part of this, Client > can try to read from a replica with a supplied BCSID and failover to the next > one in case the block does ont exist on one replica. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-676) Enable Read from open Containers via Standalone Protocol
[ https://issues.apache.org/jira/browse/HDDS-676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-676: - Status: Patch Available (was: Open) > Enable Read from open Containers via Standalone Protocol > > > Key: HDDS-676 > URL: https://issues.apache.org/jira/browse/HDDS-676 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDDS-676.001.patch > > > With BlockCommitSequenceId getting updated per block commit on open > containers in OM as well datanode, Ozone Client reads can through Standalone > protocol not necessarily requiring Ratis. Client should verify the BCSID of > the container which has the data block , which should always be greater than > or equal to the BCSID of the block to be read and the existing block BCSID > should exactly match that of the block to be read. As a part of this, Client > can try to read from a replica with a supplied BCSID and failover to the next > one in case the block does ont exist on one replica. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-676) Enable Read from open Containers via Standalone Protocol
Shashikant Banerjee created HDDS-676: Summary: Enable Read from open Containers via Standalone Protocol Key: HDDS-676 URL: https://issues.apache.org/jira/browse/HDDS-676 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Shashikant Banerjee Assignee: Shashikant Banerjee With BlockCommitSequenceId getting updated per block commit on open containers in OM as well as datanode, Ozone Client reads can go through the Standalone protocol, not necessarily requiring Ratis. The client should verify the BCSID of the container which has the data block, which should always be greater than or equal to the BCSID of the block to be read, and the existing block BCSID should exactly match that of the block to be read. As a part of this, the client can try to read from a replica with a supplied BCSID and fail over to the next one in case the block does not exist on one replica. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
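The read rule described in HDDS-676 (container BCSID must be greater than or equal to the requested block BCSID, the stored block BCSID must match the requested one exactly, and the client fails over to the next replica when the block is absent) can be sketched as below. This is an illustrative sketch only: the `BcsidReadSketch`, `Replica`, and `pickReplica` names are hypothetical and do not mirror the actual Ozone client code.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the BCSID checks from HDDS-676; not the real Ozone API.
class BcsidReadSketch {
    // blockBcsid == null models "block not present on this replica".
    record Replica(long containerBcsid, Long blockBcsid) {}

    // Returns the first replica that satisfies both BCSID rules:
    //  1. container BCSID >= requested block BCSID
    //  2. stored block BCSID == requested block BCSID
    // Replicas missing the block are skipped (failover to the next one).
    static Optional<Replica> pickReplica(List<Replica> replicas, long requestedBcsid) {
        for (Replica r : replicas) {
            if (r.blockBcsid() == null) {
                continue; // block absent here: fail over to the next replica
            }
            if (r.containerBcsid() >= requestedBcsid
                    && r.blockBcsid() == requestedBcsid) {
                return Optional.of(r);
            }
        }
        return Optional.empty();
    }
}
```

For example, a replica whose container BCSID lags behind the requested block BCSID is rejected, since its container cannot yet hold the committed block.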
[jira] [Created] (HDDS-675) Add blocking buffer and use watchApi for flush/close in OzoneClient
Shashikant Banerjee created HDDS-675: Summary: Add blocking buffer and use watchApi for flush/close in OzoneClient Key: HDDS-675 URL: https://issues.apache.org/jira/browse/HDDS-675 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Client Reporter: Shashikant Banerjee Assignee: Shashikant Banerjee For handling 2 node failures, a blocking buffer will be used which will wait for the flush commit index to get updated on all replicas of a container via Ratis. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
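The blocking-buffer idea in HDDS-675 can be sketched as follows: each write registers a future that completes once that entry's commit index has been replicated on all replicas, and flush blocks on those futures before releasing the buffered entries. This is a hedged sketch only; it does not use the real Ratis watch API (the per-index `CompletableFuture` is a stand-in for a watch-for-ALL_COMMITTED style call), and the class name is hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical sketch of a blocking buffer keyed by Ratis log index.
class BlockingBufferSketch {
    private final ConcurrentSkipListMap<Long, CompletableFuture<Void>> pending =
            new ConcurrentSkipListMap<>();

    // Record a buffered write together with the future that completes when
    // the entry at logIndex is committed on all replicas.
    void onWrite(long logIndex, CompletableFuture<Void> allReplicated) {
        pending.put(logIndex, allReplicated);
    }

    // Block until every currently buffered index is replicated everywhere,
    // then discard the acknowledged entries. Returns how many were flushed.
    int flush() {
        if (pending.isEmpty()) {
            return 0;
        }
        long highest = pending.lastKey();
        var acknowledged = pending.headMap(highest, true);
        acknowledged.values().forEach(CompletableFuture::join); // wait for all replicas
        int flushed = acknowledged.size();
        acknowledged.clear();
        return flushed;
    }
}
```

The design point is that flush/close become the synchronization barrier: normal writes stay pipelined, and only flush pays the cost of waiting for the all-replica commit index to catch up.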
[jira] [Commented] (HDDS-629) Make ApplyTransaction calls in ContainerStateMachine idempotent
[ https://issues.apache.org/jira/browse/HDDS-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16650533#comment-16650533 ] Shashikant Banerjee commented on HDDS-629: -- The test failures and the findbugs warning are not related to the patch. > Make ApplyTransaction calls in ContainerStateMachine idempotent > --- > > Key: HDDS-629 > URL: https://issues.apache.org/jira/browse/HDDS-629 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDDS-629.000.patch, HDDS-629.001.patch, > HDDS-629.002.patch, HDDS-629.003.patch, HDDS-629.004.patch, > HDDS-629.005.patch, HDDS-629.006.patch > > > When a Datanode restarts, it may lead to a case where it can reapply > already applied Transactions when it joins the pipeline again. For this > requirement, all ApplyTransaction calls in Ratis need to be made idempotent -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-708) Validate BCSID while reading blocks from containers in datanodes
[ https://issues.apache.org/jira/browse/HDDS-708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-708: - Resolution: Fixed Fix Version/s: 0.4.0 Status: Resolved (was: Patch Available) Thanks [~anu], [~msingh] for the review. I have committed this to trunk as well as the ozone-0.3 branch. > Validate BCSID while reading blocks from containers in datanodes > > > Key: HDDS-708 > URL: https://issues.apache.org/jira/browse/HDDS-708 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 0.3.0, 0.4.0 > > Attachments: HDDS-708.000.patch, HDDS-708.001.patch > > > The Ozone client, while making a getBlock call to read data, should read > the bcsId from OzoneManager for the block, and the same needs to be validated > in the Datanode. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-717) Add a test to write data on datanodes with higher bcsid and commit the key to OM with lower bcsid and then read
[ https://issues.apache.org/jira/browse/HDDS-717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-717: - Summary: Add a test to write data on datanodes with higher bcsid and commit the key to OM with lower bcsid and then read (was: Add a test to write data on datanodes with higher BCSID and commit the key to OM with lower bcsid and then read) > Add a test to write data on datanodes with higher bcsid and commit the key to > OM with lower bcsid and then read > --- > > Key: HDDS-717 > URL: https://issues.apache.org/jira/browse/HDDS-717 > Project: Hadoop Distributed Data Store > Issue Type: Test >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-717) Add a test to write data on datanodes with higher BCSID and commit the key to OM with lower bcsid and then read
[ https://issues.apache.org/jira/browse/HDDS-717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-717: - Summary: Add a test to write data on datanodes with higher BCSID and commit the key to OM with lower bcsid and then read (was: Add a test to write data on datanodes with higher BCSID and commit the key to OM with lower bcsid) > Add a test to write data on datanodes with higher BCSID and commit the key to > OM with lower bcsid and then read > --- > > Key: HDDS-717 > URL: https://issues.apache.org/jira/browse/HDDS-717 > Project: Hadoop Distributed Data Store > Issue Type: Test >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-676) Enable Read from open Containers via Standalone Protocol
[ https://issues.apache.org/jira/browse/HDDS-676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-676: - Status: Patch Available (was: Reopened) > Enable Read from open Containers via Standalone Protocol > > > Key: HDDS-676 > URL: https://issues.apache.org/jira/browse/HDDS-676 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 0.4.0 > > Attachments: HDDS-676-ozone-0.3.000.patch, HDDS-676.001.patch, > HDDS-676.002.patch, HDDS-676.003.patch, HDDS-676.004.patch, > HDDS-676.005.patch, HDDS-676.006.patch, HDDS-676.007.patch, HDDS-676.008.patch > > > With BlockCommitSequenceId getting updated per block commit on open > containers in OM as well as datanode, Ozone Client reads can go through the Standalone > protocol, not necessarily requiring Ratis. The client should verify the BCSID of > the container which has the data block, which should always be greater than > or equal to the BCSID of the block to be read, and the existing block BCSID > should exactly match that of the block to be read. As a part of this, the client > can try to read from a replica with a supplied BCSID and fail over to the next > one in case the block does not exist on one replica. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-697) update and validate the BCSID for PutSmallFile/GetSmallFile command
[ https://issues.apache.org/jira/browse/HDDS-697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-697: - Status: Patch Available (was: Open) > update and validate the BCSID for PutSmallFile/GetSmallFile command > --- > > Key: HDDS-697 > URL: https://issues.apache.org/jira/browse/HDDS-697 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDDS-697.000.patch > > > Similar to putBlock/GetBlock, putSmallFile transaction in Ratis needs to > update the BCSID in the container db on datanode. getSmallFile should > validate the bcsId while reading the block similar to getBlock. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-717) Add a test to write data on datanodes with higher BCSID and commit the key to OM with lower bcsid
Shashikant Banerjee created HDDS-717: Summary: Add a test to write data on datanodes with higher BCSID and commit the key to OM with lower bcsid Key: HDDS-717 URL: https://issues.apache.org/jira/browse/HDDS-717 Project: Hadoop Distributed Data Store Issue Type: Test Reporter: Shashikant Banerjee Assignee: Shashikant Banerjee -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-716) Update ozone to latest ratis snapshot build(0.3.0-aa38160-SNAPSHOT)
[ https://issues.apache.org/jira/browse/HDDS-716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662200#comment-16662200 ] Shashikant Banerjee commented on HDDS-716: -- Thanks [~msingh] for working on this and [~jnp] for the review. I have committed this to trunk. The Jira needs a patch for the ozone-0.3 branch as well. > Update ozone to latest ratis snapshot build(0.3.0-aa38160-SNAPSHOT) > --- > > Key: HDDS-716 > URL: https://issues.apache.org/jira/browse/HDDS-716 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.3.0 >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Major > Attachments: HDDS-716.001.patch, HDDS-716.002.patch, > HDDS-716.003.patch > > > This jira updates ozone to the latest ratis snapshot > build (0.3.0-aa38160-SNAPSHOT) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-716) Update ozone to latest ratis snapshot build(0.3.0-aa38160-SNAPSHOT)
[ https://issues.apache.org/jira/browse/HDDS-716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662035#comment-16662035 ] Shashikant Banerjee commented on HDDS-716: -- +1 on the latest patch. I will commit this shortly. > Update ozone to latest ratis snapshot build(0.3.0-aa38160-SNAPSHOT) > --- > > Key: HDDS-716 > URL: https://issues.apache.org/jira/browse/HDDS-716 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.3.0 >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Major > Attachments: HDDS-716.001.patch, HDDS-716.002.patch, > HDDS-716.003.patch > > > This jira updates the ozone to latest ratis snapshot > build(0.3.0-aa38160-SNAPSHOT) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-726) Ozone Client should update SCM to move the container out of allocation path in case a write transaction fails
Shashikant Banerjee created HDDS-726: Summary: Ozone Client should update SCM to move the container out of allocation path in case a write transaction fails Key: HDDS-726 URL: https://issues.apache.org/jira/browse/HDDS-726 Project: Hadoop Distributed Data Store Issue Type: Test Reporter: Shashikant Banerjee Assignee: Shashikant Banerjee Once a container write transaction fails, the container will be marked corrupted. When the Ozone client gets an exception in such a case, it should tell SCM to move the container out of the allocation path. SCM will eventually close the container. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-728) Datanodes are going to dead state after some interval
[ https://issues.apache.org/jira/browse/HDDS-728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662359#comment-16662359 ] Shashikant Banerjee commented on HDDS-728: -- [~ssulav], can you attach the SCM logs as well as logs for other Datanodes? > Datanodes are going to dead state after some interval > - > > Key: HDDS-728 > URL: https://issues.apache.org/jira/browse/HDDS-728 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Filesystem >Affects Versions: 0.3.0 >Reporter: Soumitra Sulav >Priority: Major > Attachments: > hadoop-root-datanode-ctr-e138-1518143905142-541600-02-03.hwx.site.log > > > Setup a 5 datanode ozone cluster with HDP on top of it. > After restarting all HDP services few times encountered below issue which is > making the HDP services to fail. > Same exception was observed in an old setup but I thought it could have been > issue with the setup but now encountered the same issue in new setup as well. > {code:java} > 2018-10-24 10:42:03,308 WARN > org.apache.ratis.grpc.server.GrpcServerProtocolService: > 2974da2b-e765-43f9-8d30-45fe40dcb9ab: Failed requestVote > 1672d28e-800f-4318-895b-1648976acff6->2974da2b-e765-43f9-8d30-45fe40dcb9ab#0 > org.apache.ratis.protocol.GroupMismatchException: > 2974da2b-e765-43f9-8d30-45fe40dcb9ab: group-CE87A994686F not found. 
> at > org.apache.ratis.server.impl.RaftServerProxy$ImplMap.get(RaftServerProxy.java:114) > at > org.apache.ratis.server.impl.RaftServerProxy.getImplFuture(RaftServerProxy.java:252) > at > org.apache.ratis.server.impl.RaftServerProxy.getImpl(RaftServerProxy.java:261) > at > org.apache.ratis.server.impl.RaftServerProxy.getImpl(RaftServerProxy.java:256) > at > org.apache.ratis.server.impl.RaftServerProxy.requestVote(RaftServerProxy.java:411) > at > org.apache.ratis.grpc.server.GrpcServerProtocolService.requestVote(GrpcServerProtocolService.java:54) > at > org.apache.ratis.proto.grpc.RaftServerProtocolServiceGrpc$MethodHandlers.invoke(RaftServerProtocolServiceGrpc.java:319) > at > org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171) > at > org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283) > at > org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:707) > at > org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) > at > org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > 2018-10-24 10:42:03,342 WARN > org.apache.ratis.grpc.server.GrpcServerProtocolService: > 2974da2b-e765-43f9-8d30-45fe40dcb9ab: Failed requestVote > 7839294e-5657-447f-b320-6b390fffb963->2974da2b-e765-43f9-8d30-45fe40dcb9ab#0 > org.apache.ratis.protocol.GroupMismatchException: > 2974da2b-e765-43f9-8d30-45fe40dcb9ab: group-CE87A994686F not found. 
> at > org.apache.ratis.server.impl.RaftServerProxy$ImplMap.get(RaftServerProxy.java:114) > at > org.apache.ratis.server.impl.RaftServerProxy.getImplFuture(RaftServerProxy.java:252) > at > org.apache.ratis.server.impl.RaftServerProxy.getImpl(RaftServerProxy.java:261) > at > org.apache.ratis.server.impl.RaftServerProxy.getImpl(RaftServerProxy.java:256) > at > org.apache.ratis.server.impl.RaftServerProxy.requestVote(RaftServerProxy.java:411) > at > org.apache.ratis.grpc.server.GrpcServerProtocolService.requestVote(GrpcServerProtocolService.java:54) > at > org.apache.ratis.proto.grpc.RaftServerProtocolServiceGrpc$MethodHandlers.invoke(RaftServerProtocolServiceGrpc.java:319) > at > org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171) > at > org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283) > at > org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:707) > at > org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) > at > org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > 2018-10-24
[jira] [Updated] (HDDS-676) Enable Read from open Containers via Standalone Protocol
[ https://issues.apache.org/jira/browse/HDDS-676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-676: - Resolution: Fixed Fix Version/s: 0.3.0 Target Version/s: 0.3.0, 0.4.0 (was: 0.3.0) Status: Resolved (was: Patch Available) Thanks [~nandakumar131], [~jnp], [~anu] for the reviews. I have committed this change to trunk and the ozone-0.3 branch. > Enable Read from open Containers via Standalone Protocol > > > Key: HDDS-676 > URL: https://issues.apache.org/jira/browse/HDDS-676 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 0.3.0, 0.4.0 > > Attachments: HDDS-676-ozone-0.3.000.patch, HDDS-676.001.patch, > HDDS-676.002.patch, HDDS-676.003.patch, HDDS-676.004.patch, > HDDS-676.005.patch, HDDS-676.006.patch, HDDS-676.007.patch, HDDS-676.008.patch > > > With BlockCommitSequenceId getting updated per block commit on open > containers in OM as well as datanode, Ozone Client reads can go through the Standalone > protocol, not necessarily requiring Ratis. The client should verify the BCSID of > the container which has the data block, which should always be greater than > or equal to the BCSID of the block to be read, and the existing block BCSID > should exactly match that of the block to be read. As a part of this, the client > can try to read from a replica with a supplied BCSID and fail over to the next > one in case the block does not exist on one replica. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-720) ContainerReportPublisher fails when the container is marked unhealthy on Datanodes
Shashikant Banerjee created HDDS-720: Summary: ContainerReportPublisher fails when the container is marked unhealthy on Datanodes Key: HDDS-720 URL: https://issues.apache.org/jira/browse/HDDS-720 Project: Hadoop Distributed Data Store Issue Type: Test Components: Ozone Datanode, SCM Reporter: Shashikant Banerjee Assignee: Shashikant Banerjee {code:java} 2018-10-24 01:15:00,265 ERROR report.ReportPublisher (ReportPublisher.java:publishReport(88)) - Exception while publishing report. org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: Invalid Container state found: 2 at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.getHddsState(KeyValueContainer.java:558) at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.getContainerReport(KeyValueContainer.java:532) at org.apache.hadoop.ozone.container.common.impl.ContainerSet.getContainerReport(ContainerSet.java:203) at org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.getContainerReport(OzoneContainer.java:168) at org.apache.hadoop.ozone.container.common.report.ContainerReportPublisher.getReport(ContainerReportPublisher.java:83) at org.apache.hadoop.ozone.container.common.report.ContainerReportPublisher.getReport(ContainerReportPublisher.java:50) at org.apache.hadoop.ozone.container.common.report.ReportPublisher.publishReport(ReportPublisher.java:86) at org.apache.hadoop.ozone.container.common.report.ReportPublisher.run(ReportPublisher.java:73) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) {code} There is no mapping from the Unhealthy state of containers on the Datanode to the LifecycleState of containers in SCM. Hence, the container report publisher fails with an Invalid container state exception. A container is marked unhealthy on the Datanode only if a certain write transaction fails, so that successive updates get rejected and a close container action is initiated to SCM to close the container. For all practical cases, a container in the unhealthy state can also be mapped to a container in the closing state in SCM. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
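The mapping proposed above (reporting a datanode-side unhealthy container to SCM as a closing one) amounts to something like the following sketch. The enum and class names here are illustrative, not the actual HDDS protobuf state names.

```java
// Hypothetical sketch of mapping datanode container states to SCM
// lifecycle states, treating UNHEALTHY as CLOSING as HDDS-720 proposes.
class ContainerStateMapping {
    enum DatanodeState { OPEN, CLOSING, CLOSED, UNHEALTHY }
    enum ScmLifecycleState { OPEN, CLOSING, CLOSED }

    static ScmLifecycleState toScmState(DatanodeState s) {
        switch (s) {
            case OPEN:
                return ScmLifecycleState.OPEN;
            case CLOSED:
                return ScmLifecycleState.CLOSED;
            // UNHEALTHY has no direct SCM equivalent; since a close-container
            // action is already initiated for it, report it as CLOSING.
            case CLOSING:
            case UNHEALTHY:
                return ScmLifecycleState.CLOSING;
            default:
                throw new IllegalArgumentException("Invalid container state: " + s);
        }
    }
}
```

With a total mapping like this, the report publisher never hits the "Invalid Container state found" exception for unhealthy containers.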
[jira] [Updated] (HDDS-697) update and validate the BCSID for PutSmallFile/GetSmallFile command
[ https://issues.apache.org/jira/browse/HDDS-697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-697: - Attachment: HDDS-697.001.patch > update and validate the BCSID for PutSmallFile/GetSmallFile command > --- > > Key: HDDS-697 > URL: https://issues.apache.org/jira/browse/HDDS-697 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDDS-697.000.patch, HDDS-697.001.patch > > > Similar to putBlock/GetBlock, putSmallFile transaction in Ratis needs to > update the BCSID in the container db on datanode. getSmallFile should > validate the bcsId while reading the block similar to getBlock. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-697) update and validate the BCSID for PutSmallFile/GetSmallFile command
[ https://issues.apache.org/jira/browse/HDDS-697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661217#comment-16661217 ] Shashikant Banerjee commented on HDDS-697: -- Patch v1 fixes the test failures. testContainerStateMachineFailures seems to be a flaky test: we mark the container unhealthy and wait for the closeContainerAction to be queued, but before the assert that verifies the action exists in the pending actions queue executes, the Datanode may already have removed the action from the queue to send it to SCM. As a result, the test sometimes passes and sometimes doesn't. Removed the assert condition verifying the pending action queue from the test to make it more stable. > update and validate the BCSID for PutSmallFile/GetSmallFile command > --- > > Key: HDDS-697 > URL: https://issues.apache.org/jira/browse/HDDS-697 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDDS-697.000.patch, HDDS-697.001.patch > > > Similar to putBlock/GetBlock, putSmallFile transaction in Ratis needs to > update the BCSID in the container db on datanode. getSmallFile should > validate the bcsId while reading the block similar to getBlock. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-697) update and validate the BCSID for PutSmallFile/GetSmallFile command
[ https://issues.apache.org/jira/browse/HDDS-697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-697: - Attachment: (was: HDDS-697.001.patch) > update and validate the BCSID for PutSmallFile/GetSmallFile command > --- > > Key: HDDS-697 > URL: https://issues.apache.org/jira/browse/HDDS-697 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDDS-697.000.patch > > > Similar to putBlock/GetBlock, putSmallFile transaction in Ratis needs to > update the BCSID in the container db on datanode. getSmallFile should > validate the bcsId while reading the block similar to getBlock. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-697) update and validate the BCSID for PutSmallFile/GetSmallFile command
[ https://issues.apache.org/jira/browse/HDDS-697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-697: - Status: Open (was: Patch Available) > update and validate the BCSID for PutSmallFile/GetSmallFile command > --- > > Key: HDDS-697 > URL: https://issues.apache.org/jira/browse/HDDS-697 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDDS-697.000.patch > > > Similar to putBlock/GetBlock, putSmallFile transaction in Ratis needs to > update the BCSID in the container db on datanode. getSmallFile should > validate the bcsId while reading the block similar to getBlock. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-697) update and validate the BCSID for PutSmallFile/GetSmallFile command
[ https://issues.apache.org/jira/browse/HDDS-697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-697: - Attachment: HDDS-697.001.patch > update and validate the BCSID for PutSmallFile/GetSmallFile command > --- > > Key: HDDS-697 > URL: https://issues.apache.org/jira/browse/HDDS-697 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDDS-697.000.patch, HDDS-697.001.patch > > > Similar to putBlock/GetBlock, putSmallFile transaction in Ratis needs to > update the BCSID in the container db on datanode. getSmallFile should > validate the bcsId while reading the block similar to getBlock. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDDS-721) NullPointerException thrown while trying to read a file when datanode restarted
[ https://issues.apache.org/jira/browse/HDDS-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665141#comment-16665141 ] Shashikant Banerjee edited comment on HDDS-721 at 10/26/18 1:03 PM: The issue does not seem to be resolved by HDDS-676. SCM can return a Ratis pipeline to the client where the field corresponding to the leaderId can be null. At the client, only the Replication type is overwritten to Type Stand_Alone. In case the leader datanode is not set, it will still fail with a NullPointerException. was (Author: shashikant): The issue seems to get resolved with HDDS-676. SCM can return a Ratis pipeline to the client where the filed corresponding to the leaderId can be null. At client, only the Replication type is overwriiten to Type Stand_Alone. In case, leader datanode is not set, it will still fail with NULL pointer exception. > NullPointerException thrown while trying to read a file when datanode > restarted > --- > > Key: HDDS-721 > URL: https://issues.apache.org/jira/browse/HDDS-721 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.3.0 >Reporter: Nilotpal Nandi >Priority: Critical > Attachments: all-node-ozone-logs-1540356965.tar.gz > > > steps taken : > --- > # Put few files and directories using ozonefs > # stopped all services of cluster. > # started the scm, om and then datanodes. > While datanodes were starting up, tried to read a file. Null pointer > Exception was thrown. > > {noformat} > [root@ctr-e138-1518143905142-53-01-03 ~]# > /root/hadoop_trunk/ozone-0.3.0-SNAPSHOT/bin/ozone fs -ls -R / > 2018-10-24 04:48:00,703 WARN util.NativeCodeLoader: Unable to load > native-hadoop library for your platform... using builtin-java classes where > applicable > drwxrwxrwx - root root 0 2018-10-24 04:12 /testdir1 > -rw-rw-rw- 1 root root 5368709120 1970-02-25 15:29 /testdir1/5GB > -rw-rw-rw- 1 root root 4798 1970-02-25 15:22 /testdir1/passwd > drwxrwxrwx - root root 0 2018-10-24 04:46 /testdir3 > [root@ctr-e138-1518143905142-53-01-03 ~]# > /root/hadoop_trunk/ozone-0.3.0-SNAPSHOT/bin/ozone fs -cat > o3fs://fs-bucket.fs-volume/testdir1/passwd > 2018-10-24 04:49:24,955 WARN util.NativeCodeLoader: Unable to load > native-hadoop library for your platform... using builtin-java classes where > applicable > cat: Exception getting XceiverClient: > com.google.common.util.concurrent.UncheckedExecutionException: > java.lang.NullPointerException{noformat} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-721) NullPointerException thrown while trying to read a file when datanode restarted
[ https://issues.apache.org/jira/browse/HDDS-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665141#comment-16665141 ] Shashikant Banerjee commented on HDDS-721: -- The issue seems to get resolved with HDDS-676. SCM can return a Ratis pipeline to the client where the filed corresponding to the leaderId can be null. At client, only the Replication type is overwriiten to Type Stand_Alone. In case, leader datanode is not set, it will still fail with NULL pointer exception. > NullPointerException thrown while trying to read a file when datanode > restarted > --- > > Key: HDDS-721 > URL: https://issues.apache.org/jira/browse/HDDS-721 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.3.0 >Reporter: Nilotpal Nandi >Priority: Critical > Attachments: all-node-ozone-logs-1540356965.tar.gz > > > steps taken : > --- > # Put few files and directories using ozonefs > # stopped all services of cluster. > # started the scm, om and then datanodes. > While datanodes were starting up, tried to read a file. Null pointer > Exception was thrown. > > {noformat} > [root@ctr-e138-1518143905142-53-01-03 ~]# > /root/hadoop_trunk/ozone-0.3.0-SNAPSHOT/bin/ozone fs -ls -R / > 2018-10-24 04:48:00,703 WARN util.NativeCodeLoader: Unable to load > native-hadoop library for your platform... using builtin-java classes where > applicable > drwxrwxrwx - root root 0 2018-10-24 04:12 /testdir1 > -rw-rw-rw- 1 root root 5368709120 1970-02-25 15:29 /testdir1/5GB > -rw-rw-rw- 1 root root 4798 1970-02-25 15:22 /testdir1/passwd > drwxrwxrwx - root root 0 2018-10-24 04:46 /testdir3 > [root@ctr-e138-1518143905142-53-01-03 ~]# > /root/hadoop_trunk/ozone-0.3.0-SNAPSHOT/bin/ozone fs -cat > o3fs://fs-bucket.fs-volume/testdir1/passwd > 2018-10-24 04:49:24,955 WARN util.NativeCodeLoader: Unable to load > native-hadoop library for your platform... 
using builtin-java classes where > applicable > cat: Exception getting XceiverClient: > com.google.common.util.concurrent.UncheckedExecutionException: > java.lang.NullPointerException{noformat} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
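The failure mode described in the comment above can be sketched as follows. This is a hypothetical illustration (the class and method names are stand-ins, not the actual Ozone client code): overriding only the replication type is not enough when the pipeline's leader field is null, so the leader lookup should fail fast with a clear message.

```java
import java.util.Optional;

// Hypothetical sketch of the NPE path: the real Ozone client types
// (Pipeline, DatanodeDetails) are stood in for by plain Strings.
class PipelineLeaderCheck {
    // Returns the leader id, or fails fast with a descriptive exception
    // instead of a bare NullPointerException deeper in client setup.
    static String resolveLeader(String leaderId) {
        return Optional.ofNullable(leaderId)
            .orElseThrow(() -> new IllegalStateException(
                "Pipeline has no leader datanode set"));
    }
}
```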
[jira] [Assigned] (HDDS-721) NullPointerException thrown while trying to read a file when datanode restarted
[ https://issues.apache.org/jira/browse/HDDS-721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee reassigned HDDS-721: Assignee: Shashikant Banerjee > NullPointerException thrown while trying to read a file when datanode > restarted > --- > > Key: HDDS-721 > URL: https://issues.apache.org/jira/browse/HDDS-721 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.3.0 >Reporter: Nilotpal Nandi >Assignee: Shashikant Banerjee >Priority: Critical > Attachments: all-node-ozone-logs-1540356965.tar.gz > > > steps taken : > --- > # Put few files and directories using ozonefs > # stopped all services of cluster. > # started the scm, om and then datanodes. > While datanodes were starting up, tried to read a file. Null pointer > Exception was thrown. > > {noformat} > [root@ctr-e138-1518143905142-53-01-03 ~]# > /root/hadoop_trunk/ozone-0.3.0-SNAPSHOT/bin/ozone fs -ls -R / > 2018-10-24 04:48:00,703 WARN util.NativeCodeLoader: Unable to load > native-hadoop library for your platform... using builtin-java classes where > applicable > drwxrwxrwx - root root 0 2018-10-24 04:12 /testdir1 > -rw-rw-rw- 1 root root 5368709120 1970-02-25 15:29 /testdir1/5GB > -rw-rw-rw- 1 root root 4798 1970-02-25 15:22 /testdir1/passwd > drwxrwxrwx - root root 0 2018-10-24 04:46 /testdir3 > [root@ctr-e138-1518143905142-53-01-03 ~]# > /root/hadoop_trunk/ozone-0.3.0-SNAPSHOT/bin/ozone fs -cat > o3fs://fs-bucket.fs-volume/testdir1/passwd > 2018-10-24 04:49:24,955 WARN util.NativeCodeLoader: Unable to load > native-hadoop library for your platform... 
using builtin-java classes where > applicable > cat: Exception getting XceiverClient: > com.google.common.util.concurrent.UncheckedExecutionException: > java.lang.NullPointerException{noformat} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-721) NullPointerException thrown while trying to read a file when datanode restarted
[ https://issues.apache.org/jira/browse/HDDS-721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-721: - Attachment: HDDS-721-ozone-0.3.000.patch > NullPointerException thrown while trying to read a file when datanode > restarted > --- > > Key: HDDS-721 > URL: https://issues.apache.org/jira/browse/HDDS-721 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.3.0 >Reporter: Nilotpal Nandi >Assignee: Shashikant Banerjee >Priority: Critical > Attachments: HDDS-721-ozone-0.3.000.patch, > all-node-ozone-logs-1540356965.tar.gz > > > steps taken : > --- > # Put few files and directories using ozonefs > # stopped all services of cluster. > # started the scm, om and then datanodes. > While datanodes were starting up, tried to read a file. Null pointer > Exception was thrown. > > {noformat} > [root@ctr-e138-1518143905142-53-01-03 ~]# > /root/hadoop_trunk/ozone-0.3.0-SNAPSHOT/bin/ozone fs -ls -R / > 2018-10-24 04:48:00,703 WARN util.NativeCodeLoader: Unable to load > native-hadoop library for your platform... using builtin-java classes where > applicable > drwxrwxrwx - root root 0 2018-10-24 04:12 /testdir1 > -rw-rw-rw- 1 root root 5368709120 1970-02-25 15:29 /testdir1/5GB > -rw-rw-rw- 1 root root 4798 1970-02-25 15:22 /testdir1/passwd > drwxrwxrwx - root root 0 2018-10-24 04:46 /testdir3 > [root@ctr-e138-1518143905142-53-01-03 ~]# > /root/hadoop_trunk/ozone-0.3.0-SNAPSHOT/bin/ozone fs -cat > o3fs://fs-bucket.fs-volume/testdir1/passwd > 2018-10-24 04:49:24,955 WARN util.NativeCodeLoader: Unable to load > native-hadoop library for your platform... 
using builtin-java classes where > applicable > cat: Exception getting XceiverClient: > com.google.common.util.concurrent.UncheckedExecutionException: > java.lang.NullPointerException{noformat} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-721) NullPointerException thrown while trying to read a file when datanode restarted
[ https://issues.apache.org/jira/browse/HDDS-721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-721: - Status: Patch Available (was: Open) > NullPointerException thrown while trying to read a file when datanode > restarted > --- > > Key: HDDS-721 > URL: https://issues.apache.org/jira/browse/HDDS-721 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.3.0 >Reporter: Nilotpal Nandi >Assignee: Shashikant Banerjee >Priority: Critical > Attachments: HDDS-721-ozone-0.3.000.patch, > all-node-ozone-logs-1540356965.tar.gz > > > steps taken : > --- > # Put few files and directories using ozonefs > # stopped all services of cluster. > # started the scm, om and then datanodes. > While datanodes were starting up, tried to read a file. Null pointer > Exception was thrown. > > {noformat} > [root@ctr-e138-1518143905142-53-01-03 ~]# > /root/hadoop_trunk/ozone-0.3.0-SNAPSHOT/bin/ozone fs -ls -R / > 2018-10-24 04:48:00,703 WARN util.NativeCodeLoader: Unable to load > native-hadoop library for your platform... using builtin-java classes where > applicable > drwxrwxrwx - root root 0 2018-10-24 04:12 /testdir1 > -rw-rw-rw- 1 root root 5368709120 1970-02-25 15:29 /testdir1/5GB > -rw-rw-rw- 1 root root 4798 1970-02-25 15:22 /testdir1/passwd > drwxrwxrwx - root root 0 2018-10-24 04:46 /testdir3 > [root@ctr-e138-1518143905142-53-01-03 ~]# > /root/hadoop_trunk/ozone-0.3.0-SNAPSHOT/bin/ozone fs -cat > o3fs://fs-bucket.fs-volume/testdir1/passwd > 2018-10-24 04:49:24,955 WARN util.NativeCodeLoader: Unable to load > native-hadoop library for your platform... 
using builtin-java classes where > applicable > cat: Exception getting XceiverClient: > com.google.common.util.concurrent.UncheckedExecutionException: > java.lang.NullPointerException{noformat} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-721) NullPointerException thrown while trying to read a file when datanode restarted
[ https://issues.apache.org/jira/browse/HDDS-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1470#comment-1470 ] Shashikant Banerjee commented on HDDS-721: -- This needs to be fixed in ozone 0.3 branch only. The problem should be fixed with HDDS-694 in trunk. > NullPointerException thrown while trying to read a file when datanode > restarted > --- > > Key: HDDS-721 > URL: https://issues.apache.org/jira/browse/HDDS-721 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.3.0 >Reporter: Nilotpal Nandi >Assignee: Shashikant Banerjee >Priority: Critical > Attachments: HDDS-721-ozone-0.3.000.patch, > all-node-ozone-logs-1540356965.tar.gz > > > steps taken : > --- > # Put few files and directories using ozonefs > # stopped all services of cluster. > # started the scm, om and then datanodes. > While datanodes were starting up, tried to read a file. Null pointer > Exception was thrown. > > {noformat} > [root@ctr-e138-1518143905142-53-01-03 ~]# > /root/hadoop_trunk/ozone-0.3.0-SNAPSHOT/bin/ozone fs -ls -R / > 2018-10-24 04:48:00,703 WARN util.NativeCodeLoader: Unable to load > native-hadoop library for your platform... using builtin-java classes where > applicable > drwxrwxrwx - root root 0 2018-10-24 04:12 /testdir1 > -rw-rw-rw- 1 root root 5368709120 1970-02-25 15:29 /testdir1/5GB > -rw-rw-rw- 1 root root 4798 1970-02-25 15:22 /testdir1/passwd > drwxrwxrwx - root root 0 2018-10-24 04:46 /testdir3 > [root@ctr-e138-1518143905142-53-01-03 ~]# > /root/hadoop_trunk/ozone-0.3.0-SNAPSHOT/bin/ozone fs -cat > o3fs://fs-bucket.fs-volume/testdir1/passwd > 2018-10-24 04:49:24,955 WARN util.NativeCodeLoader: Unable to load > native-hadoop library for your platform... 
using builtin-java classes where > applicable > cat: Exception getting XceiverClient: > com.google.common.util.concurrent.UncheckedExecutionException: > java.lang.NullPointerException{noformat} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-749) Restructure BlockId class in Ozone
[ https://issues.apache.org/jira/browse/HDDS-749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667926#comment-16667926 ] Shashikant Banerjee commented on HDDS-749: -- Patch v1 fixes the related test failure and checkstyle issue. > Restructure BlockId class in Ozone > -- > > Key: HDDS-749 > URL: https://issues.apache.org/jira/browse/HDDS-749 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Affects Versions: 0.4.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 0.4.0 > > Attachments: HDDS-749.000.patch, HDDS-749.001.patch > > > As a part of block allocation in SCM, SCM will return a containerBlockId > which consists of containerId and localId. Once OM gets the allocated > Blocks from SCM, it will create a BlockId object which consists of > containerID, localId and BlockCommitSequenceId. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
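The restructuring described in the issue above can be illustrated with a minimal sketch. Class and field names here follow the issue description only and are assumed, not taken from the committed patch: SCM allocates a ContainerBlockID (containerID plus localID), and OM wraps it together with the block commit sequence id into a BlockID.

```java
// Sketch only: names are assumed from the issue description.
class ContainerBlockID {
    final long containerID; // allocated by SCM
    final long localID;     // allocated by SCM
    ContainerBlockID(long containerID, long localID) {
        this.containerID = containerID;
        this.localID = localID;
    }
}

class BlockID {
    final ContainerBlockID containerBlockID;
    final long blockCommitSequenceId; // filled in by OM
    BlockID(ContainerBlockID containerBlockID, long bcsId) {
        this.containerBlockID = containerBlockID;
        this.blockCommitSequenceId = bcsId;
    }
    long getContainerID() { return containerBlockID.containerID; }
    long getLocalID() { return containerBlockID.localID; }
}
```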
[jira] [Updated] (HDDS-749) Restructure BlockId class in Ozone
[ https://issues.apache.org/jira/browse/HDDS-749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-749: - Attachment: HDDS-749.001.patch > Restructure BlockId class in Ozone > -- > > Key: HDDS-749 > URL: https://issues.apache.org/jira/browse/HDDS-749 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Affects Versions: 0.4.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 0.4.0 > > Attachments: HDDS-749.000.patch, HDDS-749.001.patch > > > As a part of block allocation in SCM, SCM will return a containerBlockId > which consists of containerId and localId. Once OM gets the allocated > Blocks from SCM, it will create a BlockId object which consists of > containerID, localId and BlockCommitSequenceId. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-721) NullPointerException thrown while trying to read a file when datanode restarted
[ https://issues.apache.org/jira/browse/HDDS-721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-721: - Attachment: HDDS-721-ozone-0.3.001.patch > NullPointerException thrown while trying to read a file when datanode > restarted > --- > > Key: HDDS-721 > URL: https://issues.apache.org/jira/browse/HDDS-721 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.3.0 >Reporter: Nilotpal Nandi >Assignee: Shashikant Banerjee >Priority: Critical > Attachments: HDDS-721-ozone-0.3.000.patch, > HDDS-721-ozone-0.3.001.patch, all-node-ozone-logs-1540356965.tar.gz > > > steps taken : > --- > # Put few files and directories using ozonefs > # stopped all services of cluster. > # started the scm, om and then datanodes. > While datanodes were starting up, tried to read a file. Null pointer > Exception was thrown. > > {noformat} > [root@ctr-e138-1518143905142-53-01-03 ~]# > /root/hadoop_trunk/ozone-0.3.0-SNAPSHOT/bin/ozone fs -ls -R / > 2018-10-24 04:48:00,703 WARN util.NativeCodeLoader: Unable to load > native-hadoop library for your platform... using builtin-java classes where > applicable > drwxrwxrwx - root root 0 2018-10-24 04:12 /testdir1 > -rw-rw-rw- 1 root root 5368709120 1970-02-25 15:29 /testdir1/5GB > -rw-rw-rw- 1 root root 4798 1970-02-25 15:22 /testdir1/passwd > drwxrwxrwx - root root 0 2018-10-24 04:46 /testdir3 > [root@ctr-e138-1518143905142-53-01-03 ~]# > /root/hadoop_trunk/ozone-0.3.0-SNAPSHOT/bin/ozone fs -cat > o3fs://fs-bucket.fs-volume/testdir1/passwd > 2018-10-24 04:49:24,955 WARN util.NativeCodeLoader: Unable to load > native-hadoop library for your platform... 
using builtin-java classes where > applicable > cat: Exception getting XceiverClient: > com.google.common.util.concurrent.UncheckedExecutionException: > java.lang.NullPointerException{noformat} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-749) Restructure BlockId class in Ozone
[ https://issues.apache.org/jira/browse/HDDS-749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-749: - Attachment: HDDS-749.002.patch > Restructure BlockId class in Ozone > -- > > Key: HDDS-749 > URL: https://issues.apache.org/jira/browse/HDDS-749 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Affects Versions: 0.4.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 0.4.0 > > Attachments: HDDS-749.000.patch, HDDS-749.001.patch, > HDDS-749.002.patch > > > As a part of block allocation in SCM, SCM will return a containerBlockId > which consists of containerId and localId. Once OM gets the allocated > Blocks from SCM, it will create a BlockId object which consists of > containerID, localId and BlockCommitSequenceId. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-749) Restructure BlockId class in Ozone
[ https://issues.apache.org/jira/browse/HDDS-749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16668197#comment-16668197 ] Shashikant Banerjee commented on HDDS-749: -- Thanks [~jnp] for the review comments. Patch v2 addresses them. > Restructure BlockId class in Ozone > -- > > Key: HDDS-749 > URL: https://issues.apache.org/jira/browse/HDDS-749 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Affects Versions: 0.4.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 0.4.0 > > Attachments: HDDS-749.000.patch, HDDS-749.001.patch, > HDDS-749.002.patch > > > As a part of block allocation in SCM, SCM will return a containerBlockId > which consists of containerId and localId. Once OM gets the allocated > Blocks from SCM, it will create a BlockId object which consists of > containerID, localId and BlockCommitSequenceId. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-697) update and validate the BCSID for PutSmallFile/GetSmallFile command
[ https://issues.apache.org/jira/browse/HDDS-697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16668475#comment-16668475 ] Shashikant Banerjee commented on HDDS-697: -- Patch v2 is rebased to the latest trunk. > update and validate the BCSID for PutSmallFile/GetSmallFile command > --- > > Key: HDDS-697 > URL: https://issues.apache.org/jira/browse/HDDS-697 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDDS-697.000.patch, HDDS-697.001.patch, > HDDS-697.002.patch > > > Similar to putBlock/GetBlock, the putSmallFile transaction in Ratis needs to > update the BCSID in the container db on the datanode. getSmallFile should > validate the bcsId while reading the block, similar to getBlock. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
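The read-side check described in the issue above can be sketched roughly as follows (a hedged illustration with assumed names, not the actual patch): a getSmallFile or getBlock request should fail when it asks for a bcsId newer than the highest id the container replica has committed, since that replica is lagging.

```java
// Hypothetical sketch of the bcsId validation described above.
class BcsIdCheck {
    // containerBcsId: highest block commit sequence id applied on this
    // container replica; requestedBcsId: the id the reader asks for.
    static void validate(long containerBcsId, long requestedBcsId) {
        if (requestedBcsId > containerBcsId) {
            throw new IllegalStateException("Unable to find the block: "
                + "container bcsId " + containerBcsId
                + " < requested bcsId " + requestedBcsId);
        }
    }
}
```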
[jira] [Updated] (HDDS-697) update and validate the BCSID for PutSmallFile/GetSmallFile command
[ https://issues.apache.org/jira/browse/HDDS-697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-697: - Attachment: HDDS-697.002.patch > update and validate the BCSID for PutSmallFile/GetSmallFile command > --- > > Key: HDDS-697 > URL: https://issues.apache.org/jira/browse/HDDS-697 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDDS-697.000.patch, HDDS-697.001.patch, > HDDS-697.002.patch > > > Similar to putBlock/GetBlock, the putSmallFile transaction in Ratis needs to > update the BCSID in the container db on the datanode. getSmallFile should > validate the bcsId while reading the block, similar to getBlock. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-774) Remove OpenContainerBlockMap from datanode
Shashikant Banerjee created HDDS-774: Summary: Remove OpenContainerBlockMap from datanode Key: HDDS-774 URL: https://issues.apache.org/jira/browse/HDDS-774 Project: Hadoop Distributed Data Store Issue Type: Improvement Components: Ozone Datanode Affects Versions: 0.4.0 Reporter: Shashikant Banerjee Assignee: Shashikant Banerjee Fix For: 0.4.0 With HDDS-675, partial flush of uncommitted keys on Datanodes is not required. OpenContainerBlockMap hence serves no purpose anymore. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-709) Modify Close Container handling sequence on datanodes
[ https://issues.apache.org/jira/browse/HDDS-709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-709: - Attachment: HDDS-709.000.patch > Modify Close Container handling sequence on datanodes > - > > Key: HDDS-709 > URL: https://issues.apache.org/jira/browse/HDDS-709 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDDS-709.000.patch > > > With the quasi closed container state for handling majority node failures, the > close container handling sequence in Datanodes needs to change. Once the > datanodes receive a close container command from SCM, the open container > replicas will individually be marked in the closing state. In the closing state, > only the transactions coming from the Ratis leader are allowed; all other > write transactions will fail. A close container transaction will be queued via > Ratis on the leader and replayed to the followers, which makes them > transition to the CLOSED/QUASI_CLOSED state. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
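The close-container sequence described in the issue above can be sketched as a small state machine. This is illustrative only (the enum and helpers are assumed, not the actual datanode code): SCM's close command moves an OPEN replica to CLOSING, where only leader-originated transactions are accepted, and the replicated close transaction finishes the transition.

```java
// Illustrative state sequence for close handling (assumed names).
enum ContainerState { OPEN, CLOSING, CLOSED, QUASI_CLOSED }

class CloseSequence {
    // SCM's close container command: each open replica moves to CLOSING.
    static ContainerState onScmCloseCommand(ContainerState s) {
        return s == ContainerState.OPEN ? ContainerState.CLOSING : s;
    }
    // In CLOSING, only transactions from the Ratis leader are accepted.
    static boolean isWriteAllowed(ContainerState s, boolean fromRatisLeader) {
        if (s == ContainerState.OPEN) {
            return true;
        }
        return s == ContainerState.CLOSING && fromRatisLeader;
    }
    // The replicated close transaction ends in CLOSED, or QUASI_CLOSED
    // when a majority of the replicas is lost.
    static ContainerState onCloseTransaction(boolean quorumHealthy) {
        return quorumHealthy ? ContainerState.CLOSED
                             : ContainerState.QUASI_CLOSED;
    }
}
```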
[jira] [Updated] (HDDS-709) Modify Close Container handling sequence on datanodes
[ https://issues.apache.org/jira/browse/HDDS-709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-709: - Status: Patch Available (was: Open) > Modify Close Container handling sequence on datanodes > - > > Key: HDDS-709 > URL: https://issues.apache.org/jira/browse/HDDS-709 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDDS-709.000.patch > > > With the quasi closed container state for handling majority node failures, the > close container handling sequence in Datanodes needs to change. Once the > datanodes receive a close container command from SCM, the open container > replicas will individually be marked in the closing state. In the closing state, > only the transactions coming from the Ratis leader are allowed; all other > write transactions will fail. A close container transaction will be queued via > Ratis on the leader and replayed to the followers, which makes them > transition to the CLOSED/QUASI_CLOSED state. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-675) Add blocking buffer and use watchApi for flush/close in OzoneClient
[ https://issues.apache.org/jira/browse/HDDS-675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-675: - Attachment: HDDS-675.000.patch > Add blocking buffer and use watchApi for flush/close in OzoneClient > --- > > Key: HDDS-675 > URL: https://issues.apache.org/jira/browse/HDDS-675 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDDS-675.000.patch > > > For handling 2 node failures, a blocking buffer will be used which will wait > for the flush commit index to get updated on all replicas of a container via > Ratis. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
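The blocking-buffer idea in the issue above can be sketched roughly as follows (assumed names; a simplification of what a Ratis watch request with all-replicas-committed semantics provides): flush/close may only release buffered data once the flush commit index has been applied on every replica of the pipeline.

```java
// Rough sketch (assumed names) of waiting for a commit index on all
// replicas; a real client would block or retry instead of returning false.
class CommitWatcher {
    private final long[] replicaAppliedIndexes; // per-replica applied index

    CommitWatcher(long[] replicaAppliedIndexes) {
        this.replicaAppliedIndexes = replicaAppliedIndexes;
    }

    // True once every replica has applied the given commit index.
    boolean isCommittedOnAllReplicas(long index) {
        for (long applied : replicaAppliedIndexes) {
            if (applied < index) {
                return false;
            }
        }
        return true;
    }
}
```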
[jira] [Updated] (HDDS-675) Add blocking buffer and use watchApi for flush/close in OzoneClient
[ https://issues.apache.org/jira/browse/HDDS-675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-675: - Status: Patch Available (was: Open) > Add blocking buffer and use watchApi for flush/close in OzoneClient > --- > > Key: HDDS-675 > URL: https://issues.apache.org/jira/browse/HDDS-675 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDDS-675.000.patch > > > For handling 2 node failures, a blocking buffer will be used which will wait > for the flush commit index to get updated on all replicas of a container via > Ratis. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-675) Add blocking buffer and use watchApi for flush/close in OzoneClient
[ https://issues.apache.org/jira/browse/HDDS-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16670051#comment-16670051 ] Shashikant Banerjee commented on HDDS-675: -- updated 1st patch. > Add blocking buffer and use watchApi for flush/close in OzoneClient > --- > > Key: HDDS-675 > URL: https://issues.apache.org/jira/browse/HDDS-675 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDDS-675.000.patch > > > For handling 2 node failures, a blocking buffer will be used which will wait > for the flush commit index to get updated on all replicas of a container via > Ratis. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-774) Remove OpenContainerBlockMap from datanode
[ https://issues.apache.org/jira/browse/HDDS-774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16670121#comment-16670121 ] Shashikant Banerjee commented on HDDS-774: -- This is blocked on HDDS-675. > Remove OpenContainerBlockMap from datanode > -- > > Key: HDDS-774 > URL: https://issues.apache.org/jira/browse/HDDS-774 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Ozone Datanode >Affects Versions: 0.4.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 0.4.0 > > Attachments: HDDS-774.000.patch > > > With HDDS-675, partial flush of uncommitted keys on Datanodes is not > required. OpenContainerBlockMap hence serves no purpose anymore. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-774) Remove OpenContainerBlockMap from datanode
[ https://issues.apache.org/jira/browse/HDDS-774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-774: - Attachment: HDDS-774.000.patch > Remove OpenContainerBlockMap from datanode > -- > > Key: HDDS-774 > URL: https://issues.apache.org/jira/browse/HDDS-774 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Ozone Datanode >Affects Versions: 0.4.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 0.4.0 > > Attachments: HDDS-774.000.patch > > > With HDDS-675, partial flush of uncommitted keys on Datanodes is not > required. OpenContainerBlockMap hence serves no purpose anymore. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-697) update and validate the BCSID for PutSmallFile/GetSmallFile command
[ https://issues.apache.org/jira/browse/HDDS-697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-697: - Resolution: Fixed Status: Resolved (was: Patch Available) Thanks [~jnp] for the review. I have fixed the checkstyle issues and committed this change to trunk. The test failures and ASF license warnings are not related to the patch. > update and validate the BCSID for PutSmallFile/GetSmallFile command > --- > > Key: HDDS-697 > URL: https://issues.apache.org/jira/browse/HDDS-697 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDDS-697.000.patch, HDDS-697.001.patch, > HDDS-697.002.patch > > > Similar to putBlock/GetBlock, the putSmallFile transaction in Ratis needs to > update the BCSID in the container db on the datanode. getSmallFile should > validate the bcsId while reading the block, similar to getBlock. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-697) update and validate the BCSID for PutSmallFile/GetSmallFile command
[ https://issues.apache.org/jira/browse/HDDS-697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-697: - Fix Version/s: 0.4.0 > update and validate the BCSID for PutSmallFile/GetSmallFile command > --- > > Key: HDDS-697 > URL: https://issues.apache.org/jira/browse/HDDS-697 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 0.4.0 > > Attachments: HDDS-697.000.patch, HDDS-697.001.patch, > HDDS-697.002.patch > > > Similar to putBlock/GetBlock, the putSmallFile transaction in Ratis needs to > update the BCSID in the container db on the datanode. getSmallFile should > validate the bcsId while reading the block, similar to getBlock. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-709) Modify Close Container handling sequence on datanodes
[ https://issues.apache.org/jira/browse/HDDS-709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-709: - Status: Open (was: Patch Available) > Modify Close Container handling sequence on datanodes > - > > Key: HDDS-709 > URL: https://issues.apache.org/jira/browse/HDDS-709 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDDS-709.000.patch > > > With the quasi closed container state for handling majority node failures, the > close container handling sequence in Datanodes needs to change. Once the > datanodes receive a close container command from SCM, the open container > replicas will individually be marked in the closing state. In the closing state, > only the transactions coming from the Ratis leader are allowed; all other > write transactions will fail. A close container transaction will be queued via > Ratis on the leader and replayed to the followers, which makes them > transition to the CLOSED/QUASI_CLOSED state. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-675) Add blocking buffer and use watchApi for flush/close in OzoneClient
[ https://issues.apache.org/jira/browse/HDDS-675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-675: - Attachment: HDDS-675.001.patch > Add blocking buffer and use watchApi for flush/close in OzoneClient > --- > > Key: HDDS-675 > URL: https://issues.apache.org/jira/browse/HDDS-675 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDDS-675.000.patch, HDDS-675.001.patch > > > For handling 2 node failures, a blocking buffer will be used which will wait > for the flush commit index to get updated on all replicas of a container via > Ratis. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-771) ChunkGroupOutputStream stream entries need to be properly updated on closed container exception
[ https://issues.apache.org/jira/browse/HDDS-771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16671391#comment-16671391 ] Shashikant Banerjee commented on HDDS-771: -- Thanks [~ljain] for the patch. The patch looks good to me. I am +1 on this. > ChunkGroupOutputStream stream entries need to be properly updated on closed > container exception > --- > > Key: HDDS-771 > URL: https://issues.apache.org/jira/browse/HDDS-771 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Blocker > Attachments: HDDS-771-ozone-0.3.001.patch, HDDS-771.001.patch > > > Currently ChunkGroupOutputStream does not increment the currentStreamIndex > when a chunk write completes but there is no data in the buffer. This leads > to overwriting of the stream entry. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
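The off-by-one described in the issue above can be sketched as follows (a simplified, hypothetical model, not the actual ChunkGroupOutputStream code): if the index only advances when the finished entry buffered data, an empty-buffer completion leaves the index in place and the next allocation reuses, and so overwrites, that entry.

```java
// Sketch of the bug and its fix (assumed names, simplified model).
class StreamIndexSketch {
    int currentStreamIndex = 0;

    // Buggy variant: skips the increment when no data was buffered,
    // so the next stream entry lands on the same index.
    int nextEntryIndexBuggy(int bufferedBytes) {
        if (bufferedBytes > 0) {
            currentStreamIndex++;
        }
        return currentStreamIndex;
    }

    // Fixed variant: always advance past the finished entry.
    int nextEntryIndexFixed(int bufferedBytes) {
        currentStreamIndex++;
        return currentStreamIndex;
    }
}
```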
[jira] [Updated] (HDDS-771) ChunkGroupOutputStream stream entries need to be properly updated on closed container exception
[ https://issues.apache.org/jira/browse/HDDS-771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-771: - Resolution: Fixed Fix Version/s: 0.4.0 0.3.0 Status: Resolved (was: Patch Available) Thanks [~ljain] for the contribution. I have committed this change to trunk as well as the ozone-0.3 branch. > ChunkGroupOutputStream stream entries need to be properly updated on closed > container exception > --- > > Key: HDDS-771 > URL: https://issues.apache.org/jira/browse/HDDS-771 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Blocker > Fix For: 0.3.0, 0.4.0 > > Attachments: HDDS-771-ozone-0.3.001.patch, HDDS-771.001.patch > > > Currently ChunkGroupOutputStream does not increment the currentStreamIndex > when a chunk write completes while there is no data in the buffer. This leads > to overwriting of the stream entry.
[jira] [Commented] (HDDS-675) Add blocking buffer and use watchApi for flush/close in OzoneClient
[ https://issues.apache.org/jira/browse/HDDS-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16671455#comment-16671455 ] Shashikant Banerjee commented on HDDS-675: -- Patch v1: Rebased to latest trunk. > Add blocking buffer and use watchApi for flush/close in OzoneClient > --- > > Key: HDDS-675 > URL: https://issues.apache.org/jira/browse/HDDS-675 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDDS-675.000.patch, HDDS-675.001.patch > > > To handle two-node failures, a blocking buffer will be used that waits > for the flush commit index to be updated on all replicas of a container via > Ratis.
[jira] [Updated] (HDDS-697) update and validate the BCSID for PutSmallFile/GetSmallFile command
[ https://issues.apache.org/jira/browse/HDDS-697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-697: - Status: Patch Available (was: Open) > update and validate the BCSID for PutSmallFile/GetSmallFile command > --- > > Key: HDDS-697 > URL: https://issues.apache.org/jira/browse/HDDS-697 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDDS-697.000.patch, HDDS-697.001.patch, > HDDS-697.002.patch > > > Similar to putBlock/getBlock, the putSmallFile transaction in Ratis needs to > update the BCSID in the container DB on the datanode. getSmallFile should > validate the BCSID while reading the block, similar to getBlock.
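The update-on-write / validate-on-read pattern described above can be sketched as follows. This assumes a trivial in-memory stand-in for the container DB; the class and method names are hypothetical, not the actual datanode handler code.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the BCSID handling for small files: putSmallFile
// records the block commit sequence id like putBlock does, and getSmallFile
// rejects reads whose requested BCSID exceeds the committed one, like
// getBlock. The in-memory map stands in for the container DB.
class SmallFileHandler {
    // blockId -> highest BCSID recorded for that block
    private final Map<Long, Long> containerDb = new HashMap<>();

    // putSmallFile: persist the block data (omitted) and update the BCSID.
    void putSmallFile(long blockId, long bcsId) {
        containerDb.merge(blockId, bcsId, Math::max);
    }

    // getSmallFile: validate the requested BCSID before serving the read.
    void getSmallFile(long blockId, long requestedBcsId) throws IOException {
        Long committed = containerDb.get(blockId);
        if (committed == null || requestedBcsId > committed) {
            throw new IOException("Unknown BCSID " + requestedBcsId
                + " for block " + blockId);
        }
    }
}
```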
[jira] [Commented] (HDDS-749) Restructure BlockId class in Ozone
[ https://issues.apache.org/jira/browse/HDDS-749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16668363#comment-16668363 ] Shashikant Banerjee commented on HDDS-749: -- The test failure reported is not related to the patch. Thanks [~jnp] for the review. I have committed this change to trunk. > Restructure BlockId class in Ozone > -- > > Key: HDDS-749 > URL: https://issues.apache.org/jira/browse/HDDS-749 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Affects Versions: 0.4.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 0.4.0 > > Attachments: HDDS-749.000.patch, HDDS-749.001.patch, > HDDS-749.002.patch > > > As part of block allocation in SCM, SCM will return a containerBlockId > which consists of containerId and localId. Once OM gets the allocated > blocks from SCM, it will create a BlockId object which consists of > containerId, localId and BlockCommitSequenceId.
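The split described above can be sketched as two small classes. This is a hedged sketch of the structure only; field and constructor shapes are illustrative and may differ from the committed HDDS-749 classes.

```java
// What SCM returns on block allocation: just the container-scoped identity.
class ContainerBlockID {
    final long containerID;
    final long localID;

    ContainerBlockID(long containerID, long localID) {
        this.containerID = containerID;
        this.localID = localID;
    }
}

// What OM builds once the block is allocated: the SCM identity plus the
// block commit sequence id (BCSID).
class BlockID {
    final ContainerBlockID containerBlockID;
    final long blockCommitSequenceId;

    BlockID(ContainerBlockID containerBlockID, long blockCommitSequenceId) {
        this.containerBlockID = containerBlockID;
        this.blockCommitSequenceId = blockCommitSequenceId;
    }
}
```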
[jira] [Updated] (HDDS-749) Restructure BlockId class in Ozone
[ https://issues.apache.org/jira/browse/HDDS-749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-749: - Resolution: Fixed Status: Resolved (was: Patch Available) > Restructure BlockId class in Ozone > -- > > Key: HDDS-749 > URL: https://issues.apache.org/jira/browse/HDDS-749 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Affects Versions: 0.4.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 0.4.0 > > Attachments: HDDS-749.000.patch, HDDS-749.001.patch, > HDDS-749.002.patch > > > As a part of block allocation in SCM, SCM will return a containerBlockId > which constitutes of containerId and localId. Once OM gets the allocated > Blocks from SCM, it will create a BlockId object which constitutes of > containerID , localId and BlockCommitSequenceId. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-728) Datanodes are going to dead state after some interval
[ https://issues.apache.org/jira/browse/HDDS-728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663767#comment-16663767 ] Shashikant Banerjee commented on HDDS-728: -- Thanks [~msingh] for the patch. The patch looks good to me as well. In addition to Nanda's comments: I think it's better to have the executor service array in ContainerStateMachine itself and shut it down during close. Since we are now passing an array reference through the ContainerStateMachine constructor, it may give a findbugs warning as well. > Datanodes are going to dead state after some interval > - > > Key: HDDS-728 > URL: https://issues.apache.org/jira/browse/HDDS-728 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Filesystem >Affects Versions: 0.3.0 >Reporter: Soumitra Sulav >Assignee: Mukul Kumar Singh >Priority: Major > Attachments: HDDS-728.001.patch, > hadoop-root-datanode-ctr-e138-1518143905142-541600-02-02.hwx.site.log, > hadoop-root-datanode-ctr-e138-1518143905142-541600-02-03.hwx.site.log, > hadoop-root-om-ctr-e138-1518143905142-541600-02-02.hwx.site.log, > hadoop-root-scm-ctr-e138-1518143905142-541600-02-02.hwx.site.log, > om-audit-ctr-e138-1518143905142-541600-02-02.hwx.site.log > > > Set up a 5-datanode Ozone cluster with HDP on top of it. > After restarting all HDP services a few times, I encountered the issue below, which is > causing the HDP services to fail. > The same exception was observed in an old setup; I thought it could have been > an issue with that setup, but I have now encountered the same issue in the new setup as well. > {code:java} > 2018-10-24 10:42:03,308 WARN > org.apache.ratis.grpc.server.GrpcServerProtocolService: > 2974da2b-e765-43f9-8d30-45fe40dcb9ab: Failed requestVote > 1672d28e-800f-4318-895b-1648976acff6->2974da2b-e765-43f9-8d30-45fe40dcb9ab#0 > org.apache.ratis.protocol.GroupMismatchException: > 2974da2b-e765-43f9-8d30-45fe40dcb9ab: group-CE87A994686F not found. 
> at > org.apache.ratis.server.impl.RaftServerProxy$ImplMap.get(RaftServerProxy.java:114) > at > org.apache.ratis.server.impl.RaftServerProxy.getImplFuture(RaftServerProxy.java:252) > at > org.apache.ratis.server.impl.RaftServerProxy.getImpl(RaftServerProxy.java:261) > at > org.apache.ratis.server.impl.RaftServerProxy.getImpl(RaftServerProxy.java:256) > at > org.apache.ratis.server.impl.RaftServerProxy.requestVote(RaftServerProxy.java:411) > at > org.apache.ratis.grpc.server.GrpcServerProtocolService.requestVote(GrpcServerProtocolService.java:54) > at > org.apache.ratis.proto.grpc.RaftServerProtocolServiceGrpc$MethodHandlers.invoke(RaftServerProtocolServiceGrpc.java:319) > at > org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171) > at > org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283) > at > org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:707) > at > org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) > at > org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > 2018-10-24 10:42:03,342 WARN > org.apache.ratis.grpc.server.GrpcServerProtocolService: > 2974da2b-e765-43f9-8d30-45fe40dcb9ab: Failed requestVote > 7839294e-5657-447f-b320-6b390fffb963->2974da2b-e765-43f9-8d30-45fe40dcb9ab#0 > org.apache.ratis.protocol.GroupMismatchException: > 2974da2b-e765-43f9-8d30-45fe40dcb9ab: group-CE87A994686F not found. 
> at > org.apache.ratis.server.impl.RaftServerProxy$ImplMap.get(RaftServerProxy.java:114) > at > org.apache.ratis.server.impl.RaftServerProxy.getImplFuture(RaftServerProxy.java:252) > at > org.apache.ratis.server.impl.RaftServerProxy.getImpl(RaftServerProxy.java:261) > at > org.apache.ratis.server.impl.RaftServerProxy.getImpl(RaftServerProxy.java:256) > at > org.apache.ratis.server.impl.RaftServerProxy.requestVote(RaftServerProxy.java:411) > at > org.apache.ratis.grpc.server.GrpcServerProtocolService.requestVote(GrpcServerProtocolService.java:54) > at > org.apache.ratis.proto.grpc.RaftServerProtocolServiceGrpc$MethodHandlers.invoke(RaftServerProtocolServiceGrpc.java:319) > at > org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171) > at >
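The review suggestion above (own the executor array inside ContainerStateMachine and shut it down on close) can be sketched as follows. The class name and constructor shape are illustrative, not the actual Ozone code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative sketch: the state machine owns its executor array and is
// responsible for shutting it down on close, instead of receiving the array
// through its constructor (which also avoids the exposed-mutable-reference
// findbugs warning mentioned above).
class ContainerStateMachineSketch implements AutoCloseable {
    private final ExecutorService[] executors;

    ContainerStateMachineSketch(int numExecutors) {
        executors = new ExecutorService[numExecutors];
        for (int i = 0; i < numExecutors; i++) {
            executors[i] = Executors.newSingleThreadExecutor();
        }
    }

    boolean isShutdown() {
        for (ExecutorService e : executors) {
            if (!e.isShutdown()) {
                return false;
            }
        }
        return true;
    }

    @Override
    public void close() {
        for (ExecutorService e : executors) {
            e.shutdown();
        }
    }
}
```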
[jira] [Commented] (HDDS-799) writeStateMachineData times out
[ https://issues.apache.org/jira/browse/HDDS-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16674276#comment-16674276 ] Shashikant Banerjee commented on HDDS-799: -- Thanks [~msingh], for the patch. The test failures are related. Can you please check? > writeStateMachineData times out > --- > > Key: HDDS-799 > URL: https://issues.apache.org/jira/browse/HDDS-799 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Affects Versions: 0.3.0 >Reporter: Nilotpal Nandi >Assignee: Mukul Kumar Singh >Priority: Blocker > Fix For: 0.3.0 > > Attachments: HDDS-799-ozone-0.3.001.patch, > all-node-ozone-logs-1540979056.tar.gz > > > datanode stopped due to following error : > datanode.log > {noformat} > 2018-10-31 09:12:04,517 INFO org.apache.ratis.server.impl.RaftServerImpl: > 9fab9937-fbcd-4196-8014-cb165045724b: set configuration 169: > [9fab9937-fbcd-4196-8014-cb165045724b:172.27.15.131:9858, > ce0084c2-97cd-4c97-9378-e5175daad18b:172.27.15.139:9858, > f0291cb4-7a48-456a-847f-9f91a12aa850:172.27.38.9:9858], old=null at 169 > 2018-10-31 09:12:22,187 ERROR org.apache.ratis.server.storage.RaftLogWorker: > Terminating with exit status 1: > 9fab9937-fbcd-4196-8014-cb165045724b-RaftLogWorker failed. > org.apache.ratis.protocol.TimeoutIOException: Timeout: WriteLog:182: (t:10, > i:182), STATEMACHINELOGENTRY, client-611073BBFA46, > cid=127-writeStateMachineData > at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:87) > at > org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:310) > at org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:182) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.util.concurrent.TimeoutException > at > java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771) > at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) > at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:79) > ... 
3 more{noformat}
[jira] [Commented] (HDDS-799) writeStateMachineData times out
[ https://issues.apache.org/jira/browse/HDDS-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16674341#comment-16674341 ] Shashikant Banerjee commented on HDDS-799: -- Thanks [~msingh] for updating the patch. I am +1 on this. Can you please provide a patch for trunk as well? > writeStateMachineData times out > --- > > Key: HDDS-799 > URL: https://issues.apache.org/jira/browse/HDDS-799 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Affects Versions: 0.3.0 >Reporter: Nilotpal Nandi >Assignee: Mukul Kumar Singh >Priority: Blocker > Fix For: 0.3.0 > > Attachments: HDDS-799-ozone-0.3.001.patch, > HDDS-799-ozone-0.3.002.patch, all-node-ozone-logs-1540979056.tar.gz > > > datanode stopped due to following error : > datanode.log > {noformat} > 2018-10-31 09:12:04,517 INFO org.apache.ratis.server.impl.RaftServerImpl: > 9fab9937-fbcd-4196-8014-cb165045724b: set configuration 169: > [9fab9937-fbcd-4196-8014-cb165045724b:172.27.15.131:9858, > ce0084c2-97cd-4c97-9378-e5175daad18b:172.27.15.139:9858, > f0291cb4-7a48-456a-847f-9f91a12aa850:172.27.38.9:9858], old=null at 169 > 2018-10-31 09:12:22,187 ERROR org.apache.ratis.server.storage.RaftLogWorker: > Terminating with exit status 1: > 9fab9937-fbcd-4196-8014-cb165045724b-RaftLogWorker failed. 
> org.apache.ratis.protocol.TimeoutIOException: Timeout: WriteLog:182: (t:10, > i:182), STATEMACHINELOGENTRY, client-611073BBFA46, > cid=127-writeStateMachineData > at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:87) > at > org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:310) > at org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:182) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.util.concurrent.TimeoutException > at > java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771) > at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) > at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:79) > ... 3 more{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-794) Add configs to set StateMachineData write timeout in ContainerStateMachine
[ https://issues.apache.org/jira/browse/HDDS-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16674452#comment-16674452 ] Shashikant Banerjee commented on HDDS-794: -- Thanks [~msingh] for the review. Patch v1 addresses your review comments. > Add configs to set StateMachineData write timeout in ContainerStateMachine > -- > > Key: HDDS-794 > URL: https://issues.apache.org/jira/browse/HDDS-794 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Ozone Datanode >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 0.3.0, 0.4.0 > > Attachments: HDDS-794.000.patch, HDDS-794.001.patch > > > This patch adds config settings in Ozone to enable/disable the timeout > for StateMachineData writes via Ratis. It also adds some debug logs in the > writeChunk handling path inside the datanode.
[jira] [Updated] (HDDS-794) Add configs to set StateMachineData write timeout in ContainerStateMachine
[ https://issues.apache.org/jira/browse/HDDS-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-794: - Attachment: HDDS-794.001.patch > Add configs to set StateMachineData write timeout in ContainerStateMachine > -- > > Key: HDDS-794 > URL: https://issues.apache.org/jira/browse/HDDS-794 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Ozone Datanode >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 0.3.0, 0.4.0 > > Attachments: HDDS-794.000.patch, HDDS-794.001.patch > > > This patch adds config settings in Ozone to enable/disable the timeout > for StateMachineData writes via Ratis. It also adds some debug logs in the > writeChunk handling path inside the datanode.
[jira] [Created] (HDDS-794) Add configs to set StateMachineData write timeout in ContainerStateMachine
Shashikant Banerjee created HDDS-794: Summary: Add configs to set StateMachineData write timeout in ContainerStateMachine Key: HDDS-794 URL: https://issues.apache.org/jira/browse/HDDS-794 Project: Hadoop Distributed Data Store Issue Type: Improvement Components: Ozone Datanode Reporter: Shashikant Banerjee Assignee: Shashikant Banerjee Fix For: 0.3.0, 0.4.0 This patch adds config settings in Ozone to enable/disable the timeout for StateMachineData writes via Ratis. It also adds some debug logs in the writeChunk handling path inside the datanode.
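The enable/disable timeout behavior (and the TimeoutException path seen in the HDDS-799 stack traces above) can be sketched with a config-gated wait on a CompletableFuture. The helper name and the "non-positive value disables the timeout" convention are assumptions for illustration, not the actual Ozone config keys or semantics.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative sketch of a config-gated write timeout: when the configured
// timeout is non-positive the wait is unbounded (timeout disabled);
// otherwise the bounded get() throws TimeoutException if the
// writeStateMachineData future does not complete in time.
class TimedWrite {
    static <T> T get(CompletableFuture<T> future, long timeoutMs)
            throws InterruptedException, ExecutionException,
                   TimeoutException {
        if (timeoutMs <= 0) {
            return future.get();                    // timeout disabled
        }
        return future.get(timeoutMs, TimeUnit.MILLISECONDS);
    }
}
```

In the real code path a timeout of this kind is what RaftLogWorker surfaces as the fatal TimeoutIOException in the logs above.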