[jira] [Commented] (HDDS-1809) Ozone Read fails with StatusRunTimeExceptions after 2 datanode fail in Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16886071#comment-16886071 ] Shashikant Banerjee commented on HDDS-1809: ---
The issue happens because, with rack awareness enabled, the pipeline.getNodesInOrder() call on the read path returns the same datanode three times in datanodeList, as shown below. As a result, if a failure is encountered, the read is retried on the same datanode.
{code:java}
if ((request.getCmdType() == ContainerProtos.Type.ReadChunk
    || request.getCmdType() == ContainerProtos.Type.GetSmallFile)
    && topologyAwareRead) {
  datanodeList = pipeline.getNodesInOrder();
} else {
  datanodeList = pipeline.getNodes();
}
datanodeList [f4b0bdf3-66d4-452c-82af-8a570ac0aeb7{ip: 192.168.43.156, host: hw15685, networkLocation: /default-rack, certSerialId: null}, f4b0bdf3-66d4-452c-82af-8a570ac0aeb7{ip: 192.168.43.156, host: hw15685, networkLocation: /default-rack, certSerialId: null}, f4b0bdf3-66d4-452c-82af-8a570ac0aeb7{ip: 192.168.43.156, host: hw15685, networkLocation: /default-rack, certSerialId: null}]
Pipeline[ Id: 865a2079-de8e-472c-baaa-5aa345ed5e57, Nodes: f4b0bdf3-66d4-452c-82af-8a570ac0aeb7{ip: 192.168.43.156, host: hw15685, networkLocation: /default-rack, certSerialId: null}14975e64-2564-433d-9b89-c295083a1161{ip: 192.168.43.156, host: hw15685, networkLocation: /default-rack, certSerialId: null}efc0749c-c7eb-4b73-a4b2-0abe553ca5e9{ip: 192.168.43.156, host: hw15685, networkLocation: /default-rack, certSerialId: null}, Type:STAND_ALONE, Factor:THREE, State:OPEN]
{code}
The read path works well with the network topology feature turned off.
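The failure mode in this comment can be illustrated with a small standalone check. This is only a sketch under stated assumptions: `ReadOrderCheck`, `distinctInOrder` and `allowsFailover` are hypothetical names that do not exist in Ozone, and datanodes are represented by their UUID strings rather than by the real datanode objects. It shows why a read-order list that repeats one ID gives the retry loop nowhere else to go.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

/** Hypothetical helper (not an Ozone class): inspects a read-order list for repeated datanode IDs. */
final class ReadOrderCheck {

  /** Returns the IDs in their original order with repeated entries dropped. */
  static List<String> distinctInOrder(List<String> datanodeIds) {
    return new ArrayList<>(new LinkedHashSet<>(datanodeIds));
  }

  /** True when a retry loop over this list can actually reach a second replica. */
  static boolean allowsFailover(List<String> datanodeIds) {
    return distinctInOrder(datanodeIds).size() > 1;
  }

  public static void main(String[] args) {
    // The list reported in the comment: one UUID repeated three times.
    List<String> reported = Arrays.asList(
        "f4b0bdf3-66d4-452c-82af-8a570ac0aeb7",
        "f4b0bdf3-66d4-452c-82af-8a570ac0aeb7",
        "f4b0bdf3-66d4-452c-82af-8a570ac0aeb7");
    // The three node IDs that the pipeline itself lists.
    List<String> pipelineNodes = Arrays.asList(
        "f4b0bdf3-66d4-452c-82af-8a570ac0aeb7",
        "14975e64-2564-433d-9b89-c295083a1161",
        "efc0749c-c7eb-4b73-a4b2-0abe553ca5e9");
    System.out.println("reported list allows failover: " + allowsFailover(reported));
    System.out.println("pipeline nodes allow failover: " + allowsFailover(pipelineNodes));
  }
}
```

A check of this kind would fit in a test that asserts the ordered list contains the same set of nodes as the unordered one; where the actual fix belongs is not stated in the comment.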
> Ozone Read fails with StatusRunTimeExceptions after 2 datanode fail in Ratis > pipeline > - > > Key: HDDS-1809 > URL: https://issues.apache.org/jira/browse/HDDS-1809 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Affects Versions: 0.5.0 >Reporter: Shashikant Banerjee >Priority: Major > Fix For: 0.5.0 > > > {code:java} > java.io.IOException: Unexpected OzoneException: java.io.IOException: > java.util.concurrent.ExecutionException: > org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io > exception > at > org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunk(ChunkInputStream.java:342) > at > org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunkFromContainer(ChunkInputStream.java:307) > at > org.apache.hadoop.hdds.scm.storage.ChunkInputStream.prepareRead(ChunkInputStream.java:259) > at > org.apache.hadoop.hdds.scm.storage.ChunkInputStream.read(ChunkInputStream.java:144) > at > org.apache.hadoop.hdds.scm.storage.BlockInputStream.read(BlockInputStream.java:239) > at > org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:171) > at > org.apache.hadoop.ozone.client.io.OzoneInputStream.read(OzoneInputStream.java:47) > at java.io.InputStream.read(InputStream.java:101) > at > org.apache.hadoop.ozone.container.ContainerTestHelper.validateData(ContainerTestHelper.java:709) > at > org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient.validateData(TestFailureHandlingByClient.java:458) > at > org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient.testBlockWritesWithDnFailures(TestFailureHandlingByClient.java:158) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) > at org.junit.runners.ParentRunner.run(ParentRunner.java:309) > at org.junit.runner.JUnitCore.run(JUnitCore.java:160) > at >
[jira] [Assigned] (HDDS-1809) Ozone Read fails with StatusRunTimeExceptions after 2 datanode fail in Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee reassigned HDDS-1809: - Assignee: (was: Shashikant Banerjee) > Ozone Read fails with StatusRunTimeExceptions after 2 datanode fail in Ratis > pipeline > - > > Key: HDDS-1809 > URL: https://issues.apache.org/jira/browse/HDDS-1809 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Affects Versions: 0.5.0 >Reporter: Shashikant Banerjee >Priority: Major > Fix For: 0.5.0 > > > {code:java} > java.io.IOException: Unexpected OzoneException: java.io.IOException: > java.util.concurrent.ExecutionException: > org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io > exception > at > org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunk(ChunkInputStream.java:342) > at > org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunkFromContainer(ChunkInputStream.java:307) > at > org.apache.hadoop.hdds.scm.storage.ChunkInputStream.prepareRead(ChunkInputStream.java:259) > at > org.apache.hadoop.hdds.scm.storage.ChunkInputStream.read(ChunkInputStream.java:144) > at > org.apache.hadoop.hdds.scm.storage.BlockInputStream.read(BlockInputStream.java:239) > at > org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:171) > at > org.apache.hadoop.ozone.client.io.OzoneInputStream.read(OzoneInputStream.java:47) > at java.io.InputStream.read(InputStream.java:101) > at > org.apache.hadoop.ozone.container.ContainerTestHelper.validateData(ContainerTestHelper.java:709) > at > org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient.validateData(TestFailureHandlingByClient.java:458) > at > org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient.testBlockWritesWithDnFailures(TestFailureHandlingByClient.java:158) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) > at org.junit.runners.ParentRunner.run(ParentRunner.java:309) > at org.junit.runner.JUnitCore.run(JUnitCore.java:160) > at > com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) > at > com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47) > at > com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) > at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-1809) Ozone Read fails with StatusRunTimeExceptions after 2 datanode fail in Ratis pipeline
Shashikant Banerjee created HDDS-1809: - Summary: Ozone Read fails with StatusRunTimeExceptions after 2 datanode fail in Ratis pipeline Key: HDDS-1809 URL: https://issues.apache.org/jira/browse/HDDS-1809 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Client Affects Versions: 0.5.0 Reporter: Shashikant Banerjee Assignee: Shashikant Banerjee Fix For: 0.5.0 {code:java} java.io.IOException: Unexpected OzoneException: java.io.IOException: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunk(ChunkInputStream.java:342) at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunkFromContainer(ChunkInputStream.java:307) at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.prepareRead(ChunkInputStream.java:259) at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.read(ChunkInputStream.java:144) at org.apache.hadoop.hdds.scm.storage.BlockInputStream.read(BlockInputStream.java:239) at org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:171) at org.apache.hadoop.ozone.client.io.OzoneInputStream.read(OzoneInputStream.java:47) at java.io.InputStream.read(InputStream.java:101) at org.apache.hadoop.ozone.container.ContainerTestHelper.validateData(ContainerTestHelper.java:709) at org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient.validateData(TestFailureHandlingByClient.java:458) at org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient.testBlockWritesWithDnFailures(TestFailureHandlingByClient.java:158) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.junit.runner.JUnitCore.run(JUnitCore.java:160) at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47) at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-1808) TestRatisPipelineCreateAndDestory#testPipelineCreationOnNodeRestart times out
Shashikant Banerjee created HDDS-1808: - Summary: TestRatisPipelineCreateAndDestory#testPipelineCreationOnNodeRestart times out Key: HDDS-1808 URL: https://issues.apache.org/jira/browse/HDDS-1808 Project: Hadoop Distributed Data Store Issue Type: Bug Components: SCM Affects Versions: 0.5.0 Reporter: Shashikant Banerjee Assignee: Shashikant Banerjee Fix For: 0.5.0 {code:java} Error Message test timed out after 3 milliseconds Stacktrace java.lang.Exception: test timed out after 3 milliseconds at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:382) at org.apache.hadoop.hdds.scm.pipeline.TestRatisPipelineCreateAndDestory.waitForPipelines(TestRatisPipelineCreateAndDestory.java:126) at org.apache.hadoop.hdds.scm.pipeline.TestRatisPipelineCreateAndDestory.testPipelineCreationOnNodeRestart(TestRatisPipelineCreateAndDestory.java:121) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-1807) TestWatchForCommit#testWatchForCommitForRetryfailure fails as a result of no leader election for extended period of time
Shashikant Banerjee created HDDS-1807: - Summary: TestWatchForCommit#testWatchForCommitForRetryfailure fails as a result of no leader election for extended period of time Key: HDDS-1807 URL: https://issues.apache.org/jira/browse/HDDS-1807 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Client Reporter: Shashikant Banerjee Assignee: Shashikant Banerjee Fix For: 0.5.0 {code:java} org.apache.ratis.protocol.RaftRetryFailureException: Failed RaftClientRequest:client-6C83DC527A4C->73bdd98d-b003-44ff-a45b-bd12dfd50509@group-75C642DF7AE9, cid=55, seq=1*, RW, org.apache.hadoop.hdds.scm.XceiverClientRatis$$Lambda$407/213850519@1a8843a2 for 10 attempts with RetryLimited(maxAttempts=10, sleepTime=1000ms) Stacktrace java.util.concurrent.ExecutionException: org.apache.ratis.protocol.RaftRetryFailureException: Failed RaftClientRequest:client-6C83DC527A4C->73bdd98d-b003-44ff-a45b-bd12dfd50509@group-75C642DF7AE9, cid=55, seq=1*, RW, org.apache.hadoop.hdds.scm.XceiverClientRatis$$Lambda$407/213850519@1a8843a2 for 10 attempts with RetryLimited(maxAttempts=10, sleepTime=1000ms) at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) at org.apache.hadoop.ozone.client.rpc.TestWatchForCommit.testWatchForCommitForRetryfailure(TestWatchForCommit.java:345) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) {code} The client here retries 10 times with a delay of 1 second between retries, but leader election could not complete in that time. {code:java} 2019-07-12 19:30:46,451 INFO client.GrpcClientProtocolClient (GrpcClientProtocolClient.java:onNext(255)) - client-6C83DC527A4C->5931fd83-b899-480e-b15a-ecb8e7f7dd46: receive RaftClientReply:client-6C83DC527A4C->5931fd83-b899-480e-b15a-ecb8e7f7dd46@group-75C642DF7AE9, cid=55, FAILED org.apache.ratis.protocol.NotLeaderException: Server 5931fd83-b899-480e-b15a-ecb8e7f7dd46 is not the leader (null). 
Request must be sent to leader., logIndex=0, commits[5931fd83-b899-480e-b15a-ecb8e7f7dd46:c-1] 2019-07-12 19:30:47,469 INFO client.GrpcClientProtocolClient (GrpcClientProtocolClient.java:onNext(255)) - client-6C83DC527A4C->d83929f1-c4db-499d-b67f-ad7f10dd7dde: receive RaftClientReply:client-6C83DC527A4C->d83929f1-c4db-499d-b67f-ad7f10dd7dde@group-75C642DF7AE9, cid=55, FAILED org.apache.ratis.protocol.NotLeaderException: Server d83929f1-c4db-499d-b67f-ad7f10dd7dde is not the leader (null). Request must be sent to leader., logIndex=0, commits[d83929f1-c4db-499d-b67f-ad7f10dd7dde:c-1] 2019-07-12 19:30:48,504 INFO client.GrpcClientProtocolClient (GrpcClientProtocolClient.java:onNext(255)) -
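The retry behaviour named in the exception above, RetryLimited(maxAttempts=10, sleepTime=1000ms), can be pictured with a plain fixed-delay loop. This is a minimal sketch and not the Ratis implementation: `FixedDelayRetry` and its `call` method are hypothetical names, and the only values taken from the report are the attempt count and the sleep time. It shows that such a policy gives up after a bounded window of roughly ten seconds, so a leader election that takes longer surfaces as a retry failure.

```java
import java.util.concurrent.Callable;

/** Hypothetical fixed-delay retry helper; it only mirrors the shape of a bounded retry policy. */
final class FixedDelayRetry {

  /**
   * Runs op up to maxAttempts times, sleeping sleepMillis between failed attempts.
   * Rethrows a wrapping exception once the attempts are exhausted.
   */
  static <T> T call(Callable<T> op, int maxAttempts, long sleepMillis) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    Exception last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return op.call();
      } catch (Exception e) {
        last = e;
        // No sleep after the final attempt: there is nothing left to wait for.
        if (attempt < maxAttempts) {
          Thread.sleep(sleepMillis);
        }
      }
    }
    throw new Exception("Failed for " + maxAttempts + " attempts", last);
  }

  public static void main(String[] args) {
    try {
      // Stand-in for a request that keeps hitting a server without a leader.
      call(() -> { throw new IllegalStateException("not the leader"); }, 10, 10L);
    } catch (Exception e) {
      System.out.println(e.getMessage() + ", cause: " + e.getCause().getMessage());
    }
  }
}
```

With the reported settings the sleep would be 1000 ms; the demo uses a shorter delay only so that it finishes quickly.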
[jira] [Created] (HDDS-1806) TestDataValidateWithSafeByteOperations tests are failing
Shashikant Banerjee created HDDS-1806: - Summary: TestDataValidateWithSafeByteOperations tests are failing Key: HDDS-1806 URL: https://issues.apache.org/jira/browse/HDDS-1806 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Client Affects Versions: 0.5.0 Reporter: Shashikant Banerjee Assignee: Shashikant Banerjee Fix For: 0.5.0 {code:java} Unexpected Storage Container Exception: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 3 does not exist Stacktrace java.io.IOException: Unexpected Storage Container Exception: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 3 does not exist at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.setIoException(BlockOutputStream.java:549) at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.validateResponse(BlockOutputStream.java:540) at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$writeChunkToContainer$2(BlockOutputStream.java:615) at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602) at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577) at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 3 does not exist at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:536) at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.validateResponse(BlockOutputStream.java:537) ... 7 more {code} The error propagated to the client is erroneous. Container creation failed as a result of a disk-full condition, but that failure was never propagated to the client.
[jira] [Created] (HDDS-1804) TestCloseContainerHandlingByClient#testBlockWrites fails intermittently
Shashikant Banerjee created HDDS-1804: - Summary: TestCloseContainerHandlingByClient#testBlockWrites fails intermittently Key: HDDS-1804 URL: https://issues.apache.org/jira/browse/HDDS-1804 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Client Affects Versions: 0.5.0 Reporter: Shashikant Banerjee Assignee: Shashikant Banerjee Fix For: 0.5.0 The test fails intermittently as reported here: [https://builds.apache.org/job/hadoop-multibranch/job/PR-1082/1/testReport/org.apache.hadoop.ozone.client.rpc/TestCloseContainerHandlingByClient/testBlockWrites/] {code:java} java.lang.IllegalArgumentException at com.google.common.base.Preconditions.checkArgument(Preconditions.java:72) at org.apache.hadoop.hdds.scm.XceiverClientManager.acquireClient(XceiverClientManager.java:150) at org.apache.hadoop.hdds.scm.XceiverClientManager.acquireClientForReadData(XceiverClientManager.java:143) at org.apache.hadoop.hdds.scm.storage.BlockInputStream.getChunkInfos(BlockInputStream.java:154) at org.apache.hadoop.hdds.scm.storage.BlockInputStream.initialize(BlockInputStream.java:118) at org.apache.hadoop.hdds.scm.storage.BlockInputStream.read(BlockInputStream.java:222) at org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:171) at org.apache.hadoop.ozone.client.io.OzoneInputStream.read(OzoneInputStream.java:47) at java.io.InputStream.read(InputStream.java:101) at org.apache.hadoop.ozone.container.ContainerTestHelper.validateData(ContainerTestHelper.java:709) at org.apache.hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient.validateData(TestCloseContainerHandlingByClient.java:401) at org.apache.hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient.testBlockWrites(TestCloseContainerHandlingByClient.java:471) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at 
java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) {code}
[jira] [Commented] (HDDS-1753) Datanode unable to find chunk while replication data using ratis.
[ https://issues.apache.org/jira/browse/HDDS-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16883727#comment-16883727 ] Shashikant Banerjee commented on HDDS-1753: --- The issue arises because a key delete can remove a block of a closed container on the leader while its data is still to be replicated to the followers. When a follower then asks the leader for the chunk data, the read fails because the chunk file no longer exists on the leader. The proposed solution is as follows: whenever a datanode receives a delete command from SCM, it should first check the min replicated index across all the servers in the pipeline. ContainerStateMachine will also track the close container log index for each container. If the min replicated index >= close container index on the leader, the delete operation is queued over Ratis on the leader and the same command is ignored on the followers, so the delete happens over Ratis. If the close container index has not been replicated yet, the delete transaction is not enqueued over Ratis and the command is ignored; SCM already has a retry policy in place to retry the same delete. If the Ratis pipeline does not exist, delete works as it does today. > Datanode unable to find chunk while replication data using ratis. > - > > Key: HDDS-1753 > URL: https://issues.apache.org/jira/browse/HDDS-1753 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Shashikant Banerjee >Priority: Major > Labels: MiniOzoneChaosCluster > > Leader datanode is unable to read chunk from the datanode while replicating > data from leader to follower. > Please note that deletion of keys is also happening while the data is being > replicated. 
> {code} > 2019-07-02 19:39:22,604 INFO impl.RaftServerImpl > (RaftServerImpl.java:checkInconsistentAppendEntries(972)) - > 5ac88709-a3a2-4c8f-91de-5e54b617f05e: inconsistency entries. > Reply:76a3eb0f-d7cd-477b-8973-db1 > 014feb398<-5ac88709-a3a2-4c8f-91de-5e54b617f05e#70:FAIL,INCONSISTENCY,nextIndex:9771,term:2,followerCommit:9782 > 2019-07-02 19:39:22,605 ERROR impl.ChunkManagerImpl > (ChunkUtils.java:readData(161)) - Unable to find the chunk file. chunk info : > ChunkInfo{chunkName='76ec669ae2cb6e10dd9f08c0789c5fdf_stream_a2850dce-def3 > -4d64-93d8-fa2ebafee933_chunk_1, offset=0, len=2048} > 2019-07-02 19:39:22,605 INFO impl.RaftServerImpl > (RaftServerImpl.java:checkInconsistentAppendEntries(990)) - > 5ac88709-a3a2-4c8f-91de-5e54b617f05e: Failed appendEntries as latest snapshot > (9770) already h > as the append entries (first index: 1) > 2019-07-02 19:39:22,605 INFO impl.RaftServerImpl > (RaftServerImpl.java:checkInconsistentAppendEntries(972)) - > 5ac88709-a3a2-4c8f-91de-5e54b617f05e: inconsistency entries. > Reply:76a3eb0f-d7cd-477b-8973-db1 > 014feb398<-5ac88709-a3a2-4c8f-91de-5e54b617f05e#71:FAIL,INCONSISTENCY,nextIndex:9771,term:2,followerCommit:9782 > 2019-07-02 19:39:22,605 INFO keyvalue.KeyValueHandler > (ContainerUtils.java:logAndReturnError(146)) - Operation: ReadChunk : Trace > ID: 4216d461a4679e17:4216d461a4679e17:0:0 : Message: Unable to find the c > hunk file. 
chunk info > ChunkInfo{chunkName='76ec669ae2cb6e10dd9f08c0789c5fdf_stream_a2850dce-def3-4d64-93d8-fa2ebafee933_chunk_1, > offset=0, len=2048} : Result: UNABLE_TO_FIND_CHUNK > 2019-07-02 19:39:22,605 INFO impl.RaftServerImpl > (RaftServerImpl.java:checkInconsistentAppendEntries(990)) - > 5ac88709-a3a2-4c8f-91de-5e54b617f05e: Failed appendEntries as latest snapshot > (9770) already h > as the append entries (first index: 2) > 2019-07-02 19:39:22,606 INFO impl.RaftServerImpl > (RaftServerImpl.java:checkInconsistentAppendEntries(972)) - > 5ac88709-a3a2-4c8f-91de-5e54b617f05e: inconsistency entries. > Reply:76a3eb0f-d7cd-477b-8973-db1 > 014feb398<-5ac88709-a3a2-4c8f-91de-5e54b617f05e#72:FAIL,INCONSISTENCY,nextIndex:9771,term:2,followerCommit:9782 > 19:39:22.606 [pool-195-thread-19] ERROR DNAudit - user=null | ip=null | > op=READ_CHUNK {blockData=conID: 3 locID: 102372189549953034 bcsId: 0} | > ret=FAILURE > java.lang.Exception: Unable to find the chunk file. chunk info > ChunkInfo{chunkName='76ec669ae2cb6e10dd9f08c0789c5fdf_stream_a2850dce-def3-4d64-93d8-fa2ebafee933_chunk_1, > offset=0, len=2048} > at > org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:320) > ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?] > at > org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:148) > ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?] > at >
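The delete-gating rule proposed in the HDDS-1753 comment above can be written down as a small decision function. This is a sketch of the rule as described in the comment, not of any committed patch: `DeleteGate`, `Decision` and `decide` are hypothetical names, and the inputs (whether a Ratis pipeline exists, whether this datanode is the leader, the min replicated index, and the close container index) are passed in as plain values rather than read from ContainerStateMachine.

```java
/** Hypothetical model of the proposed handling of an SCM delete command on a datanode. */
final class DeleteGate {

  enum Decision {
    /** No Ratis pipeline: delete is handled as it is today. */
    DELETE_AS_IS,
    /** Leader, and the close container entry is replicated everywhere: queue the delete over Ratis. */
    QUEUE_OVER_RATIS,
    /** Follower, or close container entry not yet replicated: drop the command and rely on SCM retry. */
    IGNORE
  }

  static Decision decide(boolean pipelineExists, boolean isLeader,
                         long minReplicatedIndex, long closeContainerIndex) {
    if (!pipelineExists) {
      return Decision.DELETE_AS_IS;
    }
    if (!isLeader) {
      // Followers receive the delete through the Ratis log, not from the SCM command.
      return Decision.IGNORE;
    }
    return minReplicatedIndex >= closeContainerIndex
        ? Decision.QUEUE_OVER_RATIS
        : Decision.IGNORE;
  }

  public static void main(String[] args) {
    // Index values below are made up for illustration.
    System.out.println(decide(true, true, 120L, 100L));
    System.out.println(decide(true, true, 80L, 100L));
    System.out.println(decide(true, false, 120L, 100L));
    System.out.println(decide(false, false, 0L, 0L));
  }
}
```

The point of the comparison is ordering: the delete is only entered into the log after every server has the close container entry, so no follower can still need the chunk from the leader.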
[jira] [Updated] (HDDS-1492) Generated chunk size name too long.
[ https://issues.apache.org/jira/browse/HDDS-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-1492: -- Status: Patch Available (was: Open) > Generated chunk size name too long. > --- > > Key: HDDS-1492 > URL: https://issues.apache.org/jira/browse/HDDS-1492 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Affects Versions: 0.5.0 >Reporter: Aravindan Vijayan >Assignee: Shashikant Banerjee >Priority: Critical > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Following exception is seen in SCM logs intermittently. > {code} > java.lang.RuntimeException: file name > 'chunks/2a54b2a153f4a9c5da5f44e2c6f97c60_stream_9c6ac565-e2d4-469c-bd5c-47922a35e798_chunk_10.tmp.2.23115' > is too long ( > 100 bytes) > {code} > We may have to limit the name of the chunk to 100 bytes.
[jira] [Updated] (HDFS-14499) Misleading REM_QUOTA value with snapshot and trash feature enabled for a directory
[ https://issues.apache.org/jira/browse/HDFS-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDFS-14499: --- Resolution: Fixed Fix Version/s: 3.3.0 Target Version/s: 3.3.0 Status: Resolved (was: Patch Available) Thanks [~szetszwo] for the review. I have committed this change to trunk. > Misleading REM_QUOTA value with snapshot and trash feature enabled for a > directory > -- > > Key: HDFS-14499 > URL: https://issues.apache.org/jira/browse/HDFS-14499 > Project: Hadoop HDFS > Issue Type: Bug > Components: snapshots >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-14499.000.patch, HDFS-14499.001.patch, > HDFS-14499.002.patch > > > This is the flow of steps where we see a discrepancy between REM_QUOTA and > new file operation failure. REM_QUOTA shows a value of 1 but file creation > operation does not succeed. > {code:java} > hdfs@c3265-node3 root$ hdfs dfs -mkdir /dir1 > hdfs@c3265-node3 root$ hdfs dfsadmin -setQuota 2 /dir1 > hdfs@c3265-node3 root$ hdfs dfsadmin -allowSnapshot /dir1 > Allowing snaphot on /dir1 succeeded > hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1 > hdfs@c3265-node3 root$ hdfs dfs -createSnapshot /dir1 snap1 > Created snapshot /dir1/.snapshot/snap1 > hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1 > QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE > PATHNAME > 2 0 none inf 1 1 0 /dir1 > hdfs@c3265-node3 root$ hdfs dfs -rm /dir1/file1 > 19/03/26 11:20:25 INFO fs.TrashPolicyDefault: Moved: > 'hdfs://smajetinn/dir1/file1' to trash at: > hdfs://smajetinn/user/hdfs/.Trash/Current/dir1/file11553599225772 > hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1 > QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE > PATHNAME > 2 1 none inf 1 0 0 /dir1 > hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1 > touchz: The NameSpace quota (directories and files) of directory 
/dir1 is > exceeded: quota=2 file count=3{code} > The issue here is that the count command takes only files and directories > into account, not inode references. When trash is enabled, deleting a file > inside a directory is actually a rename. As a result, an inode reference is > kept in the deleted list of the snapshot diff, and that reference is counted > when the namespace quota is computed. The count command > (getContentSummary()), however, counts only files and directories, not the > referenced entity, when calculating REM_QUOTA. The referenced entity is taken > into account for the space quota only. > InodeReference.java: > --- > {code:java} > @Override > public final ContentSummaryComputationContext computeContentSummary( > int snapshotId, ContentSummaryComputationContext summary) { > final int s = snapshotId < lastSnapshotId ? snapshotId : lastSnapshotId; > // only count storagespace for WithName > final QuotaCounts q = computeQuotaUsage( > summary.getBlockStoragePolicySuite(), getStoragePolicyID(), false, > s); > summary.getCounts().addContent(Content.DISKSPACE, q.getStorageSpace()); > summary.getCounts().addTypeSpaces(q.getTypeSpaces()); > return summary; > } > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
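The mismatch described in HDFS-14499 can be sketched with a toy model. This is not HDFS code: the class, the enum and every method name below are hypothetical, and the model only assumes what the report states, namely that quota enforcement counts inode references while the content summary behind REM_QUOTA does not.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the HDFS-14499 discrepancy; all names here are hypothetical. */
public class QuotaMismatchSketch {
    public enum Kind { DIRECTORY, FILE, REFERENCE }

    /** Enforcement path (assumed): every inode, references included, uses namespace quota. */
    public static long namespaceUsed(List<Kind> inodes) {
        return inodes.size();
    }

    /** Reporting path (assumed): only files and directories are counted. */
    public static long contentSummaryCount(List<Kind> inodes) {
        return inodes.stream().filter(k -> k != Kind.REFERENCE).count();
    }

    /** What a count-style report would show as remaining quota. */
    public static long reportedRemQuota(long quota, List<Kind> inodes) {
        return quota - contentSummaryCount(inodes);
    }

    /** Whether one more inode fits under the enforced quota. */
    public static boolean canCreate(long quota, List<Kind> inodes) {
        return namespaceUsed(inodes) + 1 <= quota;
    }

    public static void main(String[] args) {
        long quota = 2;
        List<Kind> dir1 = new ArrayList<>();
        dir1.add(Kind.DIRECTORY); // /dir1 itself
        dir1.add(Kind.REFERENCE); // file1 after the snapshot and the move to trash
        System.out.println("REM_QUOTA reported: " + reportedRemQuota(quota, dir1));
        System.out.println("create allowed: " + canCreate(quota, dir1));
    }
}
```

With a quota of 2, the model reports a remaining quota of 1 yet rejects the next create, which mirrors the "quota=2 file count=3" failure in the transcript.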
[jira] [Created] (HDDS-1789) BlockOutputStream#watchForCommit fails with UnsupportedOperationException
Shashikant Banerjee created HDDS-1789: - Summary: BlockOutputStream#watchForCommit fails with UnsupportedOperationException Key: HDDS-1789 URL: https://issues.apache.org/jira/browse/HDDS-1789 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Client Affects Versions: 0.5.0 Reporter: Shashikant Banerjee Assignee: Shashikant Banerjee Fix For: 0.5.0 {code:java} ```2019-07-12 08:45:17,981 ERROR ozone.MiniOzoneLoadGenerator (MiniOzoneLoadGenerator.java:load(105)) - LOADGEN: Create key:pool-444-thread-5-1328179725 failed with exception, skipping java.lang.UnsupportedOperationException at java.util.AbstractList.add(AbstractList.java:148) at java.util.AbstractList.add(AbstractList.java:108) at java.util.AbstractCollection.addAll(AbstractCollection.java:344) at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.watchForCommit(BlockOutputStream.java:363) at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.handleFullBuffer(BlockOutputStream.java:332) at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.write(BlockOutputStream.java:259) at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.write(BlockOutputStreamEntry.java:129) at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:211) at org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:193) at org.apache.hadoop.ozone.client.io.OzoneOutputStream.write(OzoneOutputStream.java:49) at java.io.OutputStream.write(OutputStream.java:75) at org.apache.hadoop.ozone.MiniOzoneLoadGenerator.load(MiniOzoneLoadGenerator.java:103) at org.apache.hadoop.ozone.MiniOzoneLoadGenerator.lambda$startIO$0(MiniOzoneLoadGenerator.java:152) at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)``` this is another issue from Chaos, 
please raise a bug Shashikant Banerjee [9:56 AM] ok, actually the jstacks are taken at 15 min intervals; I am yet to find any common hanging thread among all 3 jstacks Mukul Kumar Singh [10:00 AM] in the 2nd file ```java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x7fb5b29ed228> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283) at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.readLock(FSNamesystem.java:1595) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.handleHeartbeat(FSNamesystem.java:4894) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.sendHeartbeat(NameNodeRpcServer.java:1438) at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.sendHeartbeat(DatanodeProtocolServerSideTranslatorPB.java:118) at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:31228) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866) at 
org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)``` {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
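The top three frames of the HDDS-1789 trace (AbstractCollection.addAll calling AbstractList.add) are what the JDK produces when addAll is called on a list that does not support growth. The report does not say which list inside watchForCommit was involved, so the following is only a generic sketch of that failure mode and of one common remedy (copying into an ArrayList). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Generic illustration only; not taken from BlockOutputStream. */
public class FixedSizeListSketch {
    /** Returns true if the target list rejects addAll with UnsupportedOperationException. */
    public static boolean addAllRejected(List<Long> target, List<Long> extra) {
        try {
            target.addAll(extra);
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Arrays.asList returns a fixed-size view backed by an array.
        List<Long> fixed = Arrays.asList(1L, 2L);
        // A defensive copy gives a list that can grow.
        List<Long> growable = new ArrayList<>(fixed);

        System.out.println("fixed-size list rejects addAll: "
            + addAllRejected(fixed, Arrays.asList(3L)));
        System.out.println("ArrayList copy rejects addAll: "
            + addAllRejected(growable, Arrays.asList(3L)));
    }
}
```

Whether the actual fix took this form is not stated in the message.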
[jira] [Updated] (HDFS-14499) Misleading REM_QUOTA value with snapshot and trash feature enabled for a directory
[ https://issues.apache.org/jira/browse/HDFS-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDFS-14499: --- Attachment: HDFS-14499.002.patch > Misleading REM_QUOTA value with snapshot and trash feature enabled for a > directory > -- > > Key: HDFS-14499 > URL: https://issues.apache.org/jira/browse/HDFS-14499 > Project: Hadoop HDFS > Issue Type: Bug > Components: snapshots >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDFS-14499.000.patch, HDFS-14499.001.patch, > HDFS-14499.002.patch
[jira] [Commented] (HDFS-14499) Misleading REM_QUOTA value with snapshot and trash feature enabled for a directory
[ https://issues.apache.org/jira/browse/HDFS-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16882717#comment-16882717 ] Shashikant Banerjee commented on HDFS-14499: Thanks [~szetszwo]. Patch v2 addresses the review comments. > Misleading REM_QUOTA value with snapshot and trash feature enabled for a > directory > -- > > Key: HDFS-14499 > URL: https://issues.apache.org/jira/browse/HDFS-14499 > Project: Hadoop HDFS > Issue Type: Bug > Components: snapshots >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDFS-14499.000.patch, HDFS-14499.001.patch, > HDFS-14499.002.patch
[jira] [Updated] (HDDS-1780) TestFailureHandlingByClient tests are flaky
[ https://issues.apache.org/jira/browse/HDDS-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-1780: -- Status: Patch Available (was: Open) > TestFailureHandlingByClient tests are flaky > --- > > Key: HDDS-1780 > URL: https://issues.apache.org/jira/browse/HDDS-1780 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Affects Versions: 0.5.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 10m > Remaining Estimate: 0h > > The tests seem to fail because, when a datanode goes down while the stale node > interval is set to a low value, containers may get closed early and client > writes might fail with a closed container exception rather than the pipeline > failure/timeout exceptions expected in the tests. The fix made here is to > tune the stale node interval.
[jira] [Commented] (HDFS-14499) Misleading REM_QUOTA value with snapshot and trash feature enabled for a directory
[ https://issues.apache.org/jira/browse/HDFS-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16882210#comment-16882210 ] Shashikant Banerjee commented on HDFS-14499: [~szetszwo], can you please have a look? > Misleading REM_QUOTA value with snapshot and trash feature enabled for a > directory > -- > > Key: HDFS-14499 > URL: https://issues.apache.org/jira/browse/HDFS-14499 > Project: Hadoop HDFS > Issue Type: Bug > Components: snapshots >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDFS-14499.000.patch, HDFS-14499.001.patch
[jira] [Created] (HDDS-1780) TestFailureHandlingByClient tests are flaky
Shashikant Banerjee created HDDS-1780: - Summary: TestFailureHandlingByClient tests are flaky Key: HDDS-1780 URL: https://issues.apache.org/jira/browse/HDDS-1780 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Client Affects Versions: 0.5.0 Reporter: Shashikant Banerjee Assignee: Shashikant Banerjee Fix For: 0.5.0 The tests seem to fail because, when a datanode goes down while the stale node interval is set to a low value, containers may get closed early and client writes might fail with a closed container exception rather than the pipeline failure/timeout exceptions expected in the tests. The fix made here is to tune the stale node interval.
[jira] [Updated] (HDDS-1779) TestWatchForCommit tests are flaky
[ https://issues.apache.org/jira/browse/HDDS-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-1779: -- Description: The tests have become flaky because, once nodes are shut down in the Ratis pipeline, a watch request can either be received at a server and fail with NotReplicatedException, or sometimes it fails with StatusRuntimeExceptions from grpc; both need to be accounted for in the tests. Other than that, HDDS-1384 also causes a bind exception to be thrown intermittently, which in turn shuts down the miniOzoneCluster. To overcome this, the test class has been refactored as well. (was: The tests have become flaky because, once nodes are shut down in the Ratis pipeline, a watch request can either be received at a server and fail with NotReplicatedException, or soemtimes it fails with StatusRuntimeExceptions from grpc; both need to be accounted for in the tests. Other than that, HDDS-1384 also causes a bind exception to be thrown intermittently, which in turn shuts down the miniOzoneCluster. To overcome this, the test class has been refactored as well.) > TestWatchForCommit tests are flaky > -- > > Key: HDDS-1779 > URL: https://issues.apache.org/jira/browse/HDDS-1779 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > The tests have become flaky because, once nodes are shut down in the Ratis > pipeline, a watch request can either be received at a server and fail with > NotReplicatedException, or sometimes it fails with StatusRuntimeExceptions > from grpc; both need to be accounted for in the tests. Other than that, > HDDS-1384 also causes a bind exception to be thrown intermittently, which in > turn shuts down the miniOzoneCluster. To overcome this, the test class has > been refactored as well. 
[jira] [Updated] (HDDS-1779) TestWatchForCommit tests are flaky
[ https://issues.apache.org/jira/browse/HDDS-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-1779: -- Description: The tests have become flaky because, once nodes are shut down in the Ratis pipeline, a watch request can either be received at a server and fail with NotReplicatedException, or sometimes it fails with StatusRuntimeExceptions from grpc; both need to be accounted for in the tests. Other than that, HDDS-1384 also causes a bind exception to be thrown intermittently, which in turn shuts down the miniOzoneCluster. To overcome this, the test class has been refactored as well. > TestWatchForCommit tests are flaky > -- > > Key: HDDS-1779 > URL: https://issues.apache.org/jira/browse/HDDS-1779 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > The tests have become flaky because, once nodes are shut down in the Ratis > pipeline, a watch request can either be received at a server and fail with > NotReplicatedException, or sometimes it fails with StatusRuntimeExceptions > from grpc; both need to be accounted for in the tests. Other than that, > HDDS-1384 also causes a bind exception to be thrown intermittently, which in > turn shuts down the miniOzoneCluster. To overcome this, the test class has > been refactored as well.
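One way a test can account for either failure is to walk the cause chain of whatever was thrown and accept the failure if any link has one of the expected types. The sketch below assumes nothing about how TestWatchForCommit was actually refactored. The two exception classes are hypothetical stand-ins for the Ratis NotReplicatedException and the gRPC StatusRuntimeException so that the example needs only the JDK.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;

/** Sketch of a tolerant failure check; every name here is hypothetical. */
public class AcceptedFailureSketch {
    /** Stand-in for the Ratis NotReplicatedException. */
    public static class FakeNotReplicatedException extends Exception {
        public FakeNotReplicatedException(String message) { super(message); }
    }

    /** Stand-in for the gRPC StatusRuntimeException. */
    public static class FakeStatusRuntimeException extends RuntimeException {
        public FakeStatusRuntimeException(String message) { super(message); }
    }

    /** The failure types this sketch treats as acceptable. */
    public static List<Class<? extends Throwable>> acceptedTypes() {
        List<Class<? extends Throwable>> types = new ArrayList<>();
        types.add(FakeNotReplicatedException.class);
        types.add(FakeStatusRuntimeException.class);
        return types;
    }

    /** Returns true if the throwable or any of its causes is an accepted type. */
    public static boolean hasAcceptedCause(Throwable thrown,
            List<Class<? extends Throwable>> accepted) {
        for (Throwable cur = thrown; cur != null; cur = cur.getCause()) {
            for (Class<? extends Throwable> type : accepted) {
                if (type.isInstance(cur)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Wrapped the way client-side failures often arrive: IOException -> ExecutionException -> cause.
        Throwable wrapped = new IOException(
            new ExecutionException(new FakeStatusRuntimeException("UNAVAILABLE")));
        System.out.println("wrapped gRPC-style failure accepted: "
            + hasAcceptedCause(wrapped, acceptedTypes()));
        System.out.println("unrelated failure accepted: "
            + hasAcceptedCause(new IllegalStateException("other"), acceptedTypes()));
    }
}
```

A real test would pass the genuine exception classes in place of the stand-ins.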
[jira] [Updated] (HDDS-1779) TestWatchForCommit tests are flaky
[ https://issues.apache.org/jira/browse/HDDS-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-1779: -- Status: Patch Available (was: Open) > TestWatchForCommit tests are flaky > -- > > Key: HDDS-1779 > URL: https://issues.apache.org/jira/browse/HDDS-1779 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h
[jira] [Updated] (HDDS-1779) TestWatchForCommit tests are flaky
[ https://issues.apache.org/jira/browse/HDDS-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-1779: -- Target Version/s: 0.5.0 (was: 0.4.1) > TestWatchForCommit tests are flaky > -- > > Key: HDDS-1779 > URL: https://issues.apache.org/jira/browse/HDDS-1779 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h
[jira] [Assigned] (HDDS-1753) Datanode unable to find chunk while replication data using ratis.
[ https://issues.apache.org/jira/browse/HDDS-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee reassigned HDDS-1753: - Assignee: Shashikant Banerjee > Datanode unable to find chunk while replication data using ratis. > - > > Key: HDDS-1753 > URL: https://issues.apache.org/jira/browse/HDDS-1753 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Shashikant Banerjee >Priority: Major > Labels: MiniOzoneChaosCluster > > Leader datanode is unable to read chunk from the datanode while replicating > data from leader to follower. > Please note that deletion of keys is also happening while the data is being > replicated. > {code} > 2019-07-02 19:39:22,604 INFO impl.RaftServerImpl > (RaftServerImpl.java:checkInconsistentAppendEntries(972)) - > 5ac88709-a3a2-4c8f-91de-5e54b617f05e: inconsistency entries. > Reply:76a3eb0f-d7cd-477b-8973-db1 > 014feb398<-5ac88709-a3a2-4c8f-91de-5e54b617f05e#70:FAIL,INCONSISTENCY,nextIndex:9771,term:2,followerCommit:9782 > 2019-07-02 19:39:22,605 ERROR impl.ChunkManagerImpl > (ChunkUtils.java:readData(161)) - Unable to find the chunk file. chunk info : > ChunkInfo{chunkName='76ec669ae2cb6e10dd9f08c0789c5fdf_stream_a2850dce-def3 > -4d64-93d8-fa2ebafee933_chunk_1, offset=0, len=2048} > 2019-07-02 19:39:22,605 INFO impl.RaftServerImpl > (RaftServerImpl.java:checkInconsistentAppendEntries(990)) - > 5ac88709-a3a2-4c8f-91de-5e54b617f05e: Failed appendEntries as latest snapshot > (9770) already h > as the append entries (first index: 1) > 2019-07-02 19:39:22,605 INFO impl.RaftServerImpl > (RaftServerImpl.java:checkInconsistentAppendEntries(972)) - > 5ac88709-a3a2-4c8f-91de-5e54b617f05e: inconsistency entries. 
> Reply:76a3eb0f-d7cd-477b-8973-db1 > 014feb398<-5ac88709-a3a2-4c8f-91de-5e54b617f05e#71:FAIL,INCONSISTENCY,nextIndex:9771,term:2,followerCommit:9782 > 2019-07-02 19:39:22,605 INFO keyvalue.KeyValueHandler > (ContainerUtils.java:logAndReturnError(146)) - Operation: ReadChunk : Trace > ID: 4216d461a4679e17:4216d461a4679e17:0:0 : Message: Unable to find the c > hunk file. chunk info > ChunkInfo{chunkName='76ec669ae2cb6e10dd9f08c0789c5fdf_stream_a2850dce-def3-4d64-93d8-fa2ebafee933_chunk_1, > offset=0, len=2048} : Result: UNABLE_TO_FIND_CHUNK > 2019-07-02 19:39:22,605 INFO impl.RaftServerImpl > (RaftServerImpl.java:checkInconsistentAppendEntries(990)) - > 5ac88709-a3a2-4c8f-91de-5e54b617f05e: Failed appendEntries as latest snapshot > (9770) already h > as the append entries (first index: 2) > 2019-07-02 19:39:22,606 INFO impl.RaftServerImpl > (RaftServerImpl.java:checkInconsistentAppendEntries(972)) - > 5ac88709-a3a2-4c8f-91de-5e54b617f05e: inconsistency entries. > Reply:76a3eb0f-d7cd-477b-8973-db1 > 014feb398<-5ac88709-a3a2-4c8f-91de-5e54b617f05e#72:FAIL,INCONSISTENCY,nextIndex:9771,term:2,followerCommit:9782 > 19:39:22.606 [pool-195-thread-19] ERROR DNAudit - user=null | ip=null | > op=READ_CHUNK {blockData=conID: 3 locID: 102372189549953034 bcsId: 0} | > ret=FAILURE > java.lang.Exception: Unable to find the chunk file. chunk info > ChunkInfo{chunkName='76ec669ae2cb6e10dd9f08c0789c5fdf_stream_a2850dce-def3-4d64-93d8-fa2ebafee933_chunk_1, > offset=0, len=2048} > at > org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:320) > ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?] > at > org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:148) > ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?] 
> at > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:346) > ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?] > at > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.readStateMachineData(ContainerStateMachine.java:476) > ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?] > at > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$getCachedStateMachineData$2(ContainerStateMachine.java:495) > ~[hadoop-hdds-container-service-0.5.0-SN > APSHOT.jar:?] > at > com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4767) > ~[guava-11.0.2.jar:?] > at > com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568) > ~[guava-11.0.2.jar:?] > at > com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350) > ~[guava-11.0.2.jar:?] > at >
[jira] [Updated] (HDDS-1654) Ensure container state on datanode gets synced to disk whenever state change happens
[ https://issues.apache.org/jira/browse/HDDS-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-1654: -- Status: Patch Available (was: Open) > Ensure container state on datanode gets synced to disk whenever state change > happens > > > Key: HDDS-1654 > URL: https://issues.apache.org/jira/browse/HDDS-1654 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.5.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Blocker > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Currently, whenever there is a container state change, the container is > updated but the change is not synced to disk. > The idea here is to force-sync the state to disk every time there is a > state change.
[jira] [Resolved] (HDDS-1621) writeData in ChunkUtils should not use AsynchronousFileChannel
[ https://issues.apache.org/jira/browse/HDDS-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee resolved HDDS-1621. --- Resolution: Fixed Fix Version/s: 0.4.1 Thanks [~sdeka] for working on this. I have committed this change to trunk. > writeData in ChunkUtils should not use AsynchronousFileChannel > -- > > Key: HDDS-1621 > URL: https://issues.apache.org/jira/browse/HDDS-1621 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Shashikant Banerjee >Assignee: Supratim Deka >Priority: Major > Labels: pull-request-available > Fix For: 0.4.1 > > Time Spent: 50m > Remaining Estimate: 0h > > Currently, chunk writes are not synced to disk by default. When > flushStateMachineData gets invoked from Ratis, it should also ensure that all > pending chunk writes are flushed to disk.
[jira] [Updated] (HDDS-1654) Ensure container state on datanode gets synced to disk whenever state change happens
[ https://issues.apache.org/jira/browse/HDDS-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-1654: -- Priority: Blocker (was: Major) > Ensure container state on datanode gets synced to disk whenever state change > happens > > > Key: HDDS-1654 > URL: https://issues.apache.org/jira/browse/HDDS-1654 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Blocker > Fix For: 0.5.0 > > > Currently, whenever there is a container state change, the container is > updated but the change is not synced to disk. > The idea here is to force-sync the state to disk every time there is a > state change.
[jira] [Updated] (HDDS-1654) Ensure container state on datanode gets synced to disk whenever state change happens
[ https://issues.apache.org/jira/browse/HDDS-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-1654: -- Affects Version/s: 0.5.0 > Ensure container state on datanode gets synced to disk whenever state change > happens > > > Key: HDDS-1654 > URL: https://issues.apache.org/jira/browse/HDDS-1654 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.5.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Blocker > Fix For: 0.5.0 > > > Currently, whenever there is a container state change, the container is > updated but the change is not synced to disk. > The idea here is to force-sync the state to disk every time there is a > state change.
[jira] [Created] (HDDS-1654) Ensure container state on datanode gets synced to disk whenever state change happens
Shashikant Banerjee created HDDS-1654:

    Summary: Ensure container state on datanode gets synced to disk whenever state change happens
    Key: HDDS-1654
    URL: https://issues.apache.org/jira/browse/HDDS-1654
    Project: Hadoop Distributed Data Store
    Issue Type: Bug
    Components: Ozone Datanode
    Reporter: Shashikant Banerjee
    Assignee: Shashikant Banerjee
    Fix For: 0.5.0

Currently, whenever there is a container state change, the container is updated but the change is not synced to disk.
The idea here is to force sync the state to disk every time there is a state change.
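The intent described in HDDS-1654 (make every container state change durable before moving on) can be sketched in plain Java. This is only an illustration, not the Ozone datanode code: the class `ContainerStateSyncSketch`, the methods `persistState` and `readState`, and the one-file-per-container layout are assumptions made for this example. The only real API relied on is `FileChannel.force(true)`, which asks the operating system to flush file content and metadata to the storage device.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: persist a container state string and force it to disk.
final class ContainerStateSyncSketch {

  // Writes the new state and returns only after the data has been forced
  // to the storage device, so a crash right after the call cannot lose it.
  static void persistState(Path stateFile, String state) {
    try (FileChannel ch = FileChannel.open(stateFile,
        StandardOpenOption.CREATE,
        StandardOpenOption.WRITE,
        StandardOpenOption.TRUNCATE_EXISTING)) {
      ByteBuffer buf = ByteBuffer.wrap(state.getBytes(StandardCharsets.UTF_8));
      while (buf.hasRemaining()) {
        ch.write(buf);
      }
      // true = also flush file metadata (size, timestamps), not just content.
      ch.force(true);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Reads the state back, e.g. after a restart.
  static String readState(Path stateFile) {
    try {
      return new String(Files.readAllBytes(stateFile), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Without the `force` call, the write may sit in the OS page cache, which is the "updates the container but doesn't sync" situation the issue describes.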
[jira] [Created] (HDDS-1621) flushStateMachineData should ensure the write chunks are flushed to disk
Shashikant Banerjee created HDDS-1621:

    Summary: flushStateMachineData should ensure the write chunks are flushed to disk
    Key: HDDS-1621
    URL: https://issues.apache.org/jira/browse/HDDS-1621
    Project: Hadoop Distributed Data Store
    Issue Type: Bug
    Reporter: Shashikant Banerjee
    Assignee: Supratim Deka

Currently, chunk writes are not synced to disk by default. When flushStateMachineData gets invoked from Ratis, it should also ensure that all pending chunk writes are flushed to disk.
[jira] [Commented] (HDFS-14499) Misleading REM_QUOTA value with snapshot and trash feature enabled for a directory
[ https://issues.apache.org/jira/browse/HDFS-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852818#comment-16852818 ] Shashikant Banerjee commented on HDFS-14499: Thanks [~szetszwo]. Patch v1 addresses your review comments. > Misleading REM_QUOTA value with snasphot and trash feature enabled for a > directory > -- > > Key: HDFS-14499 > URL: https://issues.apache.org/jira/browse/HDFS-14499 > Project: Hadoop HDFS > Issue Type: Bug > Components: snapshots >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDFS-14499.000.patch, HDFS-14499.001.patch > > > This is the flow of steps where we see a discrepancy between REM_QUOTA and > new file operation failure. REM_QUOTA shows a value of 1 but file creation > operation does not succeed. > {code:java} > hdfs@c3265-node3 root$ hdfs dfs -mkdir /dir1 > hdfs@c3265-node3 root$ hdfs dfsadmin -setQuota 2 /dir1 > hdfs@c3265-node3 root$ hdfs dfsadmin -allowSnapshot /dir1 > Allowing snaphot on /dir1 succeeded > hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1 > hdfs@c3265-node3 root$ hdfs dfs -createSnapshot /dir1 snap1 > Created snapshot /dir1/.snapshot/snap1 > hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1 > QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE > PATHNAME > 2 0 none inf 1 1 0 /dir1 > hdfs@c3265-node3 root$ hdfs dfs -rm /dir1/file1 > 19/03/26 11:20:25 INFO fs.TrashPolicyDefault: Moved: > 'hdfs://smajetinn/dir1/file1' to trash at: > hdfs://smajetinn/user/hdfs/.Trash/Current/dir1/file11553599225772 > hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1 > QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE > PATHNAME > 2 1 none inf 1 0 0 /dir1 > hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1 > touchz: The NameSpace quota (directories and files) of directory /dir1 is > exceeded: quota=2 file count=3{code} > The issue here, is that the count command takes only files and directories > into 
account, not the inode references. When trash is enabled, deleting a file inside a directory actually performs a rename. As a result, an inode reference is kept in the deleted list of the snapshot diff, and that reference is counted when computing the namespace quota. The count command (getContentSummary()), however, considers only the files and directories, not the referenced entity, when calculating REM_QUOTA. The referenced entity is taken into account for the space quota only.
> InodeReference.java:
> {code:java}
> @Override
> public final ContentSummaryComputationContext computeContentSummary(
>     int snapshotId, ContentSummaryComputationContext summary) {
>   final int s = snapshotId < lastSnapshotId ? snapshotId : lastSnapshotId;
>   // only count storagespace for WithName
>   final QuotaCounts q = computeQuotaUsage(
>       summary.getBlockStoragePolicySuite(), getStoragePolicyID(), false, s);
>   summary.getCounts().addContent(Content.DISKSPACE, q.getStorageSpace());
>   summary.getCounts().addTypeSpaces(q.getTypeSpaces());
>   return summary;
> }
> {code}
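The discrepancy in the HDFS-14499 reproduction can be checked with a few lines of arithmetic. The sketch below is not NameNode code; the class `QuotaViewSketch` and both method names are hypothetical. It models only the two views described in the issue: the count command subtracts visible directories and files from the quota, while the quota check on file creation also counts the inode reference held by the snapshot.

```java
// Hypothetical model of the two namespace-quota views described in HDFS-14499.
final class QuotaViewSketch {

  // View used by "hdfs dfs -count -q": only visible directories and files.
  static long remQuotaFromCount(long quota, long dirCount, long fileCount) {
    return quota - (dirCount + fileCount);
  }

  // View used when enforcing the quota: inode references kept alive by the
  // snapshot diff are counted as well, plus the one new file being created.
  static boolean createAllowed(long quota, long dirCount, long fileCount,
      long snapshotReferences) {
    long usageAfterCreate = dirCount + fileCount + snapshotReferences + 1;
    return usageAfterCreate <= quota;
  }

  public static void main(String[] args) {
    // Values from the reproduction after "hdfs dfs -rm /dir1/file1":
    // quota=2, DIR_COUNT=1, FILE_COUNT=0, one reference held by snap1.
    System.out.println("REM_QUOTA = " + remQuotaFromCount(2, 1, 0));
    System.out.println("create allowed = " + createAllowed(2, 1, 0, 1));
  }
}
```

With the values from the transcript, the first view reports one remaining slot while the second computes a usage of 3 against a quota of 2, which matches the "quota=2 file count=3" error.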
[jira] [Updated] (HDFS-14499) Misleading REM_QUOTA value with snapshot and trash feature enabled for a directory
[ https://issues.apache.org/jira/browse/HDFS-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDFS-14499: --- Attachment: HDFS-14499.001.patch > Misleading REM_QUOTA value with snasphot and trash feature enabled for a > directory > -- > > Key: HDFS-14499 > URL: https://issues.apache.org/jira/browse/HDFS-14499 > Project: Hadoop HDFS > Issue Type: Bug > Components: snapshots >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDFS-14499.000.patch, HDFS-14499.001.patch > > > This is the flow of steps where we see a discrepancy between REM_QUOTA and > new file operation failure. REM_QUOTA shows a value of 1 but file creation > operation does not succeed. > {code:java} > hdfs@c3265-node3 root$ hdfs dfs -mkdir /dir1 > hdfs@c3265-node3 root$ hdfs dfsadmin -setQuota 2 /dir1 > hdfs@c3265-node3 root$ hdfs dfsadmin -allowSnapshot /dir1 > Allowing snaphot on /dir1 succeeded > hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1 > hdfs@c3265-node3 root$ hdfs dfs -createSnapshot /dir1 snap1 > Created snapshot /dir1/.snapshot/snap1 > hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1 > QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE > PATHNAME > 2 0 none inf 1 1 0 /dir1 > hdfs@c3265-node3 root$ hdfs dfs -rm /dir1/file1 > 19/03/26 11:20:25 INFO fs.TrashPolicyDefault: Moved: > 'hdfs://smajetinn/dir1/file1' to trash at: > hdfs://smajetinn/user/hdfs/.Trash/Current/dir1/file11553599225772 > hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1 > QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE > PATHNAME > 2 1 none inf 1 0 0 /dir1 > hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1 > touchz: The NameSpace quota (directories and files) of directory /dir1 is > exceeded: quota=2 file count=3{code} > The issue here, is that the count command takes only files and directories > into account not the inode references. 
When trash is enabled, deleting a file inside a directory actually performs a rename. As a result, an inode reference is kept in the deleted list of the snapshot diff, and that reference is counted when computing the namespace quota. The count command (getContentSummary()), however, considers only the files and directories, not the referenced entity, when calculating REM_QUOTA. The referenced entity is taken into account for the space quota only.
> InodeReference.java:
> {code:java}
> @Override
> public final ContentSummaryComputationContext computeContentSummary(
>     int snapshotId, ContentSummaryComputationContext summary) {
>   final int s = snapshotId < lastSnapshotId ? snapshotId : lastSnapshotId;
>   // only count storagespace for WithName
>   final QuotaCounts q = computeQuotaUsage(
>       summary.getBlockStoragePolicySuite(), getStoragePolicyID(), false, s);
>   summary.getCounts().addContent(Content.DISKSPACE, q.getStorageSpace());
>   summary.getCounts().addTypeSpaces(q.getTypeSpaces());
>   return summary;
> }
> {code}
[jira] [Updated] (HDDS-1502) Add metrics for Ozone Ratis performance
[ https://issues.apache.org/jira/browse/HDDS-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDDS-1502:
    Resolution: Fixed
    Status: Resolved (was: Patch Available)

> Add metrics for Ozone Ratis performance
>
> Key: HDDS-1502
> URL: https://issues.apache.org/jira/browse/HDDS-1502
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Reporter: Mukul Kumar Singh
> Assignee: Shashikant Banerjee
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> This jira will add some metrics for Ratis pipeline performance:
> 1) number of bytes written
> 2) number of read StateMachine calls
> 3) number of read StateMachine failures
[jira] [Assigned] (HDDS-1614) Container Missing in the datanode after restart
[ https://issues.apache.org/jira/browse/HDDS-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee reassigned HDDS-1614: - Assignee: Shashikant Banerjee > Container Missing in the datanode after restart > --- > > Key: HDDS-1614 > URL: https://issues.apache.org/jira/browse/HDDS-1614 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Shashikant Banerjee >Priority: Major > Labels: MiniOzoneChaosCluster > > Container missing on the datanode after a restart. > {code} > 08:10:44.308 [pool-2131-thread-1] ERROR DNAudit - user=null | ip=null | > op=WRITE_CHUNK {blockData=conID: 34 locID: 102182684750055212 bcsId: 6198} | > ret=FAILURE > org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: > ContainerID 34 has been lost and and cannot be recreated on this DataNode > at > org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:207) > ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?] > at > org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:149) > ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?] > at > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:347) > ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?] > at > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.runCommand(ContainerStateMachine.java:354) > ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?] > at > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$handleWriteChunk$0(ContainerStateMachine.java:385) > ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?] 
> at > java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) > [?:1.8.0_171] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [?:1.8.0_171] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [?:1.8.0_171] > at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-1613) Read key fails with "Unable to find the block"
[ https://issues.apache.org/jira/browse/HDDS-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee reassigned HDDS-1613: - Assignee: Shashikant Banerjee > Read key fails with "Unable to find the block" > -- > > Key: HDDS-1613 > URL: https://issues.apache.org/jira/browse/HDDS-1613 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Shashikant Banerjee >Priority: Major > Labels: MiniOzoneChaosCluster > > Block read fails with > {code} > org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: > Unable to find the block with bcsID 11777 .Container 68 bcsId is 0. > at > org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:573) > at > org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.getBlock(ContainerProtocolCalls.java:120) > at > org.apache.hadoop.ozone.client.io.KeyInputStream$ChunkInputStreamEntry.initializeBlockInputStream(KeyInputStream.java:295) > at > org.apache.hadoop.ozone.client.io.KeyInputStream$ChunkInputStreamEntry.getStream(KeyInputStream.java:265) > at > org.apache.hadoop.ozone.client.io.KeyInputStream$ChunkInputStreamEntry.access$000(KeyInputStream.java:229) > at > org.apache.hadoop.ozone.client.io.KeyInputStream.getStreamEntry(KeyInputStream.java:107) > at > org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:140) > at > org.apache.hadoop.ozone.client.io.OzoneInputStream.read(OzoneInputStream.java:47) > at java.io.InputStream.read(InputStream.java:101) > at > org.apache.hadoop.ozone.MiniOzoneLoadGenerator.load(MiniOzoneLoadGenerator.java:114) > at > org.apache.hadoop.ozone.MiniOzoneLoadGenerator.lambda$startIO$0(MiniOzoneLoadGenerator.java:147) > at > java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626) > at > 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} > Looking at the 3 datanodes, the containers are in bcs id of 11748, 11748 and > 0. > {code} > 2019-05-30 08:28:05,348 INFO keyvalue.KeyValueHandler > (ContainerUtils.java:logAndReturnError(146)) - Operation: GetBlock : Trace > ID: 93a2a596076d2ee4:93a2a596076d2ee4:0:0 : Message: Unable to find the block > with bcsID 11777 .Container 68 bcsId is 11748. : Result: UNKNOWN_BCSID > 2019-05-30 08:28:05,363 INFO keyvalue.KeyValueHandler > (ContainerUtils.java:logAndReturnError(146)) - Operation: GetBlock : Trace > ID: 93a2a596076d2ee4:93a2a596076d2ee4:0:0 : Message: Unable to find the block > with bcsID 11777 .Container 68 bcsId is 11748. : Result: UNKNOWN_BCSID > 2019-05-30 08:28:05,377 INFO keyvalue.KeyValueHandler > (ContainerUtils.java:logAndReturnError(146)) - Operation: GetBlock : Trace > ID: 93a2a596076d2ee4:93a2a596076d2ee4:0:0 : Message: Unable to find the block > with bcsID 11777 .Container 68 bcsId is 0. : Result: UNKNOWN_BCSID > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-1610) ContainerStateMachine should not take snapshot if any of the applyTransactions fail
Shashikant Banerjee created HDDS-1610:

    Summary: ContainerStateMachine should not take snapshot if any of the applyTransactions fail
    Key: HDDS-1610
    URL: https://issues.apache.org/jira/browse/HDDS-1610
    Project: Hadoop Distributed Data Store
    Issue Type: Bug
    Reporter: Shashikant Banerjee
    Assignee: Shashikant Banerjee

If applyTransaction fails in the ContainerStateMachine, all subsequent snapshots should be disallowed, so that after a restart the state machine always reapplies from the last successfully committed transaction.
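The rule proposed in HDDS-1610 can be illustrated with a small, self-contained model. It does not use the Ratis `StateMachine` API; the class `SnapshotGateSketch` and all of its members are hypothetical names chosen for this sketch. The idea shown is simply a sticky failure flag: once any apply fails, snapshots are refused, so the last snapshot index never moves past the last successfully applied transaction.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of "no snapshot after a failed applyTransaction".
final class SnapshotGateSketch {
  private final AtomicBoolean healthy = new AtomicBoolean(true);
  private final AtomicLong lastAppliedIndex = new AtomicLong(-1);
  private final AtomicLong lastSnapshotIndex = new AtomicLong(-1);

  // Records the outcome of applying the log entry at the given index.
  void applyTransaction(long index, boolean succeeded) {
    if (!succeeded) {
      // Sticky: a single failure disables all later snapshots.
      healthy.set(false);
      return;
    }
    if (healthy.get()) {
      lastAppliedIndex.set(index);
    }
  }

  // Returns true if a snapshot was taken, false if it was refused.
  boolean takeSnapshot() {
    if (!healthy.get()) {
      return false;
    }
    lastSnapshotIndex.set(lastAppliedIndex.get());
    return true;
  }

  long getLastSnapshotIndex() {
    return lastSnapshotIndex.get();
  }
}
```

On restart, a state machine built this way would replay the log from `getLastSnapshotIndex()`, which is guaranteed to precede the failed entry.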
[jira] [Updated] (HDFS-14499) Misleading REM_QUOTA value with snapshot and trash feature enabled for a directory
[ https://issues.apache.org/jira/browse/HDFS-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDFS-14499: --- Attachment: HDFS-14499.000.patch > Misleading REM_QUOTA value with snasphot and trash feature enabled for a > directory > -- > > Key: HDFS-14499 > URL: https://issues.apache.org/jira/browse/HDFS-14499 > Project: Hadoop HDFS > Issue Type: Bug > Components: snapshots >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDFS-14499.000.patch > > > This is the flow of steps where we see a discrepancy between REM_QUOTA and > new file operation failure. REM_QUOTA shows a value of 1 but file creation > operation does not succeed. > {code:java} > hdfs@c3265-node3 root$ hdfs dfs -mkdir /dir1 > hdfs@c3265-node3 root$ hdfs dfsadmin -setQuota 2 /dir1 > hdfs@c3265-node3 root$ hdfs dfsadmin -allowSnapshot /dir1 > Allowing snaphot on /dir1 succeeded > hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1 > hdfs@c3265-node3 root$ hdfs dfs -createSnapshot /dir1 snap1 > Created snapshot /dir1/.snapshot/snap1 > hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1 > QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE > PATHNAME > 2 0 none inf 1 1 0 /dir1 > hdfs@c3265-node3 root$ hdfs dfs -rm /dir1/file1 > 19/03/26 11:20:25 INFO fs.TrashPolicyDefault: Moved: > 'hdfs://smajetinn/dir1/file1' to trash at: > hdfs://smajetinn/user/hdfs/.Trash/Current/dir1/file11553599225772 > hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1 > QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE > PATHNAME > 2 1 none inf 1 0 0 /dir1 > hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1 > touchz: The NameSpace quota (directories and files) of directory /dir1 is > exceeded: quota=2 file count=3{code} > The issue here, is that the count command takes only files and directories > into account not the inode references. 
When trash is enabled, deleting a file inside a directory actually performs a rename. As a result, an inode reference is kept in the deleted list of the snapshot diff, and that reference is counted when computing the namespace quota. The count command (getContentSummary()), however, considers only the files and directories, not the referenced entity, when calculating REM_QUOTA. The referenced entity is taken into account for the space quota only.
> InodeReference.java:
> {code:java}
> @Override
> public final ContentSummaryComputationContext computeContentSummary(
>     int snapshotId, ContentSummaryComputationContext summary) {
>   final int s = snapshotId < lastSnapshotId ? snapshotId : lastSnapshotId;
>   // only count storagespace for WithName
>   final QuotaCounts q = computeQuotaUsage(
>       summary.getBlockStoragePolicySuite(), getStoragePolicyID(), false, s);
>   summary.getCounts().addContent(Content.DISKSPACE, q.getStorageSpace());
>   summary.getCounts().addTypeSpaces(q.getTypeSpaces());
>   return summary;
> }
> {code}
[jira] [Updated] (HDFS-14499) Misleading REM_QUOTA value with snapshot and trash feature enabled for a directory
[ https://issues.apache.org/jira/browse/HDFS-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDFS-14499: --- Status: Patch Available (was: Open) > Misleading REM_QUOTA value with snasphot and trash feature enabled for a > directory > -- > > Key: HDFS-14499 > URL: https://issues.apache.org/jira/browse/HDFS-14499 > Project: Hadoop HDFS > Issue Type: Bug > Components: snapshots >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDFS-14499.000.patch > > > This is the flow of steps where we see a discrepancy between REM_QUOTA and > new file operation failure. REM_QUOTA shows a value of 1 but file creation > operation does not succeed. > {code:java} > hdfs@c3265-node3 root$ hdfs dfs -mkdir /dir1 > hdfs@c3265-node3 root$ hdfs dfsadmin -setQuota 2 /dir1 > hdfs@c3265-node3 root$ hdfs dfsadmin -allowSnapshot /dir1 > Allowing snaphot on /dir1 succeeded > hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1 > hdfs@c3265-node3 root$ hdfs dfs -createSnapshot /dir1 snap1 > Created snapshot /dir1/.snapshot/snap1 > hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1 > QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE > PATHNAME > 2 0 none inf 1 1 0 /dir1 > hdfs@c3265-node3 root$ hdfs dfs -rm /dir1/file1 > 19/03/26 11:20:25 INFO fs.TrashPolicyDefault: Moved: > 'hdfs://smajetinn/dir1/file1' to trash at: > hdfs://smajetinn/user/hdfs/.Trash/Current/dir1/file11553599225772 > hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1 > QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE > PATHNAME > 2 1 none inf 1 0 0 /dir1 > hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1 > touchz: The NameSpace quota (directories and files) of directory /dir1 is > exceeded: quota=2 file count=3{code} > The issue here, is that the count command takes only files and directories > into account not the inode references. 
When trash is enabled, deleting a file inside a directory actually performs a rename. As a result, an inode reference is kept in the deleted list of the snapshot diff, and that reference is counted when computing the namespace quota. The count command (getContentSummary()), however, considers only the files and directories, not the referenced entity, when calculating REM_QUOTA. The referenced entity is taken into account for the space quota only.
> InodeReference.java:
> {code:java}
> @Override
> public final ContentSummaryComputationContext computeContentSummary(
>     int snapshotId, ContentSummaryComputationContext summary) {
>   final int s = snapshotId < lastSnapshotId ? snapshotId : lastSnapshotId;
>   // only count storagespace for WithName
>   final QuotaCounts q = computeQuotaUsage(
>       summary.getBlockStoragePolicySuite(), getStoragePolicyID(), false, s);
>   summary.getCounts().addContent(Content.DISKSPACE, q.getStorageSpace());
>   summary.getCounts().addTypeSpaces(q.getTypeSpaces());
>   return summary;
> }
> {code}
[jira] [Assigned] (HDFS-14499) Misleading REM_QUOTA value with snapshot and trash feature enabled for a directory
[ https://issues.apache.org/jira/browse/HDFS-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee reassigned HDFS-14499: -- Assignee: Shashikant Banerjee > Misleading REM_QUOTA value with snasphot and trash feature enabled for a > directory > -- > > Key: HDFS-14499 > URL: https://issues.apache.org/jira/browse/HDFS-14499 > Project: Hadoop HDFS > Issue Type: Bug > Components: snapshots >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > > This is the flow of steps where we see a discrepancy between REM_QUOTA and > new file operation failure. REM_QUOTA shows a value of 1 but file creation > operation does not succeed. > {code:java} > hdfs@c3265-node3 root$ hdfs dfs -mkdir /dir1 > hdfs@c3265-node3 root$ hdfs dfsadmin -setQuota 2 /dir1 > hdfs@c3265-node3 root$ hdfs dfsadmin -allowSnapshot /dir1 > Allowing snaphot on /dir1 succeeded > hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1 > hdfs@c3265-node3 root$ hdfs dfs -createSnapshot /dir1 snap1 > Created snapshot /dir1/.snapshot/snap1 > hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1 > QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE > PATHNAME > 2 0 none inf 1 1 0 /dir1 > hdfs@c3265-node3 root$ hdfs dfs -rm /dir1/file1 > 19/03/26 11:20:25 INFO fs.TrashPolicyDefault: Moved: > 'hdfs://smajetinn/dir1/file1' to trash at: > hdfs://smajetinn/user/hdfs/.Trash/Current/dir1/file11553599225772 > hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1 > QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE > PATHNAME > 2 1 none inf 1 0 0 /dir1 > hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1 > touchz: The NameSpace quota (directories and files) of directory /dir1 is > exceeded: quota=2 file count=3{code} > The issue here, is that the count command takes only files and directories > into account not the inode references. 
When trash is enabled, deleting a file inside a directory actually performs a rename. As a result, an inode reference is kept in the deleted list of the snapshot diff, and that reference is counted when computing the namespace quota. The count command (getContentSummary()), however, considers only the files and directories, not the referenced entity, when calculating REM_QUOTA. The referenced entity is taken into account for the space quota only.
> InodeReference.java:
> {code:java}
> @Override
> public final ContentSummaryComputationContext computeContentSummary(
>     int snapshotId, ContentSummaryComputationContext summary) {
>   final int s = snapshotId < lastSnapshotId ? snapshotId : lastSnapshotId;
>   // only count storagespace for WithName
>   final QuotaCounts q = computeQuotaUsage(
>       summary.getBlockStoragePolicySuite(), getStoragePolicyID(), false, s);
>   summary.getCounts().addContent(Content.DISKSPACE, q.getStorageSpace());
>   summary.getCounts().addTypeSpaces(q.getTypeSpaces());
>   return summary;
> }
> {code}
[jira] [Updated] (HDDS-1509) TestBlockOutputStreamWithFailures#test2DatanodesFailure fails intermittently
[ https://issues.apache.org/jira/browse/HDDS-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDDS-1509:
    Resolution: Fixed
    Status: Resolved (was: Patch Available)

> TestBlockOutputStreamWithFailures#test2DatanodesFailure fails intermittently
>
> Key: HDDS-1509
> URL: https://issues.apache.org/jira/browse/HDDS-1509
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Client
> Affects Versions: 0.5.0
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.4.1
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> The test fails because it expects the exception raised after 2 datanode failures to be of type RaftRetryFailureException. However, the pipeline may get destroyed before the actual write executes over Ratis, in which case the write fails with GroupMismatchException instead.
[jira] [Updated] (HDDS-1584) Fix TestFailureHandlingByClient tests
[ https://issues.apache.org/jira/browse/HDDS-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDDS-1584:
    Resolution: Fixed
    Status: Resolved (was: Patch Available)

> Fix TestFailureHandlingByClient tests
>
> Key: HDDS-1584
> URL: https://issues.apache.org/jira/browse/HDDS-1584
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Client
> Affects Versions: 0.4.1
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.4.1
> Time Spent: 1h
> Remaining Estimate: 0h
>
> The test failures occur because the test relies on KeyOutputStream#getLocationList() to validate the number of preallocated blocks, but that method was recently changed to exclude empty blocks. The fix is mostly to use KeyOutputStream#getStreamEntries() to get the number of preallocated blocks.
[jira] [Updated] (HDDS-1558) IllegalArgumentException while processing container Reports
[ https://issues.apache.org/jira/browse/HDDS-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-1558: -- Status: Patch Available (was: Open) > IllegalArgumentException while processing container Reports > --- > > Key: HDDS-1558 > URL: https://issues.apache.org/jira/browse/HDDS-1558 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Shashikant Banerjee >Priority: Major > Labels: MiniOzoneChaosCluster, pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > IllegalArgumentException while processing container Reports > {code} > 2019-05-19 23:15:04,137 ERROR events.SingleThreadExecutor > (SingleThreadExecutor.java:lambda$onMessage$1(88)) - Error on execution > message > org.apache.hadoop.hdds.scm.server.SCMDatanodeHeartbeatDispatcher$ContainerReportFromDatanode@1a117ebc > java.lang.IllegalArgumentException > at > com.google.common.base.Preconditions.checkArgument(Preconditions.java:72) > at > org.apache.hadoop.hdds.scm.container.AbstractContainerReportHandler.updateContainerState(AbstractContainerReportHandler.java:178) > at > org.apache.hadoop.hdds.scm.container.AbstractContainerReportHandler.processContainerReplica(AbstractContainerReportHandler.java:85) > at > org.apache.hadoop.hdds.scm.container.ContainerReportHandler.processContainerReplicas(ContainerReportHandler.java:124) > at > org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:97) > at > org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:46) > at > org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:85) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at 
java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-1558) IllegalArgumentException while processing container Reports
[ https://issues.apache.org/jira/browse/HDDS-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848793#comment-16848793 ] Shashikant Banerjee commented on HDDS-1558: --- The issue seems to be happening because of the following sequence: # 2 out of 3 container replica marked unhealthy, but 1 replica keeps on getting and applying transaction successfully updating its BCSID. # SCM gets updated of the latest BCSID by the healthy replica. # one unhealty and one healthy node gets restarted and join the ring , close container command issued from SCM gets executred via Ratis on unhealthy replica as the after restart , the unnhealthy state is not persisted. # When the BCSID reported by replica report of this replica, it would hit the exception as it would have a lower BCSID than what SCM already has. > IllegalArgumentException while processing container Reports > --- > > Key: HDDS-1558 > URL: https://issues.apache.org/jira/browse/HDDS-1558 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Shashikant Banerjee >Priority: Major > Labels: MiniOzoneChaosCluster, pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > IllegalArgumentException while processing container Reports > {code} > 2019-05-19 23:15:04,137 ERROR events.SingleThreadExecutor > (SingleThreadExecutor.java:lambda$onMessage$1(88)) - Error on execution > message > org.apache.hadoop.hdds.scm.server.SCMDatanodeHeartbeatDispatcher$ContainerReportFromDatanode@1a117ebc > java.lang.IllegalArgumentException > at > com.google.common.base.Preconditions.checkArgument(Preconditions.java:72) > at > org.apache.hadoop.hdds.scm.container.AbstractContainerReportHandler.updateContainerState(AbstractContainerReportHandler.java:178) > at > org.apache.hadoop.hdds.scm.container.AbstractContainerReportHandler.processContainerReplica(AbstractContainerReportHandler.java:85) > at > 
org.apache.hadoop.hdds.scm.container.ContainerReportHandler.processContainerReplicas(ContainerReportHandler.java:124) > at > org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:97) > at > org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:46) > at > org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:85) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
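The HDDS-1558 comment attributes the exception to a replica reporting a BCSID lower than the one SCM already holds. The following is a minimal sketch of that comparison, not the actual SCM code: BcsidTracker and both of its update methods are hypothetical names introduced only for illustration, and the tolerant variant is just one possible way to handle a stale report.

```java
// Toy model of SCM-side BCSID bookkeeping for a single container.
// BcsidTracker and its methods are hypothetical names, not Ozone classes.
final class BcsidTracker {
  private long knownBcsid;

  long known() {
    return knownBcsid;
  }

  // Strict variant: a report below the known BCSID is treated as an
  // illegal argument, similar in spirit to a failed checkArgument call.
  void updateStrict(long reportedBcsid) {
    if (reportedBcsid < knownBcsid) {
      throw new IllegalArgumentException(
          "reported BCSID " + reportedBcsid + " < known BCSID " + knownBcsid);
    }
    knownBcsid = reportedBcsid;
  }

  // Tolerant variant (an assumption, not the committed fix): a stale report
  // is rejected without throwing, so the caller can flag the replica
  // instead of aborting report processing.
  boolean updateTolerant(long reportedBcsid) {
    if (reportedBcsid < knownBcsid) {
      return false;
    }
    knownBcsid = reportedBcsid;
    return true;
  }
}
```

In this model, a healthy replica calling updateStrict(10) followed by a restarted replica calling updateStrict(7) reproduces the failure mode described in the comment, while updateTolerant(7) leaves the known BCSID unchanged.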
[jira] [Assigned] (HDDS-1558) IllegalArgumentException while processing container Reports
[ https://issues.apache.org/jira/browse/HDDS-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee reassigned HDDS-1558: - Assignee: Shashikant Banerjee > IllegalArgumentException while processing container Reports > --- > > Key: HDDS-1558 > URL: https://issues.apache.org/jira/browse/HDDS-1558 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Shashikant Banerjee >Priority: Major > Labels: MiniOzoneChaosCluster > > IllegalArgumentException while processing container Reports > {code} > 2019-05-19 23:15:04,137 ERROR events.SingleThreadExecutor > (SingleThreadExecutor.java:lambda$onMessage$1(88)) - Error on execution > message > org.apache.hadoop.hdds.scm.server.SCMDatanodeHeartbeatDispatcher$ContainerReportFromDatanode@1a117ebc > java.lang.IllegalArgumentException > at > com.google.common.base.Preconditions.checkArgument(Preconditions.java:72) > at > org.apache.hadoop.hdds.scm.container.AbstractContainerReportHandler.updateContainerState(AbstractContainerReportHandler.java:178) > at > org.apache.hadoop.hdds.scm.container.AbstractContainerReportHandler.processContainerReplica(AbstractContainerReportHandler.java:85) > at > org.apache.hadoop.hdds.scm.container.ContainerReportHandler.processContainerReplicas(ContainerReportHandler.java:124) > at > org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:97) > at > org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:46) > at > org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:85) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code}
[jira] [Updated] (HDDS-1584) Fix TestFailureHandlingByClient tests
[ https://issues.apache.org/jira/browse/HDDS-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-1584: -- Description: The tests fail because they rely on KeyOutputStream#getLocationList() to validate the number of preallocated blocks, but that method was recently changed to exclude empty blocks. The fix is mostly to use KeyOutputStream#getStreamEntries() to get the number of preallocated blocks. > Fix TestFailureHandlingByClient tests > - > > Key: HDDS-1584 > URL: https://issues.apache.org/jira/browse/HDDS-1584 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Affects Versions: 0.4.1 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Labels: pull-request-available > Fix For: 0.4.1 > > Time Spent: 0.5h > Remaining Estimate: 0h > > The tests fail because they rely on > KeyOutputStream#getLocationList() to validate the number of preallocated blocks, > but that method was recently changed to exclude empty blocks. The fix is > mostly to use KeyOutputStream#getStreamEntries() to get the number of > preallocated blocks.
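The difference between the two accessors named in HDDS-1584 can be illustrated with a toy model. This is only a sketch under assumptions: ToyKeyOutputStream and its Entry type are hypothetical stand-ins, and the rule "a block is empty when no bytes were written to it" is assumed rather than taken from the real KeyOutputStream.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for a key output stream that preallocates blocks.
final class ToyKeyOutputStream {
  static final class Entry {
    final long blockId;
    final long bytesWritten;

    Entry(long blockId, long bytesWritten) {
      this.blockId = blockId;
      this.bytesWritten = bytesWritten;
    }
  }

  private final List<Entry> entries = new ArrayList<>();

  // Registers a preallocated block; bytesWritten is 0 for a still-empty block.
  void addEntry(long blockId, long bytesWritten) {
    entries.add(new Entry(blockId, bytesWritten));
  }

  // Returns every preallocated block, empty or not.
  List<Entry> getStreamEntries() {
    return new ArrayList<>(entries);
  }

  // Returns only the blocks that actually hold data (assumed behaviour).
  List<Long> getLocationList() {
    return entries.stream()
        .filter(e -> e.bytesWritten > 0)
        .map(e -> e.blockId)
        .collect(Collectors.toList());
  }
}
```

A test that asserts on the number of preallocated blocks would, in this model, need getStreamEntries().size() rather than getLocationList().size(), since the latter shrinks whenever a preallocated block is still empty.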
[jira] [Updated] (HDDS-1589) CloseContainer transaction on unhealthy replica should fail with CONTAINER_UNHEALTHY exception
[ https://issues.apache.org/jira/browse/HDDS-1589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-1589: -- Description: Currently, while trying to close an unhealthy container over Ratis, it fails with INTERNAL_ERROR which leads to exception as follow: {code:java} 2019-05-19 22:00:48,386 ERROR commandhandler.CloseContainerCommandHandler (CloseContainerCommandHandler.java:handle(124)) - Can't close container #125 org.apache.ratis.protocol.StateMachineException: java.util.concurrent.CompletionException from Server faea26b0-9c60-4b4c-a0df-bf7c67cc5b48: java.lang.IllegalStateException at org.apache.ratis.server.impl.RaftServerImpl.lambda$replyPendingRequest$24(RaftServerImpl.java:1221) at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1595) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.util.concurrent.CompletionException: java.lang.IllegalStateException at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273) at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280) at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1592) ... 
3 more Caused by: java.lang.IllegalStateException at com.google.common.base.Preconditions.checkState(Preconditions.java:129) at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:300) at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:149) at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:347) at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.runCommand(ContainerStateMachine.java:354) at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$applyTransaction$5(ContainerStateMachine.java:613) at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) {code} This happens because, once the transaction has failed, the dispatcher tries to mark the container unhealthy, a step that expects the container to be in the OPEN or CLOSING state and hence asserts. It should ideally fail with CONTAINER_UNHEALTHY so that it does not try again to change the state to UNHEALTHY. 
was: Currently, while trying to close an unhealthy container over Ratis, it fails with INTERNAL_ERROR which leads to exception as follow: {code:java} 2019-05-19 22:00:48,386 ERROR commandhandler.CloseContainerCommandHandler (CloseContainerCommandHandler.java:handle(124)) - Can't close container #125 org.apache.ratis.protocol.StateMachineException: java.util.concurrent.CompletionException from Server faea26b0-9c60-4b4c-a0df-bf7c67cc5b48: java.lang.IllegalStateException at org.apache.ratis.server.impl.RaftServerImpl.lambda$replyPendingRequest$24(RaftServerImpl.java:1221) at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1595) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.util.concurrent.CompletionException: java.lang.IllegalStateException at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273) at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280) at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1592) ... 3 more Caused by: java.lang.IllegalStateException at com.google.common.base.Preconditions.checkState(Preconditions.java:129) at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:300) at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:149)
[jira] [Created] (HDDS-1589) CloseContainer transaction on unhealthy replica should fail with CONTAINER_UNHEALTHY exception
Shashikant Banerjee created HDDS-1589: - Summary: CloseContainer transaction on unhealthy replica should fail with CONTAINER_UNHEALTHY exception Key: HDDS-1589 URL: https://issues.apache.org/jira/browse/HDDS-1589 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Datanode Reporter: Shashikant Banerjee Assignee: Shashikant Banerjee Currently, while trying to close an unhealthy container over Ratis, it fails with INTERNAL_ERROR which leads to exception as follow: {code:java} 2019-05-19 22:00:48,386 ERROR commandhandler.CloseContainerCommandHandler (CloseContainerCommandHandler.java:handle(124)) - Can't close container #125 org.apache.ratis.protocol.StateMachineException: java.util.concurrent.CompletionException from Server faea26b0-9c60-4b4c-a0df-bf7c67cc5b48: java.lang.IllegalStateException at org.apache.ratis.server.impl.RaftServerImpl.lambda$replyPendingRequest$24(RaftServerImpl.java:1221) at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1595) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.util.concurrent.CompletionException: java.lang.IllegalStateException at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273) at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280) at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1592) ... 
3 more Caused by: java.lang.IllegalStateException at com.google.common.base.Preconditions.checkState(Preconditions.java:129) at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:300) at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:149) at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:347) at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.runCommand(ContainerStateMachine.java:354) at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$applyTransaction$5(ContainerStateMachine.java:613) at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) {code} This happens because, once the transaction has failed, the dispatcher tries to mark the container unhealthy, a step that expects the container to be in the OPEN or CLOSING state and hence asserts. It should ideally fail with CONTAINER_UNHEALTHY so that it does not try again to change the state to UNHEALTHY.
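The behaviour requested in HDDS-1589 can be sketched as an early state check in front of the close path. This is a toy model under assumptions, not the HddsDispatcher implementation: ToyContainerState, ToyResult, ToyContainer and ToyDispatcher are hypothetical names, and the guard flag exists only so both behaviours can be compared side by side.

```java
// Hypothetical container states and result codes for illustration only.
enum ToyContainerState { OPEN, CLOSING, CLOSED, UNHEALTHY }

enum ToyResult { SUCCESS, CONTAINER_UNHEALTHY }

final class ToyContainer {
  ToyContainerState state;

  ToyContainer(ToyContainerState state) {
    this.state = state;
  }
}

final class ToyDispatcher {
  // Mirrors the assertion in the report: marking a container unhealthy is
  // only expected while it is OPEN or CLOSING.
  static void markUnhealthy(ToyContainer c) {
    if (c.state != ToyContainerState.OPEN && c.state != ToyContainerState.CLOSING) {
      throw new IllegalStateException("unexpected state " + c.state);
    }
    c.state = ToyContainerState.UNHEALTHY;
  }

  // With guard == true, an already unhealthy replica is answered with
  // CONTAINER_UNHEALTHY before any state transition is attempted.
  static ToyResult closeContainer(ToyContainer c, boolean guard) {
    if (guard && c.state == ToyContainerState.UNHEALTHY) {
      return ToyResult.CONTAINER_UNHEALTHY;
    }
    if (c.state == ToyContainerState.OPEN || c.state == ToyContainerState.CLOSING) {
      c.state = ToyContainerState.CLOSED;
      return ToyResult.SUCCESS;
    }
    // The close failed, so the failure path tries to mark the container
    // unhealthy; without the guard this asserts for an UNHEALTHY replica.
    markUnhealthy(c);
    return ToyResult.CONTAINER_UNHEALTHY;
  }
}
```

In this model the unguarded call on an UNHEALTHY container ends in an IllegalStateException, which corresponds to the INTERNAL_ERROR path in the report, while the guarded call returns CONTAINER_UNHEALTHY and leaves the state untouched.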
[jira] [Updated] (HDDS-1584) Fix TestFailureHandlingByClient tests
[ https://issues.apache.org/jira/browse/HDDS-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-1584: -- Status: Patch Available (was: Open) > Fix TestFailureHandlingByClient tests > - > > Key: HDDS-1584 > URL: https://issues.apache.org/jira/browse/HDDS-1584 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Affects Versions: 0.4.1 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 0.4.1
[jira] [Created] (HDDS-1584) Fix TestFailureHandlingByClient tests
Shashikant Banerjee created HDDS-1584: - Summary: Fix TestFailureHandlingByClient tests Key: HDDS-1584 URL: https://issues.apache.org/jira/browse/HDDS-1584 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Client Affects Versions: 0.4.1 Reporter: Shashikant Banerjee Assignee: Shashikant Banerjee Fix For: 0.4.1
[jira] [Updated] (HDDS-1449) JVM Exit in datanode while committing a key
[ https://issues.apache.org/jira/browse/HDDS-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-1449: -- Fix Version/s: (was: 0.5.0) 0.4.1 > JVM Exit in datanode while committing a key > --- > > Key: HDDS-1449 > URL: https://issues.apache.org/jira/browse/HDDS-1449 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.3.0 >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Major > Labels: MiniOzoneChaosCluster, pull-request-available > Fix For: 0.4.1 > > Attachments: 2019-04-22--20-23-56-IST.MiniOzoneChaosCluster.log, > hs_err_pid67466.log > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Saw the following trace in MiniOzoneChaosCluster run. > {code} > C [librocksdbjni17271331491728127.jnilib+0x9755c] > Java_org_rocksdb_RocksDB_write0+0x1c > J 13917 org.rocksdb.RocksDB.write0(JJJ)V (0 bytes) @ 0x0001102ff62e > [0x0001102ff580+0xae] > J 17167 C2 > org.apache.hadoop.utils.RocksDBStore.writeBatch(Lorg/apache/hadoop/utils/BatchOperation;)V > (260 bytes) @ 0x000111bbd01c [0x000111bbcde0+0x23c] > J 20434 C1 > org.apache.hadoop.ozone.container.keyvalue.impl.BlockManagerImpl.putBlock(Lorg/apache/hadoop/ozone/container/common/interfaces/Container;Lorg/apache/hadoop/ozone/container/common/helpers/BlockData;)J > (261 bytes) @ 0x000111c267ac [0x000111c25640+0x116c] > J 19262 C2 > org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandRequestProto;Lorg/apache/hadoop/ozone/container/common/transport/server/ratis/DispatcherContext;)Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandResponseProto; > (866 bytes) @ 0x0001125c5aa0 [0x0001125c1560+0x4540] > J 15095 C2 > 
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandRequestProto;Lorg/apache/hadoop/ozone/container/common/transport/server/ratis/DispatcherContext;)Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandResponseProto; > (142 bytes) @ 0x000110ffc940 [0x000110ffc0c0+0x880] > J 19301 C2 > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandRequestProto;Lorg/apache/hadoop/ozone/container/common/transport/server/ratis/DispatcherContext;)Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandResponseProto; > (146 bytes) @ 0x000111396144 [0x000111395e60+0x2e4] > J 15997 C2 > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine$$Lambda$776.get()Ljava/lang/Object; > (16 bytes) @ 0x000110138e54 [0x000110138d80+0xd4] > J 15970 C2 java.util.concurrent.CompletableFuture$AsyncSupply.run()V (61 > bytes) @ 0x00010fc80094 [0x00010fc8+0x94] > J 17368 C2 > java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V > (225 bytes) @ 0x000110b0a7a0 [0x000110b0a5a0+0x200] > J 7389 C1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V (9 bytes) @ > 0x00011012a004 [0x000110129f00+0x104] > J 6837 C1 java.lang.Thread.run()V (17 bytes) @ 0x00011002b144 > [0x00011002b000+0x144] > v ~StubRoutines::call_stub > V [libjvm.dylib+0x2ef1f6] JavaCalls::call_helper(JavaValue*, methodHandle*, > JavaCallArguments*, Thread*)+0x6ae > V [libjvm.dylib+0x2ef99a] JavaCalls::call_virtual(JavaValue*, KlassHandle, > Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x164 > V [libjvm.dylib+0x2efb46] JavaCalls::call_virtual(JavaValue*, Handle, > KlassHandle, Symbol*, Symbol*, Thread*)+0x4a > V [libjvm.dylib+0x34a46d] thread_entry(JavaThread*, Thread*)+0x7c > V 
[libjvm.dylib+0x56eb0f] JavaThread::thread_main_inner()+0x9b > V [libjvm.dylib+0x57020a] JavaThread::run()+0x1c2 > V [libjvm.dylib+0x48d4a6] java_start(Thread*)+0xf6 > C [libsystem_pthread.dylib+0x3305] _pthread_body+0x7e > C [libsystem_pthread.dylib+0x626f] _pthread_start+0x46 > C [libsystem_pthread.dylib+0x2415] thread_start+0xd > C 0x > {code}
[jira] [Commented] (HDDS-1517) AllocateBlock call fails with ContainerNotFoundException
[ https://issues.apache.org/jira/browse/HDDS-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845808#comment-16845808 ] Shashikant Banerjee commented on HDDS-1517: --- Thanks [~jnp] for the review. I have committed this change to trunk. > AllocateBlock call fails with ContainerNotFoundException > > > Key: HDDS-1517 > URL: https://issues.apache.org/jira/browse/HDDS-1517 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Affects Versions: 0.5.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Labels: pull-request-available > Fix For: 0.4.1 > > Attachments: HDDS-1517.000.patch > > Time Spent: 1h 40m > Remaining Estimate: 0h > > In the allocateContainer call, the container is first added to pipelineStateMap > and then added to the container cache. If two allocateBlock calls execute > concurrently, one of them may find the container in the > pipelineStateMap before it has been added to the container > cache, and hence fail with a CONTAINER_NOT_FOUND exception.
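The window described in HDDS-1517 can be reproduced deterministically with a toy model. This is a sketch under assumptions: ToyContainerManager is a hypothetical class, a Runnable hook stands in for the concurrent allocateBlock call that runs between the two publication steps, and the cache-first ordering is only one possible way to close the window, since the message does not say how the committed patch resolves it.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the two structures named in the issue description.
final class ToyContainerManager {
  private final Set<Long> pipelineStateMap = new HashSet<>();
  private final Map<Long, String> containerCache = new HashMap<>();

  // Order from the issue: the ID becomes visible before the cache entry exists.
  void allocatePipelineFirst(long id, Runnable concurrentCaller) {
    pipelineStateMap.add(id);
    concurrentCaller.run(); // stands in for a second allocateBlock call
    containerCache.put(id, "container-" + id);
  }

  // Alternative order (an assumption): fill the cache before publishing the ID.
  void allocateCacheFirst(long id, Runnable concurrentCaller) {
    containerCache.put(id, "container-" + id);
    concurrentCaller.run();
    pipelineStateMap.add(id);
  }

  // Returns null if the ID is not published yet; throws if it is published
  // but the cache entry is missing.
  String lookup(long id) {
    if (!pipelineStateMap.contains(id)) {
      return null;
    }
    String container = containerCache.get(id);
    if (container == null) {
      throw new IllegalStateException("CONTAINER_NOT_FOUND: " + id);
    }
    return container;
  }
}
```

With allocatePipelineFirst, a lookup issued from the hook throws; with allocateCacheFirst, the same lookup simply does not see the container yet and succeeds once allocation completes. Real code would additionally need a lock or concurrent collections, which this single-threaded model leaves out.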
[jira] [Updated] (HDDS-1517) AllocateBlock call fails with ContainerNotFoundException
[ https://issues.apache.org/jira/browse/HDDS-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-1517: -- Resolution: Fixed Target Version/s: 0.4.1 (was: 0.5.0) Status: Resolved (was: Patch Available) > AllocateBlock call fails with ContainerNotFoundException > > > Key: HDDS-1517 > URL: https://issues.apache.org/jira/browse/HDDS-1517 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Affects Versions: 0.5.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Labels: pull-request-available > Fix For: 0.4.1 > > Attachments: HDDS-1517.000.patch > > Time Spent: 1h 40m > Remaining Estimate: 0h > > In the allocateContainer call, the container is first added to pipelineStateMap > and then added to the container cache. If two allocateBlock calls execute > concurrently, one of them may find the container in the > pipelineStateMap before it has been added to the container > cache, and hence fail with a CONTAINER_NOT_FOUND exception.
[jira] [Updated] (HDDS-1449) JVM Exit in datanode while committing a key
[ https://issues.apache.org/jira/browse/HDDS-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-1449: -- Resolution: Fixed Fix Version/s: 0.5.0 Status: Resolved (was: Patch Available) Thanks [~msingh] for working in this. I have committed this change to trunk. > JVM Exit in datanode while committing a key > --- > > Key: HDDS-1449 > URL: https://issues.apache.org/jira/browse/HDDS-1449 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.3.0 >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Major > Labels: MiniOzoneChaosCluster, pull-request-available > Fix For: 0.5.0 > > Attachments: 2019-04-22--20-23-56-IST.MiniOzoneChaosCluster.log, > hs_err_pid67466.log > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Saw the following trace in MiniOzoneChaosCluster run. > {code} > C [librocksdbjni17271331491728127.jnilib+0x9755c] > Java_org_rocksdb_RocksDB_write0+0x1c > J 13917 org.rocksdb.RocksDB.write0(JJJ)V (0 bytes) @ 0x0001102ff62e > [0x0001102ff580+0xae] > J 17167 C2 > org.apache.hadoop.utils.RocksDBStore.writeBatch(Lorg/apache/hadoop/utils/BatchOperation;)V > (260 bytes) @ 0x000111bbd01c [0x000111bbcde0+0x23c] > J 20434 C1 > org.apache.hadoop.ozone.container.keyvalue.impl.BlockManagerImpl.putBlock(Lorg/apache/hadoop/ozone/container/common/interfaces/Container;Lorg/apache/hadoop/ozone/container/common/helpers/BlockData;)J > (261 bytes) @ 0x000111c267ac [0x000111c25640+0x116c] > J 19262 C2 > org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandRequestProto;Lorg/apache/hadoop/ozone/container/common/transport/server/ratis/DispatcherContext;)Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandResponseProto; > (866 bytes) @ 0x0001125c5aa0 [0x0001125c1560+0x4540] > J 15095 C2 > 
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandRequestProto;Lorg/apache/hadoop/ozone/container/common/transport/server/ratis/DispatcherContext;)Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandResponseProto; > (142 bytes) @ 0x000110ffc940 [0x000110ffc0c0+0x880] > J 19301 C2 > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandRequestProto;Lorg/apache/hadoop/ozone/container/common/transport/server/ratis/DispatcherContext;)Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandResponseProto; > (146 bytes) @ 0x000111396144 [0x000111395e60+0x2e4] > J 15997 C2 > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine$$Lambda$776.get()Ljava/lang/Object; > (16 bytes) @ 0x000110138e54 [0x000110138d80+0xd4] > J 15970 C2 java.util.concurrent.CompletableFuture$AsyncSupply.run()V (61 > bytes) @ 0x00010fc80094 [0x00010fc8+0x94] > J 17368 C2 > java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V > (225 bytes) @ 0x000110b0a7a0 [0x000110b0a5a0+0x200] > J 7389 C1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V (9 bytes) @ > 0x00011012a004 [0x000110129f00+0x104] > J 6837 C1 java.lang.Thread.run()V (17 bytes) @ 0x00011002b144 > [0x00011002b000+0x144] > v ~StubRoutines::call_stub > V [libjvm.dylib+0x2ef1f6] JavaCalls::call_helper(JavaValue*, methodHandle*, > JavaCallArguments*, Thread*)+0x6ae > V [libjvm.dylib+0x2ef99a] JavaCalls::call_virtual(JavaValue*, KlassHandle, > Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x164 > V [libjvm.dylib+0x2efb46] JavaCalls::call_virtual(JavaValue*, Handle, > KlassHandle, Symbol*, Symbol*, Thread*)+0x4a > V [libjvm.dylib+0x34a46d] thread_entry(JavaThread*, Thread*)+0x7c > V 
[libjvm.dylib+0x56eb0f] JavaThread::thread_main_inner()+0x9b > V [libjvm.dylib+0x57020a] JavaThread::run()+0x1c2 > V [libjvm.dylib+0x48d4a6] java_start(Thread*)+0xf6 > C [libsystem_pthread.dylib+0x3305] _pthread_body+0x7e > C [libsystem_pthread.dylib+0x626f] _pthread_start+0x46 > C [libsystem_pthread.dylib+0x2415] thread_start+0xd > C 0x > {code}
[jira] [Commented] (HDDS-1497) Refactor blockade Tests
[ https://issues.apache.org/jira/browse/HDDS-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844870#comment-16844870 ] Shashikant Banerjee commented on HDDS-1497: --- Thanks [~nilotpalnandi] for working on this. Some comments inline: 1. Please update the comments for the property, getter and setter functions. 2. cluster.py:223-224 -> incorrect comments. 3. clusterUtils.py:324 -> "om_1" should be "om"? 4. cluster_utils.py:296 -> which file's checksum is it supposed to compute? Can you please update the comments? > Refactor blockade Tests > --- > > Key: HDDS-1497 > URL: https://issues.apache.org/jira/browse/HDDS-1497 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Major > Attachments: HDDS-1497.001.patch
[jira] [Created] (HDFS-14505) "touchz" command should check quota limit before deleting an already existing file
Shashikant Banerjee created HDFS-14505: -- Summary: "touchz" command should check quota limit before deleting an already existing file Key: HDFS-14505 URL: https://issues.apache.org/jira/browse/HDFS-14505 Project: Hadoop HDFS Issue Type: Bug Reporter: Shashikant Banerjee
{code:java}
HW15685:bin sbanerjee$ ./hdfs dfs -ls /dir2
2019-05-21 15:14:01,080 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r-- 1 sbanerjee hadoop 0 2019-05-21 15:10 /dir2/file4
HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/file4
2019-05-21 15:14:12,247 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
touchz: The NameSpace quota (directories and files) of directory /dir2 is exceeded: quota=3 file count=5
HW15685:bin sbanerjee$ ./hdfs dfs -ls /dir2
2019-05-21 15:14:20,607 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
{code}
Here, the "touchz" command failed to create the file because the quota limit was hit, but it still deleted the original file that existed. It should check the quota before deleting the file, so that creation succeeds after a successful deletion.
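The ordering requested in HDFS-14505 can be sketched with a toy namespace. This is a model under assumptions, not the HDFS client or NameNode code: ToyNamespace and its methods are hypothetical, touchz is modelled as "delete the existing file, then create it" as the report describes, and deleting a file that is still referenced by a snapshot is assumed not to free any quota.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory namespace with a name quota.
final class ToyNamespace {
  private final int quota;
  private int used;
  private final Set<String> live = new HashSet<>();
  private final Set<String> inSnapshot = new HashSet<>();

  ToyNamespace(int quota, int used) {
    this.quota = quota;
    this.used = used;
  }

  // Registers a file that already exists; its usage is part of 'used'.
  void addExisting(String path, boolean snapshotted) {
    live.add(path);
    if (snapshotted) {
      inSnapshot.add(path);
    }
  }

  boolean exists(String path) {
    return live.contains(path);
  }

  // A snapshotted file keeps consuming quota after it is deleted.
  private int freedByDelete(String path) {
    return inSnapshot.contains(path) ? 0 : 1;
  }

  private void delete(String path) {
    used -= freedByDelete(path);
    live.remove(path);
  }

  private void create(String path) {
    if (used + 1 > quota) {
      throw new IllegalStateException("quota=" + quota + " exceeded");
    }
    live.add(path);
    used++;
  }

  // Behaviour from the report: the delete happens before the quota is checked.
  void touchzDeleteFirst(String path) {
    if (live.contains(path)) {
      delete(path);
    }
    create(path);
  }

  // Requested behaviour: verify the quota for the whole operation up front.
  void touchzCheckFirst(String path) {
    int freed = live.contains(path) ? freedByDelete(path) : 0;
    if (used - freed + 1 > quota) {
      throw new IllegalStateException("quota=" + quota + " exceeded");
    }
    if (live.contains(path)) {
      delete(path);
    }
    create(path);
  }
}
```

With a quota of 3 and a usage of 5, touchzDeleteFirst fails and leaves the file gone, which matches the empty listing at the end of the transcript, while touchzCheckFirst fails without touching the existing file.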
[jira] [Updated] (HDFS-14504) Rename with Snapshots does not honor quota limit
[ https://issues.apache.org/jira/browse/HDFS-14504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDFS-14504: --- Description: Steps to Reproduce: {code:java} HW15685:bin sbanerjee$ ./hdfs dfs -mkdir /dir2 2019-05-21 15:08:41,615 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable HW15685:bin sbanerjee$ ./hdfs dfsadmin -setQuota 3 /dir2 2019-05-21 15:08:57,326 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable HW15685:bin sbanerjee$ ./hdfs dfsadmin -allowSnapshot /dir2 2019-05-21 15:09:47,239 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Allowing snapshot on /dir2 succeeded HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/file1 2019-05-21 15:10:01,573 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable HW15685:bin sbanerjee$ ./hdfs dfs -createSnapshot /dir2 snap1 2019-05-21 15:10:16,332 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Created snapshot /dir2/.snapshot/snap1 HW15685:bin sbanerjee$ ./hdfs dfs -mv /dir2/file1 /dir2/file2 2019-05-21 15:10:49,292 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable HW15685:bin sbanerjee$ ./hdfs dfs -ls /dir2 2019-05-21 15:11:05,207 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Found 1 items -rw-r--r-- 1 sbanerjee hadoop 0 2019-05-21 15:10 /dir2/file2 HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/filex 2019-05-21 15:11:43,765 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable touchz: The NameSpace quota (directories and files) of directory /dir2 is exceeded: quota=3 file count=4 HW15685:bin sbanerjee$ ./hdfs dfs -createSnapshot /dir2 snap2 2019-05-21 15:12:05,464 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Created snapshot /dir2/.snapshot/snap2 HW15685:bin sbanerjee$ ./hdfs dfs -ls /dir2 2019-05-21 15:12:25,072 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Found 1 items -rw-r--r-- 1 sbanerjee hadoop 0 2019-05-21 15:10 /dir2/file2 HW15685:bin sbanerjee$ ./hdfs dfs -mv /dir2/file2 /dir2/file3 2019-05-21 15:12:35,908 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/filey 2019-05-21 15:12:49,998 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable touchz: The NameSpace quota (directories and files) of directory /dir2 is exceeded: quota=3 file count=5 {code} // create operation fails here as it has already exceeded the quota limit {code} HW15685:bin sbanerjee$ ./hdfs dfs -createSnapshot /dir2 snap3 2019-05-21 15:13:07,656 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Created snapshot /dir2/.snapshot/snap3 HW15685:bin sbanerjee$ ./hdfs dfs -mv /dir2/file3 /dir2/file4 2019-05-21 15:13:20,715 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable {code} // Rename operation succeeds here, adding to the namespace quota usage {code} HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/filez 2019-05-21 15:13:30,486 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable touchz: The NameSpace quota (directories and files) of directory /dir2 is exceeded: quota=3 file count=6 {code} // File creation fails here, but the file count has increased to 6 because of the previous rename operation{code} The quota set here is 3. Each successive rename adds an entry to the deleted list of the snapshot diff, which is counted against the namespace quota, but the rename is allowed even when it pushes the directory past its quota limit. Only a later attempt to create a file fails. was: Steps to Reproduce: {code:java} HW15685:bin sbanerjee$ ./hdfs dfs -mkdir /dir2 2019-05-21 15:08:41,615 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes
[jira] [Updated] (HDFS-14504) Rename with Snapshots does not honor quota limit
[ https://issues.apache.org/jira/browse/HDFS-14504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDFS-14504: --- Description: Steps to Reproduce: {code:java} HW15685:bin sbanerjee$ ./hdfs dfs -mkdir /dir2 2019-05-21 15:08:41,615 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable HW15685:bin sbanerjee$ ./hdfs dfsadmin -setQuota 3 /dir2 2019-05-21 15:08:57,326 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable HW15685:bin sbanerjee$ ./hdfs dfsadmin -allowSnapshot /dir2 2019-05-21 15:09:47,239 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Allowing snapshot on /dir2 succeeded HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/file1 2019-05-21 15:10:01,573 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable HW15685:bin sbanerjee$ ./hdfs dfs -createSnapshot /dir2 snap1 2019-05-21 15:10:16,332 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Created snapshot /dir2/.snapshot/snap1 HW15685:bin sbanerjee$ ./hdfs dfs -mv /dir2/file1 /dir2/file2 2019-05-21 15:10:49,292 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable HW15685:bin sbanerjee$ ./hdfs dfs -ls /dir2 2019-05-21 15:11:05,207 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Found 1 items -rw-r--r-- 1 sbanerjee hadoop 0 2019-05-21 15:10 /dir2/file2 HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/filex 2019-05-21 15:11:43,765 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable touchz: The NameSpace quota (directories and files) of directory /dir2 is exceeded: quota=3 file count=4 HW15685:bin sbanerjee$ ./hdfs dfs -createSnapshot /dir2 snap2 2019-05-21 15:12:05,464 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Created snapshot /dir2/.snapshot/snap2 HW15685:bin sbanerjee$ ./hdfs dfs -ls /dir2 2019-05-21 15:12:25,072 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Found 1 items -rw-r--r-- 1 sbanerjee hadoop 0 2019-05-21 15:10 /dir2/file2 HW15685:bin sbanerjee$ ./hdfs dfs -mv /dir2/file2 /dir2/file3 2019-05-21 15:12:35,908 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/filey 2019-05-21 15:12:49,998 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable touchz: The NameSpace quota (directories and files) of directory /dir2 is exceeded: quota=3 file count=5 // create operation fails here as it has already exceeded the quota limit HW15685:bin sbanerjee$ ./hdfs dfs -createSnapshot /dir2 snap3 2019-05-21 15:13:07,656 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Created snapshot /dir2/.snapshot/snap3 HW15685:bin sbanerjee$ ./hdfs dfs -mv /dir2/file3 /dir2/file4 2019-05-21 15:13:20,715 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable // Rename operation succeeds here adding on to the namespace quota HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/filez 2019-05-21 15:13:30,486 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable touchz: The NameSpace quota (directories and files) of directory /dir2 is exceeded: quota=3 file count=6 // File creation fails here, but the file count has increased to 6 because of the previous rename operation{code} The quota set here is 3. Each successive rename adds an entry to the deleted list of the snapshot diff, which is counted against the namespace quota, but the rename is allowed even when it pushes the directory past its quota limit. Only a later attempt to create a file fails. was: Steps to Reproduce: {code:java} HW15685:bin sbanerjee$ ./hdfs dfs -mkdir /dir2 2019-05-21 15:08:41,615 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable HW15685:bin
[jira] [Created] (HDFS-14504) Rename with Snapshots does not honor quota limit
Shashikant Banerjee created HDFS-14504: -- Summary: Rename with Snapshots does not honor quota limit Key: HDFS-14504 URL: https://issues.apache.org/jira/browse/HDFS-14504 Project: Hadoop HDFS Issue Type: Bug Reporter: Shashikant Banerjee Steps to Reproduce: {code:java} HW15685:bin sbanerjee$ ./hdfs dfs -mkdir /dir2 2019-05-21 15:08:41,615 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable HW15685:bin sbanerjee$ ./hdfs dfsadmin -setQuota 3 /dir2 2019-05-21 15:08:57,326 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable HW15685:bin sbanerjee$ ./hdfs dfsadmin -allowSnapshot /dir2 2019-05-21 15:09:47,239 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Allowing snapshot on /dir2 succeeded HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/file1 2019-05-21 15:10:01,573 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable HW15685:bin sbanerjee$ ./hdfs dfs -createSnapshot /dir2 snap1 2019-05-21 15:10:16,332 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Created snapshot /dir2/.snapshot/snap1 HW15685:bin sbanerjee$ ./hdfs dfs -mv /dir2/file1 /dir2/file2 2019-05-21 15:10:49,292 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable HW15685:bin sbanerjee$ ./hdfs dfs -ls /dir2 2019-05-21 15:11:05,207 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable Found 1 items -rw-r--r-- 1 sbanerjee hadoop 0 2019-05-21 15:10 /dir2/file2 HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/filex 2019-05-21 15:11:43,765 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable touchz: The NameSpace quota (directories and files) of directory /dir2 is exceeded: quota=3 file count=4 HW15685:bin sbanerjee$ ./hdfs dfs -createSnapshot /dir2 snap2 2019-05-21 15:12:05,464 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Created snapshot /dir2/.snapshot/snap2 HW15685:bin sbanerjee$ ./hdfs dfs -ls /dir2 2019-05-21 15:12:25,072 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Found 1 items -rw-r--r-- 1 sbanerjee hadoop 0 2019-05-21 15:10 /dir2/file2 HW15685:bin sbanerjee$ ./hdfs dfs -mv /dir2/file2 /dir2/file3 2019-05-21 15:12:35,908 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/filey 2019-05-21 15:12:49,998 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable touchz: The NameSpace quota (directories and files) of directory /dir2 is exceeded: quota=3 file count=5 // create operation fails here as it has already exceeded the quota limit HW15685:bin sbanerjee$ ./hdfs dfs -createSnapshot /dir2 snap3 2019-05-21 15:13:07,656 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Created snapshot /dir2/.snapshot/snap3 HW15685:bin sbanerjee$ ./hdfs dfs -mv /dir2/file3 /dir2/file4 2019-05-21 15:13:20,715 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable // Rename operation succeeds here, adding to the namespace quota usage HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/filez 2019-05-21 15:13:30,486 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable touchz: The NameSpace quota (directories and files) of directory /dir2 is exceeded: quota=3 file count=6 // File creation fails here, but the file count has increased to 6 because of the previous rename operation{code} The quota set here is 3. Each successive rename adds an entry to the deleted list of the snapshot diff, which is counted against the namespace quota, but the rename is allowed even when it pushes the directory past its quota limit. Only a later attempt to create a file fails. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail:
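The accounting in the report above can be illustrated with a toy model. This is a minimal sketch and not HDFS code: the class RenameQuotaModel and its methods are hypothetical names, and the model assumes only what the report states, namely that a file creation is checked against the namespace quota while a rename after a snapshot adds one counted entry without any check.

```java
/** Toy model of the namespace accounting described in the report; not HDFS code. */
final class RenameQuotaModel {
    private final long quota;
    private long used = 1; // the directory itself counts as one entry

    RenameQuotaModel(long quota) {
        this.quota = quota;
    }

    /** File creation is checked against the quota, like touchz in the transcript. */
    boolean create() {
        if (used + 1 > quota) {
            return false;
        }
        used++;
        return true;
    }

    /** Assumed behavior: a rename after a snapshot adds a snapshot-diff entry, unchecked. */
    void renameAfterSnapshot() {
        used++;
    }

    long used() {
        return used;
    }

    public static void main(String[] args) {
        RenameQuotaModel dir = new RenameQuotaModel(3);
        System.out.println("create file1: " + dir.create());
        dir.renameAfterSnapshot();
        System.out.println("create filex: " + dir.create());
        dir.renameAfterSnapshot();
        dir.renameAfterSnapshot();
        System.out.println("entries counted: " + dir.used() + ", quota: 3");
    }
}
```

In this model the counted entries grow past the quota through renames alone, and each failed create would report the current count plus one, which is consistent with the file count of 4, 5 and 6 in the transcript.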
[jira] [Updated] (HDDS-1502) Add metrics for Ozone Ratis performance
[ https://issues.apache.org/jira/browse/HDDS-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-1502: -- Description: This jira will add some metrics for Ratis pipeline performance: 1) number of bytes written 2) number of Read StateMachine calls 3) number of Read StateMachine failures was: This jira will add some metrics for Ratis pipeline performance a) number of chunks written per second b) number of bytes written per second c) number of chunks/bytes missed during read State Machine data. > Add metrics for Ozone Ratis performance > --- > > Key: HDDS-1502 > URL: https://issues.apache.org/jira/browse/HDDS-1502 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Mukul Kumar Singh >Assignee: Shashikant Banerjee >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > This jira will add some metrics for Ratis pipeline performance: > 1) number of bytes written > 2) number of Read StateMachine calls > 3) number of Read StateMachine failures -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1502) Add metrics for Ozone Ratis performance
[ https://issues.apache.org/jira/browse/HDDS-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-1502: -- Status: Patch Available (was: Open) > Add metrics for Ozone Ratis performance > --- > > Key: HDDS-1502 > URL: https://issues.apache.org/jira/browse/HDDS-1502 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Mukul Kumar Singh >Assignee: Shashikant Banerjee >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > This jira will add some metrics for Ratis pipeline performance: > 1) number of bytes written > 2) number of Read StateMachine calls > 3) number of Read StateMachine failures -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-1517) AllocateBlock call fails with ContainerNotFoundException
[ https://issues.apache.org/jira/browse/HDDS-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16842325#comment-16842325 ] Shashikant Banerjee commented on HDDS-1517: --- Thanks [~jnp]. As discussed, I have updated the patch in the pull request. > AllocateBlock call fails with ContainerNotFoundException > > > Key: HDDS-1517 > URL: https://issues.apache.org/jira/browse/HDDS-1517 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Affects Versions: 0.5.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Labels: pull-request-available > Fix For: 0.4.1 > > Attachments: HDDS-1517.000.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > In the allocateContainer call, the container is first added to pipelineStateMap > and then to the container cache. If two allocateBlock calls execute > concurrently, one may find the container in pipelineStateMap before it has > been added to the container cache, and so fail with a CONTAINER_NOT_FOUND > exception. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14499) Misleading REM_QUOTA value with snapshot and trash feature enabled for a directory
Shashikant Banerjee created HDFS-14499: -- Summary: Misleading REM_QUOTA value with snapshot and trash feature enabled for a directory Key: HDFS-14499 URL: https://issues.apache.org/jira/browse/HDFS-14499 Project: Hadoop HDFS Issue Type: Bug Components: snapshots Reporter: Shashikant Banerjee

The following steps show a discrepancy between REM_QUOTA and the failure of a new file operation. REM_QUOTA shows a value of 1, but the file creation does not succeed.
{code:java}
hdfs@c3265-node3 root$ hdfs dfs -mkdir /dir1
hdfs@c3265-node3 root$ hdfs dfsadmin -setQuota 2 /dir1
hdfs@c3265-node3 root$ hdfs dfsadmin -allowSnapshot /dir1
Allowing snaphot on /dir1 succeeded
hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1
hdfs@c3265-node3 root$ hdfs dfs -createSnapshot /dir1 snap1
Created snapshot /dir1/.snapshot/snap1
hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1
QUOTA  REM_QUOTA  SPACE_QUOTA  REM_SPACE_QUOTA  DIR_COUNT  FILE_COUNT  CONTENT_SIZE  PATHNAME
    2          0         none              inf          1           1             0  /dir1
hdfs@c3265-node3 root$ hdfs dfs -rm /dir1/file1
19/03/26 11:20:25 INFO fs.TrashPolicyDefault: Moved: 'hdfs://smajetinn/dir1/file1' to trash at: hdfs://smajetinn/user/hdfs/.Trash/Current/dir1/file11553599225772
hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1
QUOTA  REM_QUOTA  SPACE_QUOTA  REM_SPACE_QUOTA  DIR_COUNT  FILE_COUNT  CONTENT_SIZE  PATHNAME
    2          1         none              inf          1           0             0  /dir1
hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1
touchz: The NameSpace quota (directories and files) of directory /dir1 is exceeded: quota=2 file count=3
{code}
The issue here is that the count command takes only files and directories into account, not the inode references. When trash is enabled, deleting a file inside a directory actually performs a rename. As a result, an inode reference is kept in the deleted list of the snapshot diff, and that reference is counted when the namespace quota is computed. The count command (getContentSummary()), however, considers only the files and directories when calculating REM_QUOTA, not the referenced entity. The referenced entity is taken into account for the space quota only.

InodeReference.java: ---
{code:java}
@Override
public final ContentSummaryComputationContext computeContentSummary(
    int snapshotId, ContentSummaryComputationContext summary) {
  final int s = snapshotId < lastSnapshotId ? snapshotId : lastSnapshotId;
  // only count storagespace for WithName
  final QuotaCounts q = computeQuotaUsage(
      summary.getBlockStoragePolicySuite(), getStoragePolicyID(), false, s);
  summary.getCounts().addContent(Content.DISKSPACE, q.getStorageSpace());
  summary.getCounts().addTypeSpaces(q.getTypeSpaces());
  return summary;
}
{code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
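The mismatch between the reported REM_QUOTA and the quota check can be reproduced with a toy model. This is a minimal sketch and not HDFS code: TrashQuotaModel and its methods are hypothetical, and the two counting rules are assumptions taken from the description above (the quota check counts the snapshot-referenced inode, the content summary does not).

```java
/** Toy model of the two counting rules in the report; not HDFS code. */
final class TrashQuotaModel {
    private final long quota;
    private long dirs = 1;         // the directory itself
    private long files = 0;
    private long snapshotRefs = 0; // inode references kept in the snapshot diff

    TrashQuotaModel(long quota) {
        this.quota = quota;
    }

    /** What the count command is described as reporting: files and directories only. */
    long remQuotaAsReported() {
        return quota - (dirs + files);
    }

    /** What the quota check is described as using: referenced inodes are included. */
    long namespaceUsed() {
        return dirs + files + snapshotRefs;
    }

    boolean create() {
        if (namespaceUsed() + 1 > quota) {
            return false;
        }
        files++;
        return true;
    }

    /** With trash enabled, rm is a rename; the snapshot keeps a reference to the file. */
    void moveToTrashAfterSnapshot() {
        files--;
        snapshotRefs++;
    }

    public static void main(String[] args) {
        TrashQuotaModel dir = new TrashQuotaModel(2);
        dir.create();
        dir.moveToTrashAfterSnapshot();
        System.out.println("REM_QUOTA as reported: " + dir.remQuotaAsReported());
        System.out.println("create succeeds: " + dir.create());
    }
}
```

With a quota of 2, the model reports one remaining slot after the file is moved to trash, yet the next create is rejected, which matches the transcript.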
[jira] [Resolved] (HDDS-1531) Disable the sync flag by default during chunk writes in Datanode
[ https://issues.apache.org/jira/browse/HDDS-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee resolved HDDS-1531. --- Resolution: Fixed > Disable the sync flag by default during chunk writes in Datanode > > > Key: HDDS-1531 > URL: https://issues.apache.org/jira/browse/HDDS-1531 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.5.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Currently, the sync flag is ON by default for chunk writes on datanodes. > It needs to be turned off by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1509) TestBlockOutputStreamWithFailures#test2DatanodesFailure fails intermittently
[ https://issues.apache.org/jira/browse/HDDS-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-1509: -- Status: Patch Available (was: Open) > TestBlockOutputStreamWithFailures#test2DatanodesFailure fails intermittently > > > Key: HDDS-1509 > URL: https://issues.apache.org/jira/browse/HDDS-1509 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Affects Versions: 0.5.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 10m > Remaining Estimate: 0h > > The test fails because it expects the exception after 2 datanode failures > to be of type RaftRetryFailureException. However, the pipeline may be > destroyed before the actual write executes over Ratis, in which case the > write fails with GroupMismatchException instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work stopped] (HDDS-1531) Disable the sync flag by default during chunk writes in Datanode
[ https://issues.apache.org/jira/browse/HDDS-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDDS-1531 stopped by Shashikant Banerjee. - > Disable the sync flag by default during chunk writes in Datanode > > > Key: HDDS-1531 > URL: https://issues.apache.org/jira/browse/HDDS-1531 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.5.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 0.5.0 > > > Currently, the sync flag is ON by default for chunk writes on datanodes. > It needs to be turned off by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work started] (HDDS-1531) Disable the sync flag by default during chunk writes in Datanode
[ https://issues.apache.org/jira/browse/HDDS-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDDS-1531 started by Shashikant Banerjee. - > Disable the sync flag by default during chunk writes in Datanode > > > Key: HDDS-1531 > URL: https://issues.apache.org/jira/browse/HDDS-1531 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.5.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 0.5.0 > > > Currently, the sync flag is ON by default for chunk writes on datanodes. > It needs to be turned off by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-1531) Disable the sync flag by default during chunk writes in Datanode
Shashikant Banerjee created HDDS-1531: - Summary: Disable the sync flag by default during chunk writes in Datanode Key: HDDS-1531 URL: https://issues.apache.org/jira/browse/HDDS-1531 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Datanode Affects Versions: 0.5.0 Reporter: Shashikant Banerjee Assignee: Shashikant Banerjee Fix For: 0.5.0 Currently, the sync flag is ON by default for chunk writes on datanodes. It needs to be turned off by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-1517) AllocateBlock call fails with ContainerNotFoundException
[ https://issues.apache.org/jira/browse/HDDS-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16840044#comment-16840044 ] Shashikant Banerjee commented on HDDS-1517: --- Patch v0 adds the fix. I will open a pull request and add a new patch that also adds a test to verify the fix. > AllocateBlock call fails with ContainerNotFoundException > > > Key: HDDS-1517 > URL: https://issues.apache.org/jira/browse/HDDS-1517 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Affects Versions: 0.5.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 0.5.0 > > Attachments: HDDS-1517.000.patch > > > In the allocateContainer call, the container is first added to pipelineStateMap > and then to the container cache. If two allocateBlock calls execute > concurrently, one may find the container in pipelineStateMap before it has > been added to the container cache, and so fail with a CONTAINER_NOT_FOUND > exception. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1517) AllocateBlock call fails with ContainerNotFoundException
[ https://issues.apache.org/jira/browse/HDDS-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-1517: -- Status: Patch Available (was: Open) > AllocateBlock call fails with ContainerNotFoundException > > > Key: HDDS-1517 > URL: https://issues.apache.org/jira/browse/HDDS-1517 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Affects Versions: 0.5.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 0.5.0 > > Attachments: HDDS-1517.000.patch > > > In the allocateContainer call, the container is first added to pipelineStateMap > and then to the container cache. If two allocateBlock calls execute > concurrently, one may find the container in pipelineStateMap before it has > been added to the container cache, and so fail with a CONTAINER_NOT_FOUND > exception. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1517) AllocateBlock call fails with ContainerNotFoundException
[ https://issues.apache.org/jira/browse/HDDS-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-1517: -- Attachment: HDDS-1517.000.patch > AllocateBlock call fails with ContainerNotFoundException > > > Key: HDDS-1517 > URL: https://issues.apache.org/jira/browse/HDDS-1517 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Affects Versions: 0.5.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 0.5.0 > > Attachments: HDDS-1517.000.patch > > > In the allocateContainer call, the container is first added to pipelineStateMap > and then to the container cache. If two allocateBlock calls execute > concurrently, one may find the container in pipelineStateMap before it has > been added to the container cache, and so fail with a CONTAINER_NOT_FOUND > exception. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1517) AllocateBlock call fails with ContainerNotFoundException
[ https://issues.apache.org/jira/browse/HDDS-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-1517: -- Description: In the allocateContainer call, the container is first added to pipelineStateMap and then to the container cache. If two allocateBlock calls execute concurrently, one may find the container in pipelineStateMap before it has been added to the container cache, and so fail with a CONTAINER_NOT_FOUND exception. > AllocateBlock call fails with ContainerNotFoundException > > > Key: HDDS-1517 > URL: https://issues.apache.org/jira/browse/HDDS-1517 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Affects Versions: 0.5.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 0.5.0 > > > In the allocateContainer call, the container is first added to pipelineStateMap > and then to the container cache. If two allocateBlock calls execute > concurrently, one may find the container in pipelineStateMap before it has > been added to the container cache, and so fail with a CONTAINER_NOT_FOUND > exception. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
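The ordering problem in the description above can be shown deterministically by splitting the allocation into its two steps. This is a minimal sketch and not SCM code: ContainerRegistrySketch and its methods are hypothetical, and the reordering shown at the end (populate the cache before publishing to pipelineStateMap) is one possible remedy assumed for illustration, not necessarily what the attached patch does.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical two-structure registry that mirrors the ordering in the report. */
final class ContainerRegistrySketch {
    private final Set<Long> pipelineStateMap = ConcurrentHashMap.newKeySet();
    private final Map<Long, String> containerCache = new ConcurrentHashMap<>();

    void addToPipeline(long id) {
        pipelineStateMap.add(id);
    }

    void addToCache(long id) {
        containerCache.put(id, "container-" + id);
    }

    /** What a concurrent allocateBlock does: pick from the pipeline, then read the cache. */
    String lookup(long id) {
        if (!pipelineStateMap.contains(id)) {
            return null; // container not visible yet, caller picks another one
        }
        String container = containerCache.get(id);
        if (container == null) {
            throw new IllegalStateException("CONTAINER_NOT_FOUND: " + id);
        }
        return container;
    }

    public static void main(String[] args) {
        ContainerRegistrySketch registry = new ContainerRegistrySketch();

        // Reported order: a second caller runs between the two steps and fails.
        registry.addToPipeline(1L);
        try {
            registry.lookup(1L);
        } catch (IllegalStateException e) {
            System.out.println("reported order: " + e.getMessage());
        }
        registry.addToCache(1L);

        // Assumed remedy: cache first, so a visible container is always resolvable.
        registry.addToCache(2L);
        System.out.println("between steps: " + registry.lookup(2L));
        registry.addToPipeline(2L);
        System.out.println("after publish: " + registry.lookup(2L));
    }
}
```

Holding a single lock across both updates would close the same window; the sketch uses reordering only because it is easier to demonstrate without threads.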
[jira] [Created] (HDDS-1517) AllocateBlock call fails with ContainerNotFoundException
Shashikant Banerjee created HDDS-1517: - Summary: AllocateBlock call fails with ContainerNotFoundException Key: HDDS-1517 URL: https://issues.apache.org/jira/browse/HDDS-1517 Project: Hadoop Distributed Data Store Issue Type: Bug Components: SCM Affects Versions: 0.5.0 Reporter: Shashikant Banerjee Assignee: Shashikant Banerjee Fix For: 0.5.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-1509) TestBlockOutputStreamWithFailures#test2DatanodesFailure fails intermittently
Shashikant Banerjee created HDDS-1509: - Summary: TestBlockOutputStreamWithFailures#test2DatanodesFailure fails intermittently Key: HDDS-1509 URL: https://issues.apache.org/jira/browse/HDDS-1509 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Client Affects Versions: 0.5.0 Reporter: Shashikant Banerjee Assignee: Shashikant Banerjee Fix For: 0.5.0 The test fails because it expects the exception after 2 datanode failures to be of type RaftRetryFailureException. However, the pipeline may be destroyed before the actual write executes over Ratis, in which case the write fails with GroupMismatchException instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
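One way a test could tolerate both outcomes is to accept either exception type anywhere in the cause chain. This is a minimal sketch under stated assumptions: the two nested exception classes are local stand-ins for the Ratis types named above so that the example compiles on its own, the helper name is hypothetical, and this is not necessarily how the actual patch changes the test.

```java
import java.io.IOException;

/** Hypothetical helper; the nested exceptions are stand-ins for the Ratis classes. */
final class ExpectedFailureSketch {
    static class RaftRetryFailureException extends Exception {
        RaftRetryFailureException(String message) {
            super(message);
        }
    }

    static class GroupMismatchException extends Exception {
        GroupMismatchException(String message) {
            super(message);
        }
    }

    /** Returns true if either expected failure appears anywhere in the cause chain. */
    static boolean isExpectedAfterTwoDatanodeFailures(Throwable thrown) {
        for (Throwable t = thrown; t != null; t = t.getCause()) {
            if (t instanceof RaftRetryFailureException
                    || t instanceof GroupMismatchException) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Throwable retried = new IOException(new RaftRetryFailureException("retries exhausted"));
        Throwable destroyed = new IOException(new GroupMismatchException("pipeline destroyed"));
        System.out.println(isExpectedAfterTwoDatanodeFailures(retried));
        System.out.println(isExpectedAfterTwoDatanodeFailures(destroyed));
        System.out.println(isExpectedAfterTwoDatanodeFailures(new IOException("unrelated")));
    }
}
```

Walking the cause chain matters here because client-side errors typically arrive wrapped, as the IOException and ExecutionException layers in the stack traces on this list suggest.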
[jira] [Assigned] (HDDS-1502) Add metrics for Ozone Ratis performance
[ https://issues.apache.org/jira/browse/HDDS-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee reassigned HDDS-1502: - Assignee: Shashikant Banerjee (was: Mukul Kumar Singh) > Add metrics for Ozone Ratis performance > --- > > Key: HDDS-1502 > URL: https://issues.apache.org/jira/browse/HDDS-1502 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Mukul Kumar Singh >Assignee: Shashikant Banerjee >Priority: Major > > This jira will add some metrics for Ratis pipeline performance > a) number of chunks written per second > b) number of bytes written per second > c) number of chunks/bytes missed during read State Machine data. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-1504) Watch Request should use retry policy with higher timeouts for RaftClient
Shashikant Banerjee created HDDS-1504: - Summary: Watch Request should use retry policy with higher timeouts for RaftClient Key: HDDS-1504 URL: https://issues.apache.org/jira/browse/HDDS-1504 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Client Affects Versions: 0.5.0 Reporter: Shashikant Banerjee Assignee: Shashikant Banerjee Fix For: 0.5.0 Currently, a RaftClient request times out after a default of 3s, but a watch request may need a longer timeout because some followers can be really slow. It would be good to enforce a retry policy with higher timeouts when submitting watch requests over the RaftClient in Ozone. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
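The idea of giving watch requests their own, longer timeout can be sketched as a per-request-type lookup. This is a minimal sketch and not the Ratis API: WatchAwareTimeouts and RequestType are hypothetical names, the 3 second default comes from the description above, and the 30 second watch timeout is an arbitrary value assumed for the example.

```java
import java.time.Duration;

/** Hypothetical per-request-type timeout selection; not the Ratis API. */
final class WatchAwareTimeouts {
    enum RequestType { WRITE, READ, WATCH }

    private final Duration defaultTimeout;
    private final Duration watchTimeout;

    WatchAwareTimeouts(Duration defaultTimeout, Duration watchTimeout) {
        if (watchTimeout.compareTo(defaultTimeout) < 0) {
            throw new IllegalArgumentException(
                "watch timeout must not be shorter than the default");
        }
        this.defaultTimeout = defaultTimeout;
        this.watchTimeout = watchTimeout;
    }

    /** Watch requests wait on slow followers, so they get the longer timeout. */
    Duration timeoutFor(RequestType type) {
        return type == RequestType.WATCH ? watchTimeout : defaultTimeout;
    }

    public static void main(String[] args) {
        WatchAwareTimeouts timeouts =
            new WatchAwareTimeouts(Duration.ofSeconds(3), Duration.ofSeconds(30));
        System.out.println("WRITE: " + timeouts.timeoutFor(RequestType.WRITE));
        System.out.println("WATCH: " + timeouts.timeoutFor(RequestType.WATCH));
    }
}
```

Keeping the selection in one place means ordinary reads and writes still fail fast at the default, while only watch requests pay for the longer wait.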
[jira] [Resolved] (HDDS-1362) Append all chunk writes for a block to a single file in datanode
[ https://issues.apache.org/jira/browse/HDDS-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee resolved HDDS-1362. --- Resolution: Duplicate > Append all chunk writes for a block to a single file in datanode > > > Key: HDDS-1362 > URL: https://issues.apache.org/jira/browse/HDDS-1362 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.5.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 0.5.0 > > > Currently, for each chunk, data is written to individual chunk files. The > idea here is to maintain one file per block in datanode. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-1437) TestBlockOutputStreamWithFailures#testWatchForCommitDatanodeFailure fails with assertion error
[ https://issues.apache.org/jira/browse/HDDS-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee resolved HDDS-1437.

Resolution: Fixed
Fix Version/s: 0.5.0
Target Version/s: 0.5.0 (was: 0.4.0)

This should have been addressed with HDDS-1395. It is not reproducible in the latest runs. Resolving it for now.

> TestBlockOutputStreamWithFailures#testWatchForCommitDatanodeFailure fails with assertion error
>
> Key: HDDS-1437
> URL: https://issues.apache.org/jira/browse/HDDS-1437
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Client
> Affects Versions: 0.3.0
> Reporter: Mukul Kumar Singh
> Assignee: Shashikant Banerjee
> Priority: Major
> Fix For: 0.5.0
>
> The test is failing with the following assertion
> {code}
> java.lang.AssertionError: expected:<2> but was:<3>
>     at org.junit.Assert.fail(Assert.java:88)
>     at org.junit.Assert.failNotEquals(Assert.java:743)
>     at org.junit.Assert.assertEquals(Assert.java:118)
>     at org.junit.Assert.assertEquals(Assert.java:555)
>     at org.junit.Assert.assertEquals(Assert.java:542)
>     at org.apache.hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures.testWatchForCommitDatanodeFailure(TestBlockOutputStreamWithFailures.java:373)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>     at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>     at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>     at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>     at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>     at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>     at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>     at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>     at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>     at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>     at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>     at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>     at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {code}
> https://ci.anzix.net//job/ozone-nightly/62//testReport/junit/org.apache.hadoop.ozone.client.rpc/TestBlockOutputStreamWithFailures/testWatchForCommitDatanodeFailure/
[jira] [Assigned] (HDDS-1485) Ozone writes fail when single threaded client writes 100MB files repeatedly.
[ https://issues.apache.org/jira/browse/HDDS-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee reassigned HDDS-1485:

Assignee: Shashikant Banerjee

> Ozone writes fail when single threaded client writes 100MB files repeatedly.
>
> Key: HDDS-1485
> URL: https://issues.apache.org/jira/browse/HDDS-1485
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Reporter: Aravindan Vijayan
> Assignee: Shashikant Banerjee
> Priority: Blocker
>
> *Environment*
> 26 node physical cluster.
> All Datanodes are up and running.
> Client attempting to write 1600 x 100MB files using the FsStress utility (https://github.com/arp7/FsPerfTest) fails with the following error.
> {code}
> 19/05/02 09:58:49 ERROR storage.BlockOutputStream: Unexpected Storage Container Exception:
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 424 does not exist
>     at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:573)
>     at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.validateResponse(BlockOutputStream.java:539)
>     at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$writeChunkToContainer$2(BlockOutputStream.java:616)
>     at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
>     at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
>     at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> {code}
> It looks like a corruption in the container metadata.
[jira] [Resolved] (HDDS-1436) TestCommitWatcher#testReleaseBuffersOnException fails with IllegalStateException
[ https://issues.apache.org/jira/browse/HDDS-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee resolved HDDS-1436.

Resolution: Fixed
Fix Version/s: 0.5.0

> TestCommitWatcher#testReleaseBuffersOnException fails with IllegalStateException
>
> Key: HDDS-1436
> URL: https://issues.apache.org/jira/browse/HDDS-1436
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Client
> Affects Versions: 0.5.0
> Reporter: Mukul Kumar Singh
> Assignee: Shashikant Banerjee
> Priority: Major
> Labels: ozone-flaky-test
> Fix For: 0.5.0
>
> the test is failing with the following exception
> {code}
> java.lang.IllegalStateException
>     at com.google.common.base.Preconditions.checkState(Preconditions.java:129)
>     at org.apache.hadoop.hdds.scm.storage.CommitWatcher.watchForCommit(CommitWatcher.java:191)
>     at org.apache.hadoop.ozone.client.rpc.TestCommitWatcher.testReleaseBuffersOnException(TestCommitWatcher.java:277)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>     at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>     at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>     at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>     at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>     at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>     at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>     at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>     at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>     at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>     at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>     at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>     at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>     at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {code}
> https://ci.anzix.net/job/ozone-nightly/63/testReport/org.apache.hadoop.ozone.client.rpc/TestCommitWatcher/testReleaseBuffersOnException/
[jira] [Updated] (HDDS-1395) Key write fails with BlockOutputStream has been closed exception
[ https://issues.apache.org/jira/browse/HDDS-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDDS-1395:

Resolution: Fixed
Fix Version/s: 0.5.0
Status: Resolved (was: Patch Available)

Thanks [~jnp] for the review. I have committed this change to trunk.

> Key write fails with BlockOutputStream has been closed exception
>
> Key: HDDS-1395
> URL: https://issues.apache.org/jira/browse/HDDS-1395
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Client
> Affects Versions: 0.4.0
> Reporter: Mukul Kumar Singh
> Assignee: Shashikant Banerjee
> Priority: Major
> Labels: MiniOzoneChaosCluster, pull-request-available
> Fix For: 0.5.0
> Attachments: HDDS-1395.000.patch, HDDS-1395.001.patch, HDDS-1395.003.patch
> Time Spent: 5h 50m
> Remaining Estimate: 0h
>
> Key write fails with BlockOutputStream has been closed
> {code}
> 2019-04-05 11:24:47,770 ERROR ozone.MiniOzoneLoadGenerator (MiniOzoneLoadGenerator.java:load(102)) - LOADGEN: Create key:pool-431-thread-9-2092651262 failed with exception, but skipping
> java.io.IOException: BlockOutputStream has been closed.
>     at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.checkOpen(BlockOutputStream.java:662)
>     at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.write(BlockOutputStream.java:245)
>     at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.write(BlockOutputStreamEntry.java:131)
>     at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:325)
>     at org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:287)
>     at org.apache.hadoop.ozone.client.io.OzoneOutputStream.write(OzoneOutputStream.java:49)
>     at java.io.OutputStream.write(OutputStream.java:75)
>     at org.apache.hadoop.ozone.MiniOzoneLoadGenerator.load(MiniOzoneLoadGenerator.java:100)
>     at org.apache.hadoop.ozone.MiniOzoneLoadGenerator.lambda$startIO$0(MiniOzoneLoadGenerator.java:143)
>     at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
[jira] [Commented] (HDDS-1224) Restructure code to validate the response from server in the Read path
[ https://issues.apache.org/jira/browse/HDDS-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16834575#comment-16834575 ]

Shashikant Banerjee commented on HDDS-1224:

Attached v0 patch for initial review. Will generate a pull request soon.

> Restructure code to validate the response from server in the Read path
>
> Key: HDDS-1224
> URL: https://issues.apache.org/jira/browse/HDDS-1224
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Components: Ozone Client
> Affects Versions: 0.4.0
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.5.0
> Attachments: HDDS-1224.000.patch
> Time Spent: 20m
> Remaining Estimate: 0h
>
> In the read path, validation of the response while reading data from the datanodes happens in XceiverClientGrpc, and additional checksum verification happens in the Ozone client to verify the read chunk response. The aim of this Jira is to modify the function call to take a validator function as part of reading data, so that all validation can happen in a single unified place.
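A minimal sketch of the restructuring described in HDDS-1224, assuming the read call takes a list of validator functions and applies them in one place before a response is accepted. Response, Replica, Validator, STATUS_CHECK, and CHECKSUM_CHECK are hypothetical stand-ins invented for this example, and CRC32 is used only as a convenient checksum; this is not the XceiverClientGrpc API or the attached patch.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.zip.CRC32;

/** Hypothetical sketch: these types stand in for the real Ozone client classes. */
final class ReadValidationSketch {
  /** Stand-in for a read-chunk response: status, payload, expected checksum. */
  static final class Response {
    final boolean success;
    final byte[] data;
    final long expectedCrc;

    Response(boolean success, byte[] data, long expectedCrc) {
      this.success = success;
      this.data = data;
      this.expectedCrc = expectedCrc;
    }
  }

  /** Stand-in for one datanode that can serve the read. */
  interface Replica {
    Response read() throws IOException;
  }

  /** A check applied to every response; throws if the response is rejected. */
  interface Validator {
    void validate(Response response) throws IOException;
  }

  static long crcOf(byte[] data) {
    CRC32 crc = new CRC32();
    crc.update(data, 0, data.length);
    return crc.getValue();
  }

  static final Validator STATUS_CHECK = r -> {
    if (!r.success) {
      throw new IOException("datanode reported a failure");
    }
  };

  static final Validator CHECKSUM_CHECK = r -> {
    if (crcOf(r.data) != r.expectedCrc) {
      throw new IOException("checksum mismatch");
    }
  };

  /** Reads from the first replica whose response passes every validator. */
  static Response read(List<Replica> replicas, List<Validator> validators)
      throws IOException {
    IOException last = new IOException("no replicas to read from");
    for (Replica replica : replicas) {
      try {
        Response response = replica.read();
        for (Validator validator : validators) {
          validator.validate(response); // all validation happens here
        }
        return response;
      } catch (IOException e) {
        last = e; // rejected or unreachable: fall through to the next replica
      }
    }
    throw last;
  }

  public static void main(String[] args) throws IOException {
    byte[] data = "chunk".getBytes(StandardCharsets.UTF_8);
    long crc = crcOf(data);
    Replica corrupt = () -> new Response(true, data, crc + 1);
    Replica healthy = () -> new Response(true, data, crc);
    Response response = read(Arrays.asList(corrupt, healthy),
        Arrays.asList(STATUS_CHECK, CHECKSUM_CHECK));
    System.out.println(new String(response.data, StandardCharsets.UTF_8));
  }
}
```

Passing the validators into the read call means a status failure and a checksum failure are handled by the same retry loop, instead of one being checked in the transport layer and the other later in the client.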
[jira] [Updated] (HDDS-1224) Restructure code to validate the response from server in the Read path
[ https://issues.apache.org/jira/browse/HDDS-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDDS-1224:

Attachment: HDDS-1224.000.patch

> Restructure code to validate the response from server in the Read path
>
> Key: HDDS-1224
> URL: https://issues.apache.org/jira/browse/HDDS-1224
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Components: Ozone Client
> Affects Versions: 0.4.0
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.4.0
> Attachments: HDDS-1224.000.patch
> Time Spent: 20m
> Remaining Estimate: 0h
[jira] [Updated] (HDDS-1224) Restructure code to validate the response from server in the Read path
[ https://issues.apache.org/jira/browse/HDDS-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDDS-1224:

Status: Patch Available (was: Open)

> Restructure code to validate the response from server in the Read path
>
> Key: HDDS-1224
> URL: https://issues.apache.org/jira/browse/HDDS-1224
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Components: Ozone Client
> Affects Versions: 0.4.0
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.4.0
> Attachments: HDDS-1224.000.patch
> Time Spent: 20m
> Remaining Estimate: 0h
[jira] [Updated] (HDDS-1224) Restructure code to validate the response from server in the Read path
[ https://issues.apache.org/jira/browse/HDDS-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDDS-1224:

Fix Version/s: 0.5.0 (was: 0.4.0)

> Restructure code to validate the response from server in the Read path
>
> Key: HDDS-1224
> URL: https://issues.apache.org/jira/browse/HDDS-1224
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Components: Ozone Client
> Affects Versions: 0.4.0
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.5.0
> Attachments: HDDS-1224.000.patch
> Time Spent: 20m
> Remaining Estimate: 0h
[jira] [Assigned] (HDDS-1437) TestBlockOutputStreamWithFailures#testWatchForCommitDatanodeFailure fails with assertion error
[ https://issues.apache.org/jira/browse/HDDS-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee reassigned HDDS-1437:

Assignee: Shashikant Banerjee

> TestBlockOutputStreamWithFailures#testWatchForCommitDatanodeFailure fails with assertion error
>
> Key: HDDS-1437
> URL: https://issues.apache.org/jira/browse/HDDS-1437
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Client
> Affects Versions: 0.3.0
> Reporter: Mukul Kumar Singh
> Assignee: Shashikant Banerjee
> Priority: Major
[jira] [Updated] (HDDS-1395) Key write fails with BlockOutputStream has been closed exception
[ https://issues.apache.org/jira/browse/HDDS-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDDS-1395:

Status: Open (was: Patch Available)

> Key write fails with BlockOutputStream has been closed exception
>
> Key: HDDS-1395
> URL: https://issues.apache.org/jira/browse/HDDS-1395
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Client
> Affects Versions: 0.4.0
> Reporter: Mukul Kumar Singh
> Assignee: Shashikant Banerjee
> Priority: Major
> Labels: MiniOzoneChaosCluster, pull-request-available
> Attachments: HDDS-1395.000.patch, HDDS-1395.001.patch, HDDS-1395.003.patch
> Time Spent: 4h 40m
> Remaining Estimate: 0h
[jira] [Updated] (HDDS-1395) Key write fails with BlockOutputStream has been closed exception
[ https://issues.apache.org/jira/browse/HDDS-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDDS-1395:

Status: Patch Available (was: Open)

> Key write fails with BlockOutputStream has been closed exception
>
> Key: HDDS-1395
> URL: https://issues.apache.org/jira/browse/HDDS-1395
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Client
> Affects Versions: 0.4.0
> Reporter: Mukul Kumar Singh
> Assignee: Shashikant Banerjee
> Priority: Major
> Labels: MiniOzoneChaosCluster, pull-request-available
> Attachments: HDDS-1395.000.patch, HDDS-1395.001.patch, HDDS-1395.003.patch
> Time Spent: 4h 40m
> Remaining Estimate: 0h
[jira] [Updated] (HDDS-1395) Key write fails with BlockOutputStream has been closed exception
[ https://issues.apache.org/jira/browse/HDDS-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDDS-1395:

Attachment: HDDS-1395.003.patch

> Key write fails with BlockOutputStream has been closed exception
>
> Key: HDDS-1395
> URL: https://issues.apache.org/jira/browse/HDDS-1395
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Client
> Affects Versions: 0.4.0
> Reporter: Mukul Kumar Singh
> Assignee: Shashikant Banerjee
> Priority: Major
> Labels: MiniOzoneChaosCluster, pull-request-available
> Attachments: HDDS-1395.000.patch, HDDS-1395.001.patch, HDDS-1395.003.patch
> Time Spent: 4h 40m
> Remaining Estimate: 0h
[jira] [Created] (HDDS-1484) Add unit tests for writing concurrently on different type of pipelines by multiple threads
Shashikant Banerjee created HDDS-1484:

Summary: Add unit tests for writing concurrently on different type of pipelines by multiple threads
Key: HDDS-1484
URL: https://issues.apache.org/jira/browse/HDDS-1484
Project: Hadoop Distributed Data Store
Issue Type: Bug
Reporter: Shashikant Banerjee

This Jira aims to add unit tests that write concurrently to single-node as well as three-node pipelines, with data of different sizes, using multiple threads.
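The shape of such a test can be sketched as below: several writer threads each write payloads of different sizes and verify what they wrote, once against a one-node and once against a three-node pipeline. FakePipeline is an in-memory stand-in invented for this example; a real test would presumably run against a mini cluster, and none of these names come from the Ozone test code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: an in-memory model, not the Ozone test harness. */
final class ConcurrentWriteSketch {
  /** In-memory stand-in for a pipeline with a given number of nodes. */
  static final class FakePipeline {
    private final List<Map<String, byte[]>> nodes = new ArrayList<>();

    FakePipeline(int nodeCount) {
      for (int i = 0; i < nodeCount; i++) {
        nodes.add(new ConcurrentHashMap<>());
      }
    }

    /** Replicates the write to every node of the pipeline. */
    void write(String key, byte[] data) {
      for (Map<String, byte[]> node : nodes) {
        node.put(key, data.clone());
      }
    }

    /** True if every node holds exactly the expected bytes for the key. */
    boolean verify(String key, byte[] expected) {
      for (Map<String, byte[]> node : nodes) {
        if (!Arrays.equals(expected, node.get(key))) {
          return false;
        }
      }
      return true;
    }
  }

  /** Returns how many writer threads verified all of their writes. */
  static int writeConcurrently(FakePipeline pipeline, int threads, int[] sizes)
      throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<Boolean>> results = new ArrayList<>();
      for (int t = 0; t < threads; t++) {
        final int id = t;
        Callable<Boolean> writer = () -> {
          for (int size : sizes) {
            byte[] data = new byte[size];
            Arrays.fill(data, (byte) id); // thread-specific content
            String key = "key-" + id + "-" + size;
            pipeline.write(key, data);
            if (!pipeline.verify(key, data)) {
              return false;
            }
          }
          return true;
        };
        results.add(pool.submit(writer));
      }
      int verified = 0;
      for (Future<Boolean> result : results) {
        if (result.get()) {
          verified++;
        }
      }
      return verified;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    int[] sizes = {1, 1024, 64 * 1024};
    for (int nodeCount : new int[] {1, 3}) {
      int ok = writeConcurrently(new FakePipeline(nodeCount), 4, sizes);
      System.out.println(nodeCount + "-node pipeline: " + ok + " of 4 writers verified");
    }
  }
}
```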
[jira] [Resolved] (HDDS-1282) TestFailureHandlingByClient causes a jvm exit
[ https://issues.apache.org/jira/browse/HDDS-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee resolved HDDS-1282.

Resolution: Fixed
Fix Version/s: 0.5.0

As [~elek] explained, this issue does not exist any more and the other issue is tracked by HDDS-1384. Resolving this.

> TestFailureHandlingByClient causes a jvm exit
>
> Key: HDDS-1282
> URL: https://issues.apache.org/jira/browse/HDDS-1282
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: test
> Reporter: Mukul Kumar Singh
> Assignee: Shashikant Banerjee
> Priority: Major
> Fix For: 0.5.0
> Attachments: HDDS-1282.001.patch, org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient-output.txt
>
> The test causes jvm exit because the test exits prematurely.
> {code}
> [ERROR] org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient
> [ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
> [ERROR] Command was /bin/sh -c cd /Users/msingh/code/apache/ozone/oz_new1/hadoop-ozone/integration-test && /Library/Java/JavaVirtualMachines/jdk1.8.0_171.jdk/Contents/Home/jre/bin/java -Xmx2048m -XX:+HeapDumpOnOutOfMemoryError -jar /Users/msingh/code/apache/ozone/oz_new1/hadoop-ozone/integration-test/target/surefire/surefirebooter5405606309417840457.jar /Users/msingh/code/apache/ozone/oz_new1/hadoop-ozone/integration-test/target/surefire 2019-03-13T23-31-09_018-jvmRun1 surefire5934599060460829594tmp surefire_1202723709650989744795tmp
> [ERROR] Error occurred in starting fork, check output in log
> {code}
[jira] [Updated] (HDDS-1452) All chunk writes should happen to a single file for a block in datanode
[ https://issues.apache.org/jira/browse/HDDS-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDDS-1452:

Summary: All chunk writes should happen to a single file for a block in datanode (was: All chunks should happen to a single file for a block in datanode)

> All chunk writes should happen to a single file for a block in datanode
>
> Key: HDDS-1452
> URL: https://issues.apache.org/jira/browse/HDDS-1452
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Datanode
> Affects Versions: 0.5.0
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Fix For: 0.5.0
>
> Currently, each chunk of a block is written to its own chunk file in the datanode. The idea here is to write all chunks of a block to a single file in the datanode.
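One way to picture the proposal in HDDS-1452: instead of one file per chunk, each chunk is written at its offset inside a single per-block file and read back by offset and length. The sketch below uses only java.nio. BlockFileSketch, writeChunk, and readChunk are names made up for this example, and the layout (chunks addressed purely by byte offset) is an assumption, not the datanode's actual on-disk format.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch: one file per block, chunks addressed by offset. */
final class BlockFileSketch {
  /** Writes one chunk at the given offset of the block file. */
  static void writeChunk(Path blockFile, long offset, byte[] chunk)
      throws IOException {
    try (FileChannel channel = FileChannel.open(blockFile,
        StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
      ByteBuffer buffer = ByteBuffer.wrap(chunk);
      long position = offset;
      while (buffer.hasRemaining()) {
        position += channel.write(buffer, position);
      }
    }
  }

  /** Reads length bytes starting at the given offset of the block file. */
  static byte[] readChunk(Path blockFile, long offset, int length)
      throws IOException {
    try (FileChannel channel = FileChannel.open(blockFile,
        StandardOpenOption.READ)) {
      ByteBuffer buffer = ByteBuffer.allocate(length);
      long position = offset;
      while (buffer.hasRemaining()) {
        int read = channel.read(buffer, position);
        if (read < 0) {
          throw new EOFException("block file ends before offset " + (offset + length));
        }
        position += read;
      }
      return buffer.array();
    }
  }

  public static void main(String[] args) throws IOException {
    Path blockFile = Files.createTempFile("block", ".data");
    try {
      byte[] first = "chunk-0;".getBytes(StandardCharsets.UTF_8);
      byte[] second = "chunk-1;".getBytes(StandardCharsets.UTF_8);
      writeChunk(blockFile, 0, first);
      writeChunk(blockFile, first.length, second);
      byte[] readBack = readChunk(blockFile, first.length, second.length);
      System.out.println(new String(readBack, StandardCharsets.UTF_8));
      System.out.println("block file size: " + Files.size(blockFile));
    } finally {
      Files.deleteIfExists(blockFile);
    }
  }
}
```

Positional writes let chunks arrive in any order without a shared file pointer, at the cost of the caller having to track each chunk's offset and length.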
[jira] [Assigned] (HDDS-1449) JVM Exit in datanode while committing a key
[ https://issues.apache.org/jira/browse/HDDS-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee reassigned HDDS-1449: - Assignee: Shashikant Banerjee > JVM Exit in datanode while committing a key > --- > > Key: HDDS-1449 > URL: https://issues.apache.org/jira/browse/HDDS-1449 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.3.0 >Reporter: Mukul Kumar Singh >Assignee: Shashikant Banerjee >Priority: Major > Labels: MiniOzoneChaosCluster > Attachments: 2019-04-22--20-23-56-IST.MiniOzoneChaosCluster.log, > hs_err_pid67466.log > > > Saw the following trace in MiniOzoneChaosCluster run. > {code} > C [librocksdbjni17271331491728127.jnilib+0x9755c] > Java_org_rocksdb_RocksDB_write0+0x1c > J 13917 org.rocksdb.RocksDB.write0(JJJ)V (0 bytes) @ 0x0001102ff62e > [0x0001102ff580+0xae] > J 17167 C2 > org.apache.hadoop.utils.RocksDBStore.writeBatch(Lorg/apache/hadoop/utils/BatchOperation;)V > (260 bytes) @ 0x000111bbd01c [0x000111bbcde0+0x23c] > J 20434 C1 > org.apache.hadoop.ozone.container.keyvalue.impl.BlockManagerImpl.putBlock(Lorg/apache/hadoop/ozone/container/common/interfaces/Container;Lorg/apache/hadoop/ozone/container/common/helpers/BlockData;)J > (261 bytes) @ 0x000111c267ac [0x000111c25640+0x116c] > J 19262 C2 > org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandRequestProto;Lorg/apache/hadoop/ozone/container/common/transport/server/ratis/DispatcherContext;)Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandResponseProto; > (866 bytes) @ 0x0001125c5aa0 [0x0001125c1560+0x4540] > J 15095 C2 > 
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandRequestProto;Lorg/apache/hadoop/ozone/container/common/transport/server/ratis/DispatcherContext;)Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandResponseProto; > (142 bytes) @ 0x000110ffc940 [0x000110ffc0c0+0x880] > J 19301 C2 > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandRequestProto;Lorg/apache/hadoop/ozone/container/common/transport/server/ratis/DispatcherContext;)Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandResponseProto; > (146 bytes) @ 0x000111396144 [0x000111395e60+0x2e4] > J 15997 C2 > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine$$Lambda$776.get()Ljava/lang/Object; > (16 bytes) @ 0x000110138e54 [0x000110138d80+0xd4] > J 15970 C2 java.util.concurrent.CompletableFuture$AsyncSupply.run()V (61 > bytes) @ 0x00010fc80094 [0x00010fc8+0x94] > J 17368 C2 > java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V > (225 bytes) @ 0x000110b0a7a0 [0x000110b0a5a0+0x200] > J 7389 C1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V (9 bytes) @ > 0x00011012a004 [0x000110129f00+0x104] > J 6837 C1 java.lang.Thread.run()V (17 bytes) @ 0x00011002b144 > [0x00011002b000+0x144] > v ~StubRoutines::call_stub > V [libjvm.dylib+0x2ef1f6] JavaCalls::call_helper(JavaValue*, methodHandle*, > JavaCallArguments*, Thread*)+0x6ae > V [libjvm.dylib+0x2ef99a] JavaCalls::call_virtual(JavaValue*, KlassHandle, > Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x164 > V [libjvm.dylib+0x2efb46] JavaCalls::call_virtual(JavaValue*, Handle, > KlassHandle, Symbol*, Symbol*, Thread*)+0x4a > V [libjvm.dylib+0x34a46d] thread_entry(JavaThread*, Thread*)+0x7c > V 
[libjvm.dylib+0x56eb0f] JavaThread::thread_main_inner()+0x9b > V [libjvm.dylib+0x57020a] JavaThread::run()+0x1c2 > V [libjvm.dylib+0x48d4a6] java_start(Thread*)+0xf6 > C [libsystem_pthread.dylib+0x3305] _pthread_body+0x7e > C [libsystem_pthread.dylib+0x626f] _pthread_start+0x46 > C [libsystem_pthread.dylib+0x2415] thread_start+0xd > C 0x > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-1452) All chunks should happen to a single file for a block in datanode
Shashikant Banerjee created HDDS-1452: - Summary: All chunks should happen to a single file for a block in datanode Key: HDDS-1452 URL: https://issues.apache.org/jira/browse/HDDS-1452 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Datanode Affects Versions: 0.5.0 Reporter: Shashikant Banerjee Assignee: Shashikant Banerjee Fix For: 0.5.0 Currently, each chunk of a block is written to its own chunk file in the datanode. The idea here is to write all chunks of a block to a single file in the datanode.
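The idea in HDDS-1452 can be pictured with a small sketch. This is only an illustration under stated assumptions, not the Ozone datanode code: the class `SingleBlockFileSketch`, the method `writeChunk`, and the temporary file name are hypothetical. It assumes each chunk carries its offset within the block, so every chunk write lands at that offset in one shared block file instead of creating a file per chunk.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative sketch only: names are hypothetical, not Ozone APIs.
final class SingleBlockFileSketch {

  // Writes one chunk into the shared block file at the chunk's offset within the block.
  static void writeChunk(Path blockFile, long offsetInBlock, byte[] chunk) throws IOException {
    try (FileChannel channel = FileChannel.open(blockFile,
        StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
      ByteBuffer buffer = ByteBuffer.wrap(chunk);
      long position = offsetInBlock;
      // A positional write may be partial, so loop until the chunk is fully written.
      while (buffer.hasRemaining()) {
        position += channel.write(buffer, position);
      }
    }
  }

  // Writes two chunks of the same block into one file and returns the file contents.
  static String demo() {
    try {
      Path blockFile = Files.createTempFile("block-", ".data");
      try {
        byte[] first = "chunk-0;".getBytes(StandardCharsets.UTF_8);
        byte[] second = "chunk-1;".getBytes(StandardCharsets.UTF_8);
        writeChunk(blockFile, 0, first);
        writeChunk(blockFile, first.length, second);
        return new String(Files.readAllBytes(blockFile), StandardCharsets.UTF_8);
      } finally {
        Files.deleteIfExists(blockFile);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

In this sketch the block file ends up holding both chunks back to back, so a reader needs only the block file plus each chunk's offset and length.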
[jira] [Commented] (HDDS-1445) Add handling of NotReplicatedException in OzoneClient
[ https://issues.apache.org/jira/browse/HDDS-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819910#comment-16819910 ] Shashikant Banerjee commented on HDDS-1445: --- This will be handled as a part of HDDS-1395. > Add handling of NotReplicatedException in OzoneClient > - > > Key: HDDS-1445 > URL: https://issues.apache.org/jira/browse/HDDS-1445 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Mukul Kumar Singh >Assignee: Shashikant Banerjee >Priority: Major > > In MiniOzoneChaosCluster some of the calls fail with NotReplicatedException. > This Exception needs to be handled in OzoneClient > {code} > 2019-04-17 10:13:47,254 INFO client.GrpcClientProtocolService > (GrpcClientProtocolService.java:lambda$processClientRequest$0(264)) - Failed > RaftClientRequest:client-43B95E0E3BE0->1ebec547-8cf8-4466-bf43-ea9f19fb546b@group-1B28E0BF6CBC, > cid=800, seq=0, Watch-ALL_COMMITTED(234), Message:, > reply=RaftClientReply:client-43B95E0E3BE0->1ebec547-8cf8-4466-bf43-ea9f19fb546b@group-1B28E0BF6CBC, > cid=800, FAILED org.apache.ratis.protocol.NotReplicatedException: Request > with call Id 800 and log index 234 is not yet replicated to ALL_COMMITTED, > logIndex=234, commits[1ebec547-8cf8-4466-bf43-ea9f19fb546b:c267, > 7b200ef5-7711-437d-a9bc-ad0e18fdf6bb:c267, > ffbfb65f-a622-466d-b6e8-47038cc15e0b:c226] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1395) Key write fails with BlockOutputStream has been closed exception
[ https://issues.apache.org/jira/browse/HDDS-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-1395: -- Summary: Key write fails with BlockOutputStream has been closed exception (was: Key write fails with "BlockOutputStream has been closed") > Key write fails with BlockOutputStream has been closed exception > > > Key: HDDS-1395 > URL: https://issues.apache.org/jira/browse/HDDS-1395 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Shashikant Banerjee >Priority: Major > Labels: MiniOzoneChaosCluster > Attachments: HDDS-1395.000.patch, HDDS-1395.001.patch > > > Key write fails with BlockOutputStream has been closed > {code} > 2019-04-05 11:24:47,770 ERROR ozone.MiniOzoneLoadGenerator > (MiniOzoneLoadGenerator.java:load(102)) - LOADGEN: Create > key:pool-431-thread-9-2092651262 failed with exception, but skipping > java.io.IOException: BlockOutputStream has been closed. 
> at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.checkOpen(BlockOutputStream.java:662) > at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.write(BlockOutputStream.java:245) > at > org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.write(BlockOutputStreamEntry.java:131) > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:325) > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:287) > at > org.apache.hadoop.ozone.client.io.OzoneOutputStream.write(OzoneOutputStream.java:49) > at java.io.OutputStream.write(OutputStream.java:75) > at > org.apache.hadoop.ozone.MiniOzoneLoadGenerator.load(MiniOzoneLoadGenerator.java:100) > at > org.apache.hadoop.ozone.MiniOzoneLoadGenerator.lambda$startIO$0(MiniOzoneLoadGenerator.java:143) > at > java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-1445) Add handling of NotReplicatedException in OzoneClient
[ https://issues.apache.org/jira/browse/HDDS-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee reassigned HDDS-1445: - Assignee: Shashikant Banerjee > Add handling of NotReplicatedException in OzoneClient > - > > Key: HDDS-1445 > URL: https://issues.apache.org/jira/browse/HDDS-1445 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Mukul Kumar Singh >Assignee: Shashikant Banerjee >Priority: Major > > In MiniOzoneChaosCluster some of the calls fail with NotReplicatedException. > This Exception needs to be handled in OzoneClient > {code} > 2019-04-17 10:13:47,254 INFO client.GrpcClientProtocolService > (GrpcClientProtocolService.java:lambda$processClientRequest$0(264)) - Failed > RaftClientRequest:client-43B95E0E3BE0->1ebec547-8cf8-4466-bf43-ea9f19fb546b@group-1B28E0BF6CBC, > cid=800, seq=0, Watch-ALL_COMMITTED(234), Message:, > reply=RaftClientReply:client-43B95E0E3BE0->1ebec547-8cf8-4466-bf43-ea9f19fb546b@group-1B28E0BF6CBC, > cid=800, FAILED org.apache.ratis.protocol.NotReplicatedException: Request > with call Id 800 and log index 234 is not yet replicated to ALL_COMMITTED, > logIndex=234, commits[1ebec547-8cf8-4466-bf43-ea9f19fb546b:c267, > 7b200ef5-7711-437d-a9bc-ad0e18fdf6bb:c267, > ffbfb65f-a622-466d-b6e8-47038cc15e0b:c226] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1373) KeyOutputStream, close after write request fails after retries, runs into IllegalArgumentException
[ https://issues.apache.org/jira/browse/HDDS-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-1373: -- Resolution: Fixed Fix Version/s: 0.5.0 Status: Resolved (was: Patch Available) Thanks [~msingh] and [~jnp] for the review. I have committed this change to trunk. > KeyOutputStream, close after write request fails after retries, runs into > IllegalArgumentException > -- > > Key: HDDS-1373 > URL: https://issues.apache.org/jira/browse/HDDS-1373 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Affects Versions: 0.4.0, 0.5.0 >Reporter: Mukul Kumar Singh >Assignee: Shashikant Banerjee >Priority: Major > Labels: MiniOzoneChaosCluster, pull-request-available > Fix For: 0.5.0 > > Attachments: HDDS-1373.000.patch, HDDS-1373.001.patch > > Time Spent: 40m > Remaining Estimate: 0h > > In this code, the stream is closed via try-with-resources. > {code} > try (OzoneOutputStream stream = ozoneBucket.createKey(keyName, > bufferCapacity, ReplicationType.RATIS, ReplicationFactor.THREE, > new HashMap<>())) { > stream.write(buffer.array()); > } catch (Exception e) { > LOG.error("LOADGEN: Create key:{} failed with exception", keyName, e); > break; > } > {code} > Here, the write call fails as expected; however, the close does not > fail with the same exception. > The exception stack trace is as follows: > {code} > 2019-04-03 00:52:54,116 ERROR ozone.MiniOzoneLoadGenerator > (MiniOzoneLoadGenerator.java:load(101)) - LOADGEN: Create > key:pool-431-thread-9-8126 failed with exception > java.io.IOException: Retry request failed. 
retries get failed due to exceeded > maximum allowed retries number: 5 > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.handleRetry(KeyOutputStream.java:492) > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.handleException(KeyOutputStream.java:468) > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:344) > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.handleRetry(KeyOutputStream.java:514) > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.handleException(KeyOutputStream.java:468) > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:344) > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.handleRetry(KeyOutputStream.java:514) > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.handleException(KeyOutputStream.java:468) > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:344) > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.handleRetry(KeyOutputStream.java:514) > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.handleException(KeyOutputStream.java:468) > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:344) > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.handleRetry(KeyOutputStream.java:514) > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.handleException(KeyOutputStream.java:468) > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:344) > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.handleRetry(KeyOutputStream.java:514) > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.handleException(KeyOutputStream.java:468) > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:344) > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:287) > at > org.apache.hadoop.ozone.client.io.OzoneOutputStream.write(OzoneOutputStream.java:49) > at 
java.io.OutputStream.write(OutputStream.java:75) > at > org.apache.hadoop.ozone.MiniOzoneLoadGenerator.load(MiniOzoneLoadGenerator.java:99) > at > org.apache.hadoop.ozone.MiniOzoneLoadGenerator.lambda$startIO$0(MiniOzoneLoadGenerator.java:137) > at > java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Suppressed: java.lang.IllegalArgumentException > at > com.google.common.base.Preconditions.checkArgument(Preconditions.java:72) > at >
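The "Suppressed:" entry in the HDDS-1373 trace above is standard try-with-resources behavior: when the body throws and close() then throws something else, the close() exception is attached to the first one as a suppressed exception. The following is a minimal sketch of that mechanism only; `FailingStream` is a hypothetical stand-in and does not reproduce KeyOutputStream.

```java
import java.io.IOException;
import java.io.OutputStream;

// Illustrative sketch only: FailingStream is hypothetical, not an Ozone class.
final class SuppressedCloseSketch {

  // A stream whose write() and close() both fail, each with a different exception type.
  static final class FailingStream extends OutputStream {
    @Override
    public void write(int b) throws IOException {
      throw new IOException("Retry request failed");
    }

    @Override
    public void close() {
      throw new IllegalArgumentException("close failed");
    }
  }

  // Returns the primary exception type and the first suppressed exception type.
  static String describeFailure() {
    try (OutputStream stream = new FailingStream()) {
      stream.write(1);
      return "no failure";
    } catch (Exception e) {
      Throwable[] suppressed = e.getSuppressed();
      String suppressedName =
          suppressed.length == 0 ? "none" : suppressed[0].getClass().getSimpleName();
      return e.getClass().getSimpleName() + " / suppressed: " + suppressedName;
    }
  }
}
```

Here `describeFailure()` reports the IOException from write() as the primary failure and the IllegalArgumentException from close() as suppressed, which matches the shape of the trace in the report.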
[jira] [Updated] (HDDS-1380) Add functonality to write from multiple clients in MiniOzoneChaosCluster
[ https://issues.apache.org/jira/browse/HDDS-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-1380: -- Resolution: Fixed Status: Resolved (was: Patch Available) Thanks [~msingh] for the review. I have committed this change to trunk. > Add functonality to write from multiple clients in MiniOzoneChaosCluster > > > Key: HDDS-1380 > URL: https://issues.apache.org/jira/browse/HDDS-1380 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Affects Versions: 0.5.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 0.5.0 > > Attachments: HDDS-1380.000.patch > > > Currently, MiniOzoneChaosCluster writes multiple keys in parallel using only > one OzoneClient instance. This jira aims to add functionality to write > multiple keys with multiple ozone client instances.
[jira] [Comment Edited] (HDDS-1436) TestCommitWatcher#testReleaseBuffersOnException fails with IllegalStateException
[ https://issues.apache.org/jira/browse/HDDS-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819000#comment-16819000 ] Shashikant Banerjee edited comment on HDDS-1436 at 4/16/19 1:18 PM: This has been addressed with HDDS-1395. The test fails because of a Precondition check in CommitWatcher#watchForCommit, which assumes that commitIndex2FlushDataMap is not empty when the function is called. The test, however, calls watchForCommit on the log index of one of the two putBlock calls it makes and waits for it to complete. While it waits for the first putBlock, the second putBlock may also complete and end up cleaning up that map. Hence, when watchForCommit is called on the next index, commitIndex2FlushDataMap may already be empty. The fix is to remove the precondition check in watchForCommit. was (Author: shashikant): This has been addressed with HDDS-1395. > TestCommitWatcher#testReleaseBuffersOnException fails with > IllegalStateException > > > Key: HDDS-1436 > URL: https://issues.apache.org/jira/browse/HDDS-1436 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Affects Versions: 0.5.0 >Reporter: Mukul Kumar Singh >Assignee: Shashikant Banerjee >Priority: Major > Labels: ozone-flaky-test > > the test is failing with the following exception > {code} > java.lang.IllegalStateException > at > com.google.common.base.Preconditions.checkState(Preconditions.java:129) > at > org.apache.hadoop.hdds.scm.storage.CommitWatcher.watchForCommit(CommitWatcher.java:191) > at > org.apache.hadoop.ozone.client.rpc.TestCommitWatcher.testReleaseBuffersOnException(TestCommitWatcher.java:277) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at 
java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.run(ParentRunner.java:309) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > {code} > 
https://ci.anzix.net/job/ozone-nightly/63/testReport/org.apache.hadoop.ozone.client.rpc/TestCommitWatcher/testReleaseBuffersOnException/ -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
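The race described in the HDDS-1436 comment can be modeled in a few lines. This is a simplified sketch under assumptions, not the real CommitWatcher: the class `WatchPreconditionSketch`, its map, and its methods are hypothetical, and the concurrent completion of the second putBlock is simulated by an explicit cleanup call so the outcome is deterministic. It contrasts a watch that insists on a non-empty map with one that tolerates an empty map.

```java
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Illustrative sketch only: a simplified model, not the Ozone CommitWatcher.
final class WatchPreconditionSketch {
  // Flushed data waiting for its commit index to be replicated.
  private final ConcurrentSkipListMap<Long, String> pendingByCommitIndex =
      new ConcurrentSkipListMap<>();

  void recordFlush(long commitIndex, String data) {
    pendingByCommitIndex.put(commitIndex, data);
  }

  // Releases every entry at or below the committed index and returns how many were released.
  int releaseUpTo(long committedIndex) {
    ConcurrentNavigableMap<Long, String> done =
        pendingByCommitIndex.headMap(committedIndex, true);
    int released = done.size();
    done.clear();
    return released;
  }

  // Variant with the precondition: fails if an earlier cleanup already emptied the map.
  int watchStrict(long index) {
    if (pendingByCommitIndex.isEmpty()) {
      throw new IllegalStateException("no pending data for index " + index);
    }
    return releaseUpTo(index);
  }

  // Variant without the precondition: an empty map simply means nothing is left to release.
  int watchTolerant(long index) {
    return releaseUpTo(index);
  }

  // Two putBlock entries; the wait on the first returns after both have committed.
  private static WatchPreconditionSketch afterEarlyCleanup() {
    WatchPreconditionSketch watcher = new WatchPreconditionSketch();
    watcher.recordFlush(1, "putBlock-1");
    watcher.recordFlush(2, "putBlock-2");
    watcher.releaseUpTo(2);
    return watcher;
  }

  static boolean strictWatchThrows() {
    try {
      afterEarlyCleanup().watchStrict(2);
      return false;
    } catch (IllegalStateException e) {
      return true;
    }
  }

  static int tolerantWatchReleases() {
    return afterEarlyCleanup().watchTolerant(2);
  }
}
```

In this model the strict variant throws IllegalStateException on the second watch, while the tolerant variant returns 0, which is the behavior the comment's fix (removing the precondition) aims for.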
[jira] [Commented] (HDDS-1436) TestCommitWatcher#testReleaseBuffersOnException fails with IllegalStateException
[ https://issues.apache.org/jira/browse/HDDS-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819000#comment-16819000 ] Shashikant Banerjee commented on HDDS-1436: --- This has been addressed with HDDS-1395. > TestCommitWatcher#testReleaseBuffersOnException fails with > IllegalStateException > > > Key: HDDS-1436 > URL: https://issues.apache.org/jira/browse/HDDS-1436 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Affects Versions: 0.5.0 >Reporter: Mukul Kumar Singh >Assignee: Shashikant Banerjee >Priority: Major > Labels: ozone-flaky-test > > the test is failing with the following exception > {code} > java.lang.IllegalStateException > at > com.google.common.base.Preconditions.checkState(Preconditions.java:129) > at > org.apache.hadoop.hdds.scm.storage.CommitWatcher.watchForCommit(CommitWatcher.java:191) > at > org.apache.hadoop.ozone.client.rpc.TestCommitWatcher.testReleaseBuffersOnException(TestCommitWatcher.java:277) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) > at 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.run(ParentRunner.java:309) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > {code} > https://ci.anzix.net/job/ozone-nightly/63/testReport/org.apache.hadoop.ozone.client.rpc/TestCommitWatcher/testReleaseBuffersOnException/ -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-1436) TestCommitWatcher#testReleaseBuffersOnException fails with IllegalStateException
[ https://issues.apache.org/jira/browse/HDDS-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee reassigned HDDS-1436: - Assignee: Shashikant Banerjee > TestCommitWatcher#testReleaseBuffersOnException fails with > IllegalStateException > > > Key: HDDS-1436 > URL: https://issues.apache.org/jira/browse/HDDS-1436 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Affects Versions: 0.5.0 >Reporter: Mukul Kumar Singh >Assignee: Shashikant Banerjee >Priority: Major > Labels: ozone-flaky-test > > the test is failing with the following exception > {code} > java.lang.IllegalStateException > at > com.google.common.base.Preconditions.checkState(Preconditions.java:129) > at > org.apache.hadoop.hdds.scm.storage.CommitWatcher.watchForCommit(CommitWatcher.java:191) > at > org.apache.hadoop.ozone.client.rpc.TestCommitWatcher.testReleaseBuffersOnException(TestCommitWatcher.java:277) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) > at 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.run(ParentRunner.java:309) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > {code} > https://ci.anzix.net/job/ozone-nightly/63/testReport/org.apache.hadoop.ozone.client.rpc/TestCommitWatcher/testReleaseBuffersOnException/ -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDDS-1282) TestFailureHandlingByClient causes a jvm exit
[ https://issues.apache.org/jira/browse/HDDS-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16816277#comment-16816277 ]

Shashikant Banerjee edited comment on HDDS-1282 at 4/12/19 1:48 PM:

Thanks [~elek]. In the latest code, the test fails because a datanode crashes during MiniOzoneCluster startup.
{code:java}
2019-04-12 19:13:26,593 INFO ratis.XceiverServerRatis (XceiverServerRatis.java:start(416)) - Starting XceiverServerRatis 3ab53731-d087-494c-9378-ee35abffb271 at port 53578
2019-04-12 19:13:26,593 INFO ozone.HddsDatanodeService (HddsDatanodeService.java:start(174)) - HddsDatanodeService host:192.168.0.64 ip:192.168.0.64
2019-04-12 19:13:26,600 INFO impl.RaftServerProxy (RaftServerProxy.java:lambda$start$3(299)) - 3ab53731-d087-494c-9378-ee35abffb271: start RPC server
2019-04-12 19:13:26,605 ERROR server.GrpcService (ExitUtils.java:terminate(133)) - Terminating with exit status 1: Failed to start Grpc server
java.io.IOException: Failed to bind
 at org.apache.ratis.thirdparty.io.grpc.netty.NettyServer.start(NettyServer.java:253)
 at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:166)
 at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:81)
 at org.apache.ratis.grpc.server.GrpcService.startImpl(GrpcService.java:144)
 at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:202)
 at org.apache.ratis.server.impl.RaftServerRpcWithProxy.start(RaftServerRpcWithProxy.java:69)
 at org.apache.ratis.server.impl.RaftServerProxy.lambda$start$3(RaftServerProxy.java:300)
 at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:202)
 at org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:298)
 at org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.start(XceiverServerRatis.java:419)
 at org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.start(OzoneContainer.java:186)
 at org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.start(DatanodeStateMachine.java:169)
 at org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.lambda$startDaemon$0(DatanodeStateMachine.java:338)
 at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.BindException: Address already in use
 at sun.nio.ch.Net.bind0(Native Method)
 at sun.nio.ch.Net.bind(Net.java:433)
 at sun.nio.ch.Net.bind(Net.java:425)
 at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
 at org.apache.ratis.thirdparty.io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:130)
 at org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:558)
 at org.apache.ratis.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1358)
 at org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:501)
 at org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:486)
 at org.apache.ratis.thirdparty.io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:1019)
 at org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel.bind(AbstractChannel.java:254)
 at org.apache.ratis.thirdparty.io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:366)
 at org.apache.ratis.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
 at org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
 at org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:462)
 at org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
 at org.apache.ratis.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
{code}
The issue happens because we use random ports for datanodes in MiniOzoneCluster: we find a free port during setup, but the Ratis server starts at a later time. In the meantime, if some other datanode picks up the same port, the datanode crashes. The patch does not address this issue and is outdated.

was (Author: shashikant):
Thanks[~elek], In the latest code, the test fails because of datanode crash when the miniOzoneCluster startup.
{code:java}
2019-04-12 19:13:26,593 INFO ratis.XceiverServerRatis (XceiverServerRatis.java:start(416)) - Starting XceiverServerRatis 3ab53731-d087-494c-9378-ee35abffb271 at port 53578
2019-04-12 19:13:26,593 INFO ozone.HddsDatanodeService (HddsDatanodeService.java:start(174)) - HddsDatanodeService host:192.168.0.64 ip:192.168.0.64
2019-04-12 19:13:26,600 INFO impl.RaftServerProxy (RaftServerProxy.java:lambda$start$3(299)) -
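The port race described in the comment above can be reproduced in isolation. The following is a minimal sketch, not MiniOzoneCluster or Ratis code: the class PortRaceSketch and all of its method names are hypothetical, and it uses only java.net.ServerSocket from the JDK. probeFreePort() mimics finding a free port during setup and then releasing it, lateBindFails() mimics another server taking that port before the late starter binds, and bindAndHold() shows one common way to avoid the race (bind to port 0 and keep the socket open). That last method is an assumption about a possible remedy, not the fix adopted for HDDS-1282.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical illustration of the probe-then-bind port race.
final class PortRaceSketch {

  // Probe-then-release: the returned port is only known to be free
  // while the probe socket is open; it is released on close.
  static int probeFreePort() throws IOException {
    try (ServerSocket probe = new ServerSocket(0)) {
      return probe.getLocalPort();
    }
  }

  // Returns true if a late bind to the probed port fails because
  // another server bound the same port in the meantime.
  static boolean lateBindFails() throws IOException {
    int port = probeFreePort();
    // Stand-in for "some other datanode picks up the same port".
    try (ServerSocket other = new ServerSocket(port)) {
      // Stand-in for the server that starts at a later time.
      try (ServerSocket late = new ServerSocket(port)) {
        return false;
      } catch (BindException e) {
        return true; // corresponds to "Address already in use"
      }
    }
  }

  // Bind-and-hold: let the OS pick the port and keep the socket open,
  // so there is no window in which the port can be taken.
  static ServerSocket bindAndHold() throws IOException {
    ServerSocket server = new ServerSocket();
    server.bind(new InetSocketAddress(0));
    return server;
  }

  public static void main(String[] args) throws IOException {
    System.out.println("late bind failed: " + lateBindFails());
    try (ServerSocket held = bindAndHold()) {
      System.out.println("held port: " + held.getLocalPort());
    }
  }
}
```

In the sketch the window between probing and binding is made deterministic by binding the competing socket explicitly; in a test cluster the same window is hit only occasionally, which would match an intermittent failure.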
[jira] [Commented] (HDDS-1282) TestFailureHandlingByClient causes a jvm exit
[ https://issues.apache.org/jira/browse/HDDS-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16816277#comment-16816277 ]

Shashikant Banerjee commented on HDDS-1282:
-------------------------------------------

Thanks [~elek]. In the latest code, the test fails because a datanode crashes during MiniOzoneCluster startup.
{code:java}
2019-04-12 19:13:26,593 INFO ratis.XceiverServerRatis (XceiverServerRatis.java:start(416)) - Starting XceiverServerRatis 3ab53731-d087-494c-9378-ee35abffb271 at port 53578
2019-04-12 19:13:26,593 INFO ozone.HddsDatanodeService (HddsDatanodeService.java:start(174)) - HddsDatanodeService host:192.168.0.64 ip:192.168.0.64
2019-04-12 19:13:26,600 INFO impl.RaftServerProxy (RaftServerProxy.java:lambda$start$3(299)) - 3ab53731-d087-494c-9378-ee35abffb271: start RPC server
2019-04-12 19:13:26,605 ERROR server.GrpcService (ExitUtils.java:terminate(133)) - Terminating with exit status 1: Failed to start Grpc server
java.io.IOException: Failed to bind
 at org.apache.ratis.thirdparty.io.grpc.netty.NettyServer.start(NettyServer.java:253)
 at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:166)
 at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:81)
 at org.apache.ratis.grpc.server.GrpcService.startImpl(GrpcService.java:144)
 at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:202)
 at org.apache.ratis.server.impl.RaftServerRpcWithProxy.start(RaftServerRpcWithProxy.java:69)
 at org.apache.ratis.server.impl.RaftServerProxy.lambda$start$3(RaftServerProxy.java:300)
 at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:202)
 at org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:298)
 at org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.start(XceiverServerRatis.java:419)
 at org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.start(OzoneContainer.java:186)
 at org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.start(DatanodeStateMachine.java:169)
 at org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.lambda$startDaemon$0(DatanodeStateMachine.java:338)
 at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.BindException: Address already in use
 at sun.nio.ch.Net.bind0(Native Method)
 at sun.nio.ch.Net.bind(Net.java:433)
 at sun.nio.ch.Net.bind(Net.java:425)
 at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
 at org.apache.ratis.thirdparty.io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:130)
 at org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:558)
 at org.apache.ratis.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1358)
 at org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:501)
 at org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:486)
 at org.apache.ratis.thirdparty.io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:1019)
 at org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel.bind(AbstractChannel.java:254)
 at org.apache.ratis.thirdparty.io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:366)
 at org.apache.ratis.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
 at org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
 at org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:462)
 at org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
 at org.apache.ratis.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
{code}
The issue happens because we use random ports for datanodes in MiniOzoneCluster: we find a free port during setup, but the Ratis server starts at a later time. In the meantime, if some other datanode picks up the same port, the datanode crashes.

> TestFailureHandlingByClient causes a jvm exit
> ---------------------------------------------
>
> Key: HDDS-1282
> URL: https://issues.apache.org/jira/browse/HDDS-1282
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: test
> Reporter: Mukul Kumar Singh
> Assignee: Shashikant Banerjee
> Priority: Major
> Attachments: HDDS-1282.001.patch, org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient-output.txt
>
> The test causes jvm exit because the test exits prematurely.
> {code}
> [ERROR]