[jira] [Commented] (HDDS-1809) Ozone Read fails with StatusRunTimeExceptions after 2 datanode fail in Ratis pipeline

2019-07-16 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16886071#comment-16886071
 ] 

Shashikant Banerjee commented on HDDS-1809:
---

The issue is happening because, while doing the read with rack awareness 
enabled, the pipeline.getNodesInOrder() call returns the same datanode added 
three times to the datanodeList, as shown below. Hence, if a failure is 
encountered, the read is retried on the same datanode. 
{code:java}
if ((request.getCmdType() == ContainerProtos.Type.ReadChunk ||
    request.getCmdType() == ContainerProtos.Type.GetSmallFile) &&
    topologyAwareRead) {
  datanodeList = pipeline.getNodesInOrder();
} else {
  datanodeList = pipeline.getNodes();
}


datanodeList [f4b0bdf3-66d4-452c-82af-8a570ac0aeb7{ip: 192.168.43.156, host: 
hw15685, networkLocation: /default-rack, certSerialId: null}, 
f4b0bdf3-66d4-452c-82af-8a570ac0aeb7{ip: 192.168.43.156, host: hw15685, 
networkLocation: /default-rack, certSerialId: null}, 
f4b0bdf3-66d4-452c-82af-8a570ac0aeb7{ip: 192.168.43.156, host: hw15685, 
networkLocation: /default-rack, certSerialId: null}]

Pipeline[ Id: 865a2079-de8e-472c-baaa-5aa345ed5e57, Nodes: 
f4b0bdf3-66d4-452c-82af-8a570ac0aeb7{ip: 192.168.43.156, host: hw15685, 
networkLocation: /default-rack, certSerialId: 
null}14975e64-2564-433d-9b89-c295083a1161{ip: 192.168.43.156, host: hw15685, 
networkLocation: /default-rack, certSerialId: 
null}efc0749c-c7eb-4b73-a4b2-0abe553ca5e9{ip: 192.168.43.156, host: hw15685, 
networkLocation: /default-rack, certSerialId: null}, Type:STAND_ALONE, 
Factor:THREE, State:OPEN]
{code}
The read path works as expected with the network topology feature turned off.
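
A minimal sketch of the kind of guard that would avoid retrying the read on the 
same node (illustrative only, assuming java.util imports and the existing 
DatanodeDetails/Pipeline types; this is not the actual XceiverClientGrpc fix):
{code:java}
// Sketch: make sure the topology-sorted list still contains distinct datanodes
// before using it for read retries; otherwise fall back to the raw membership
// so a failed read can move to another replica.
List<DatanodeDetails> sorted = pipeline.getNodesInOrder();
List<DatanodeDetails> distinct = new ArrayList<>(new LinkedHashSet<>(sorted));
List<DatanodeDetails> datanodeList =
    distinct.size() == pipeline.getNodes().size()
        ? distinct
        : pipeline.getNodes();
{code}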

> Ozone Read fails with StatusRunTimeExceptions after 2 datanode fail in Ratis 
> pipeline
> -
>
> Key: HDDS-1809
> URL: https://issues.apache.org/jira/browse/HDDS-1809
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Priority: Major
> Fix For: 0.5.0
>
>
> {code:java}
> java.io.IOException: Unexpected OzoneException: java.io.IOException: 
> java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io 
> exception
> at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunk(ChunkInputStream.java:342)
> at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunkFromContainer(ChunkInputStream.java:307)
> at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.prepareRead(ChunkInputStream.java:259)
> at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.read(ChunkInputStream.java:144)
> at 
> org.apache.hadoop.hdds.scm.storage.BlockInputStream.read(BlockInputStream.java:239)
> at 
> org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:171)
> at 
> org.apache.hadoop.ozone.client.io.OzoneInputStream.read(OzoneInputStream.java:47)
> at java.io.InputStream.read(InputStream.java:101)
> at 
> org.apache.hadoop.ozone.container.ContainerTestHelper.validateData(ContainerTestHelper.java:709)
> at 
> org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient.validateData(TestFailureHandlingByClient.java:458)
> at 
> org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient.testBlockWritesWithDnFailures(TestFailureHandlingByClient.java:158)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
> at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
> at 
> 

[jira] [Assigned] (HDDS-1809) Ozone Read fails with StatusRunTimeExceptions after 2 datanode fail in Ratis pipeline

2019-07-16 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned HDDS-1809:
-

Assignee: (was: Shashikant Banerjee)

> Ozone Read fails with StatusRunTimeExceptions after 2 datanode fail in Ratis 
> pipeline
> -
>
> Key: HDDS-1809
> URL: https://issues.apache.org/jira/browse/HDDS-1809
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Priority: Major
> Fix For: 0.5.0
>
>
> {code:java}
> java.io.IOException: Unexpected OzoneException: java.io.IOException: 
> java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io 
> exception
> at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunk(ChunkInputStream.java:342)
> at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunkFromContainer(ChunkInputStream.java:307)
> at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.prepareRead(ChunkInputStream.java:259)
> at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.read(ChunkInputStream.java:144)
> at 
> org.apache.hadoop.hdds.scm.storage.BlockInputStream.read(BlockInputStream.java:239)
> at 
> org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:171)
> at 
> org.apache.hadoop.ozone.client.io.OzoneInputStream.read(OzoneInputStream.java:47)
> at java.io.InputStream.read(InputStream.java:101)
> at 
> org.apache.hadoop.ozone.container.ContainerTestHelper.validateData(ContainerTestHelper.java:709)
> at 
> org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient.validateData(TestFailureHandlingByClient.java:458)
> at 
> org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient.testBlockWritesWithDnFailures(TestFailureHandlingByClient.java:158)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
> at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
> at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
> at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
> at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
> at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1809) Ozone Read fails with StatusRunTimeExceptions after 2 datanode fail in Ratis pipeline

2019-07-16 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created HDDS-1809:
-

 Summary: Ozone Read fails with StatusRunTimeExceptions after 2 
datanode fail in Ratis pipeline
 Key: HDDS-1809
 URL: https://issues.apache.org/jira/browse/HDDS-1809
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Client
Affects Versions: 0.5.0
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.5.0


{code:java}
java.io.IOException: Unexpected OzoneException: java.io.IOException: 
java.util.concurrent.ExecutionException: 
org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io 
exception

at 
org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunk(ChunkInputStream.java:342)
at 
org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunkFromContainer(ChunkInputStream.java:307)
at 
org.apache.hadoop.hdds.scm.storage.ChunkInputStream.prepareRead(ChunkInputStream.java:259)
at 
org.apache.hadoop.hdds.scm.storage.ChunkInputStream.read(ChunkInputStream.java:144)
at 
org.apache.hadoop.hdds.scm.storage.BlockInputStream.read(BlockInputStream.java:239)
at 
org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:171)
at 
org.apache.hadoop.ozone.client.io.OzoneInputStream.read(OzoneInputStream.java:47)
at java.io.InputStream.read(InputStream.java:101)
at 
org.apache.hadoop.ozone.container.ContainerTestHelper.validateData(ContainerTestHelper.java:709)
at 
org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient.validateData(TestFailureHandlingByClient.java:458)
at 
org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient.testBlockWritesWithDnFailures(TestFailureHandlingByClient.java:158)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
at 
com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
at 
com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
at 
com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)

{code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1808) TestRatisPipelineCreateAndDestory#testPipelineCreationOnNodeRestart times out

2019-07-15 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created HDDS-1808:
-

 Summary: 
TestRatisPipelineCreateAndDestory#testPipelineCreationOnNodeRestart times out
 Key: HDDS-1808
 URL: https://issues.apache.org/jira/browse/HDDS-1808
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: SCM
Affects Versions: 0.5.0
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.5.0


{code:java}
Error Message
test timed out after 3 milliseconds
Stacktrace
java.lang.Exception: test timed out after 3 milliseconds
at java.lang.Thread.sleep(Native Method)
at 
org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:382)
at 
org.apache.hadoop.hdds.scm.pipeline.TestRatisPipelineCreateAndDestory.waitForPipelines(TestRatisPipelineCreateAndDestory.java:126)
at 
org.apache.hadoop.hdds.scm.pipeline.TestRatisPipelineCreateAndDestory.testPipelineCreationOnNodeRestart(TestRatisPipelineCreateAndDestory.java:121)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1807) TestWatchForCommit#testWatchForCommitForRetryfailure fails as a result of no leader election for extended period of time

2019-07-15 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created HDDS-1807:
-

 Summary: TestWatchForCommit#testWatchForCommitForRetryfailure 
fails as a result of no leader election for extended period of time 
 Key: HDDS-1807
 URL: https://issues.apache.org/jira/browse/HDDS-1807
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Client
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.5.0


{code:java}
org.apache.ratis.protocol.RaftRetryFailureException: Failed 
RaftClientRequest:client-6C83DC527A4C->73bdd98d-b003-44ff-a45b-bd12dfd50509@group-75C642DF7AE9,
 cid=55, seq=1*, RW, 
org.apache.hadoop.hdds.scm.XceiverClientRatis$$Lambda$407/213850519@1a8843a2 
for 10 attempts with RetryLimited(maxAttempts=10, sleepTime=1000ms)
Stacktrace
java.util.concurrent.ExecutionException: 
org.apache.ratis.protocol.RaftRetryFailureException: Failed 
RaftClientRequest:client-6C83DC527A4C->73bdd98d-b003-44ff-a45b-bd12dfd50509@group-75C642DF7AE9,
 cid=55, seq=1*, RW, 
org.apache.hadoop.hdds.scm.XceiverClientRatis$$Lambda$407/213850519@1a8843a2 
for 10 attempts with RetryLimited(maxAttempts=10, sleepTime=1000ms)
at 
java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
at 
org.apache.hadoop.ozone.client.rpc.TestWatchForCommit.testWatchForCommitForRetryfailure(TestWatchForCommit.java:345)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
{code}
The client here retries 10 times with a delay of 1 second between each retry, 
but leader election could not complete.
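
For context, this corresponds to a fixed-sleep retry policy; a minimal sketch of 
configuring such a policy with the Apache Ratis API (a sketch only, assuming the 
RetryPolicies/TimeDuration signatures of the Ratis version in use):
{code:java}
import java.util.concurrent.TimeUnit;
import org.apache.ratis.retry.RetryPolicies;
import org.apache.ratis.retry.RetryPolicy;
import org.apache.ratis.util.TimeDuration;

// 10 attempts with a fixed 1000 ms sleep, matching the
// "RetryLimited(maxAttempts=10, sleepTime=1000ms)" reported above.
RetryPolicy retryPolicy = RetryPolicies.retryUpToMaximumCountWithFixedSleep(
    10, TimeDuration.valueOf(1000, TimeUnit.MILLISECONDS));
{code}
The log excerpt below shows the repeated NotLeaderException replies the client 
received while retrying: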
{code:java}
2019-07-12 19:30:46,451 INFO  client.GrpcClientProtocolClient 
(GrpcClientProtocolClient.java:onNext(255)) - 
client-6C83DC527A4C->5931fd83-b899-480e-b15a-ecb8e7f7dd46: receive 
RaftClientReply:client-6C83DC527A4C->5931fd83-b899-480e-b15a-ecb8e7f7dd46@group-75C642DF7AE9,
 cid=55, FAILED org.apache.ratis.protocol.NotLeaderException: Server 
5931fd83-b899-480e-b15a-ecb8e7f7dd46 is not the leader (null). Request must be 
sent to leader., logIndex=0, commits[5931fd83-b899-480e-b15a-ecb8e7f7dd46:c-1]
2019-07-12 19:30:47,469 INFO  client.GrpcClientProtocolClient 
(GrpcClientProtocolClient.java:onNext(255)) - 
client-6C83DC527A4C->d83929f1-c4db-499d-b67f-ad7f10dd7dde: receive 
RaftClientReply:client-6C83DC527A4C->d83929f1-c4db-499d-b67f-ad7f10dd7dde@group-75C642DF7AE9,
 cid=55, FAILED org.apache.ratis.protocol.NotLeaderException: Server 
d83929f1-c4db-499d-b67f-ad7f10dd7dde is not the leader (null). Request must be 
sent to leader., logIndex=0, commits[d83929f1-c4db-499d-b67f-ad7f10dd7dde:c-1]
2019-07-12 19:30:48,504 INFO  client.GrpcClientProtocolClient 
(GrpcClientProtocolClient.java:onNext(255)) - 

[jira] [Created] (HDDS-1806) TestDataValidateWithSafeByteOperations tests are failing

2019-07-15 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created HDDS-1806:
-

 Summary: TestDataValidateWithSafeByteOperations tests are failing
 Key: HDDS-1806
 URL: https://issues.apache.org/jira/browse/HDDS-1806
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Client
Affects Versions: 0.5.0
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.5.0


 
{code:java}
Unexpected Storage Container Exception: 
org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: 
ContainerID 3 does not exist

Stacktrace
java.io.IOException: Unexpected Storage Container Exception: 
org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 3 does not exist
at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.setIoException(BlockOutputStream.java:549)
at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.validateResponse(BlockOutputStream.java:540)
at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$writeChunkToContainer$2(BlockOutputStream.java:615)
at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 3 does not exist
at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:536)
at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.validateResponse(BlockOutputStream.java:537)
... 7 more
{code}
The error propagated to the client is erroneous. The container creation failed 
as a result of a disk-full condition, but this was never propagated to the 
client.

 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1804) TestCloseContainerHandlingByClient#testBlockWrites fails intermittently

2019-07-15 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created HDDS-1804:
-

 Summary: TestCloseContainerHandlingByClient#testBlockWrites fails 
intermittently
 Key: HDDS-1804
 URL: https://issues.apache.org/jira/browse/HDDS-1804
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Client
Affects Versions: 0.5.0
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.5.0


The test fails intermittently as reported here:

[https://builds.apache.org/job/hadoop-multibranch/job/PR-1082/1/testReport/org.apache.hadoop.ozone.client.rpc/TestCloseContainerHandlingByClient/testBlockWrites/]
{code:java}
java.lang.IllegalArgumentException
at 
com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
at 
org.apache.hadoop.hdds.scm.XceiverClientManager.acquireClient(XceiverClientManager.java:150)
at 
org.apache.hadoop.hdds.scm.XceiverClientManager.acquireClientForReadData(XceiverClientManager.java:143)
at 
org.apache.hadoop.hdds.scm.storage.BlockInputStream.getChunkInfos(BlockInputStream.java:154)
at 
org.apache.hadoop.hdds.scm.storage.BlockInputStream.initialize(BlockInputStream.java:118)
at 
org.apache.hadoop.hdds.scm.storage.BlockInputStream.read(BlockInputStream.java:222)
at 
org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:171)
at 
org.apache.hadoop.ozone.client.io.OzoneInputStream.read(OzoneInputStream.java:47)
at java.io.InputStream.read(InputStream.java:101)
at 
org.apache.hadoop.ozone.container.ContainerTestHelper.validateData(ContainerTestHelper.java:709)
at 
org.apache.hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient.validateData(TestCloseContainerHandlingByClient.java:401)
at 
org.apache.hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient.testBlockWrites(TestCloseContainerHandlingByClient.java:471)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
{code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1753) Datanode unable to find chunk while replication data using ratis.

2019-07-12 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16883727#comment-16883727
 ] 

Shashikant Banerjee commented on HDDS-1753:
---

The issue here is that, while data is still to be replicated from the leader to 
the followers, a key delete can cause a block in a closed container to be 
deleted on the leader. When a follower then asks the leader for the chunk data, 
the request fails because the chunk file no longer exists on the leader.

The solution being proposed here is as follows:

Whenever a delete command is received on a datanode from SCM, it should first 
check the minimum replicated index across all the servers in the pipeline. 
ContainerStateMachine will also track the close-container log index for each 
container. Now, if the minimum replicated index >= close-container index on the 
leader, a delete operation will be queued over Ratis on the leader, the same 
will be ignored on the followers, and the delete will then happen over Ratis. 
In case the close-container index is not yet replicated, the delete transaction 
will not be enqueued over Ratis and will simply be ignored; SCM already has a 
retry policy in place to retry the same delete.

In case the Ratis pipeline does not exist, delete will work as is.
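
A minimal sketch of the proposed gating check (the method and parameter names 
below are illustrative, not actual Ozone code):
{code:java}
/**
 * Sketch: decide whether a block-delete command from SCM may be queued over
 * Ratis now, or should be dropped and left to SCM's existing retry policy.
 *
 * @param minReplicatedIndex  lowest log index applied on every server in the pipeline
 * @param closeContainerIndex log index at which the container was closed on the leader
 */
static boolean mayQueueDeleteOverRatis(long minReplicatedIndex,
    long closeContainerIndex) {
  // Delete only once every follower has applied the close-container entry;
  // otherwise a follower could later ask the leader for a chunk that is gone.
  return minReplicatedIndex >= closeContainerIndex;
}
{code}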

> Datanode unable to find chunk while replication data using ratis.
> -
>
> Key: HDDS-1753
> URL: https://issues.apache.org/jira/browse/HDDS-1753
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: MiniOzoneChaosCluster
>
> Leader datanode is unable to read chunk from the datanode while replicating 
> data from leader to follower.
> Please note that deletion of keys is also happening while the data is being 
> replicated.
> {code}
> 2019-07-02 19:39:22,604 INFO  impl.RaftServerImpl 
> (RaftServerImpl.java:checkInconsistentAppendEntries(972)) - 
> 5ac88709-a3a2-4c8f-91de-5e54b617f05e: inconsistency entries. 
> Reply:76a3eb0f-d7cd-477b-8973-db1
> 014feb398<-5ac88709-a3a2-4c8f-91de-5e54b617f05e#70:FAIL,INCONSISTENCY,nextIndex:9771,term:2,followerCommit:9782
> 2019-07-02 19:39:22,605 ERROR impl.ChunkManagerImpl 
> (ChunkUtils.java:readData(161)) - Unable to find the chunk file. chunk info : 
> ChunkInfo{chunkName='76ec669ae2cb6e10dd9f08c0789c5fdf_stream_a2850dce-def3
> -4d64-93d8-fa2ebafee933_chunk_1, offset=0, len=2048}
> 2019-07-02 19:39:22,605 INFO  impl.RaftServerImpl 
> (RaftServerImpl.java:checkInconsistentAppendEntries(990)) - 
> 5ac88709-a3a2-4c8f-91de-5e54b617f05e: Failed appendEntries as latest snapshot 
> (9770) already h
> as the append entries (first index: 1)
> 2019-07-02 19:39:22,605 INFO  impl.RaftServerImpl 
> (RaftServerImpl.java:checkInconsistentAppendEntries(972)) - 
> 5ac88709-a3a2-4c8f-91de-5e54b617f05e: inconsistency entries. 
> Reply:76a3eb0f-d7cd-477b-8973-db1
> 014feb398<-5ac88709-a3a2-4c8f-91de-5e54b617f05e#71:FAIL,INCONSISTENCY,nextIndex:9771,term:2,followerCommit:9782
> 2019-07-02 19:39:22,605 INFO  keyvalue.KeyValueHandler 
> (ContainerUtils.java:logAndReturnError(146)) - Operation: ReadChunk : Trace 
> ID: 4216d461a4679e17:4216d461a4679e17:0:0 : Message: Unable to find the c
> hunk file. chunk info 
> ChunkInfo{chunkName='76ec669ae2cb6e10dd9f08c0789c5fdf_stream_a2850dce-def3-4d64-93d8-fa2ebafee933_chunk_1,
>  offset=0, len=2048} : Result: UNABLE_TO_FIND_CHUNK
> 2019-07-02 19:39:22,605 INFO  impl.RaftServerImpl 
> (RaftServerImpl.java:checkInconsistentAppendEntries(990)) - 
> 5ac88709-a3a2-4c8f-91de-5e54b617f05e: Failed appendEntries as latest snapshot 
> (9770) already h
> as the append entries (first index: 2)
> 2019-07-02 19:39:22,606 INFO  impl.RaftServerImpl 
> (RaftServerImpl.java:checkInconsistentAppendEntries(972)) - 
> 5ac88709-a3a2-4c8f-91de-5e54b617f05e: inconsistency entries. 
> Reply:76a3eb0f-d7cd-477b-8973-db1
> 014feb398<-5ac88709-a3a2-4c8f-91de-5e54b617f05e#72:FAIL,INCONSISTENCY,nextIndex:9771,term:2,followerCommit:9782
> 19:39:22.606 [pool-195-thread-19] ERROR DNAudit - user=null | ip=null | 
> op=READ_CHUNK {blockData=conID: 3 locID: 102372189549953034 bcsId: 0} | 
> ret=FAILURE
> java.lang.Exception: Unable to find the chunk file. chunk info 
> ChunkInfo{chunkName='76ec669ae2cb6e10dd9f08c0789c5fdf_stream_a2850dce-def3-4d64-93d8-fa2ebafee933_chunk_1,
>  offset=0, len=2048}
> at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:320)
>  ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
> at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:148)
>  ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
> at 
> 

[jira] [Updated] (HDDS-1492) Generated chunk size name too long.

2019-07-12 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1492:
--
Status: Patch Available  (was: Open)

> Generated chunk size name too long.
> ---
>
> Key: HDDS-1492
> URL: https://issues.apache.org/jira/browse/HDDS-1492
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Aravindan Vijayan
>Assignee: Shashikant Banerjee
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Following exception is seen in SCM logs intermittently. 
> {code}
> java.lang.RuntimeException: file name 
> 'chunks/2a54b2a153f4a9c5da5f44e2c6f97c60_stream_9c6ac565-e2d4-469c-bd5c-47922a35e798_chunk_10.tmp.2.23115'
>  is too long ( > 100 bytes)
> {code}
> We may have to limit the name of the chunk to 100 bytes.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14499) Misleading REM_QUOTA value with snapshot and trash feature enabled for a directory

2019-07-12 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-14499:
---
  Resolution: Fixed
   Fix Version/s: 3.3.0
Target Version/s: 3.3.0
  Status: Resolved  (was: Patch Available)

Thanks [~szetszwo] for the review. I have committed this change to trunk.

> Misleading REM_QUOTA value with snapshot and trash feature enabled for a 
> directory
> --
>
> Key: HDFS-14499
> URL: https://issues.apache.org/jira/browse/HDFS-14499
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14499.000.patch, HDFS-14499.001.patch, 
> HDFS-14499.002.patch
>
>
> This is the flow of steps where we see a discrepancy between REM_QUOTA and 
> new file operation failure. REM_QUOTA shows a value of  1 but file creation 
> operation does not succeed.
> {code:java}
> hdfs@c3265-node3 root$ hdfs dfs -mkdir /dir1
> hdfs@c3265-node3 root$ hdfs dfsadmin -setQuota 2 /dir1
> hdfs@c3265-node3 root$ hdfs dfsadmin -allowSnapshot /dir1
> Allowing snaphot on /dir1 succeeded
> hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1
> hdfs@c3265-node3 root$ hdfs dfs -createSnapshot /dir1 snap1
> Created snapshot /dir1/.snapshot/snap1
> hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1
> QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE 
> PATHNAME
> 2 0 none inf 1 1 0 /dir1
> hdfs@c3265-node3 root$ hdfs dfs -rm /dir1/file1
> 19/03/26 11:20:25 INFO fs.TrashPolicyDefault: Moved: 
> 'hdfs://smajetinn/dir1/file1' to trash at: 
> hdfs://smajetinn/user/hdfs/.Trash/Current/dir1/file11553599225772
> hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1
> QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE 
> PATHNAME
> 2 1 none inf 1 0 0 /dir1
> hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1
> touchz: The NameSpace quota (directories and files) of directory /dir1 is 
> exceeded: quota=2 file count=3{code}
> The issue here is that the count command takes only files and directories 
> into account, not the inode references. When trash is enabled, deleting a file 
> inside a directory actually performs a rename, as a result of which an inode 
> reference is maintained in the deleted list of the snapshot diff. That 
> reference is taken into account while computing the namespace quota, but the 
> count command (getContentSummary()) considers only the files and directories, 
> not the referenced entity, when calculating REM_QUOTA. The referenced entity 
> is taken into account for the space quota only.
> InodeReference.java:
> ---
> {code:java}
>  @Override
> public final ContentSummaryComputationContext computeContentSummary(
> int snapshotId, ContentSummaryComputationContext summary) {
>   final int s = snapshotId < lastSnapshotId ? snapshotId : lastSnapshotId;
>   // only count storagespace for WithName
>   final QuotaCounts q = computeQuotaUsage(
>   summary.getBlockStoragePolicySuite(), getStoragePolicyID(), false, 
> s);
>   summary.getCounts().addContent(Content.DISKSPACE, q.getStorageSpace());
>   summary.getCounts().addTypeSpaces(q.getTypeSpaces());
>   return summary;
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1789) BlockOutputStream#watchForCommit fails with UnsupportedOperationException

2019-07-11 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created HDDS-1789:
-

 Summary: BlockOutputStream#watchForCommit fails with 
UnsupportedOperationException 
 Key: HDDS-1789
 URL: https://issues.apache.org/jira/browse/HDDS-1789
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Client
Affects Versions: 0.5.0
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.5.0


{code:java}
2019-07-12 08:45:17,981 ERROR ozone.MiniOzoneLoadGenerator 
(MiniOzoneLoadGenerator.java:load(105)) - LOADGEN: Create 
key:pool-444-thread-5-1328179725 failed with exception, skipping
java.lang.UnsupportedOperationException
at java.util.AbstractList.add(AbstractList.java:148)
at java.util.AbstractList.add(AbstractList.java:108)
at java.util.AbstractCollection.addAll(AbstractCollection.java:344)
at 
org.apache.hadoop.hdds.scm.storage.BlockOutputStream.watchForCommit(BlockOutputStream.java:363)
at 
org.apache.hadoop.hdds.scm.storage.BlockOutputStream.handleFullBuffer(BlockOutputStream.java:332)
at 
org.apache.hadoop.hdds.scm.storage.BlockOutputStream.write(BlockOutputStream.java:259)
at 
org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.write(BlockOutputStreamEntry.java:129)
at 
org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:211)
at 
org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:193)
at 
org.apache.hadoop.ozone.client.io.OzoneOutputStream.write(OzoneOutputStream.java:49)
at java.io.OutputStream.write(OutputStream.java:75)
at 
org.apache.hadoop.ozone.MiniOzoneLoadGenerator.load(MiniOzoneLoadGenerator.java:103)
at 
org.apache.hadoop.ozone.MiniOzoneLoadGenerator.lambda$startIO$0(MiniOzoneLoadGenerator.java:152)
at 
java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
this is one more issue from Chaos
please raise a bug

Shashikant Banerjee [9:56 AM]
okk
actually jstacks are taken at 15 min interval
i am yet to find any common hanging thread among all the 3 jstacks

Mukul Kumar Singh [10:00 AM]
in the 2nd file:
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x7fb5b29ed228> (a 
java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
at 
java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.readLock(FSNamesystem.java:1595)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.handleHeartbeat(FSNamesystem.java:4894)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.sendHeartbeat(NameNodeRpcServer.java:1438)
at 
org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.sendHeartbeat(DatanodeProtocolServerSideTranslatorPB.java:118)
at 
org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:31228)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)
{code}
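
The UnsupportedOperationException above originates in AbstractList.add, which is 
what addAll hits when the target list is fixed-size or unmodifiable. A minimal 
stand-alone illustration of that failure mode (not the actual BlockOutputStream 
code) looks like this:
{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FixedSizeListDemo {
  public static void main(String[] args) {
    List<Long> fixed = Arrays.asList(1L, 2L, 3L);

    // Throws java.lang.UnsupportedOperationException: Arrays.asList returns a
    // fixed-size list whose add() is inherited from AbstractList.
    // fixed.addAll(Arrays.asList(4L, 5L));

    // Copying into a mutable list first makes addAll safe.
    List<Long> mutable = new ArrayList<>(fixed);
    mutable.addAll(Arrays.asList(4L, 5L));
    System.out.println(mutable); // [1, 2, 3, 4, 5]
  }
}
{code}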



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14499) Misleading REM_QUOTA value with snapshot and trash feature enabled for a directory

2019-07-11 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-14499:
---
Attachment: HDFS-14499.002.patch

> Misleading REM_QUOTA value with snapshot and trash feature enabled for a 
> directory
> --
>
> Key: HDFS-14499
> URL: https://issues.apache.org/jira/browse/HDFS-14499
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-14499.000.patch, HDFS-14499.001.patch, 
> HDFS-14499.002.patch
>
>
> This is the flow of steps where we see a discrepancy between REM_QUOTA and 
> new file operation failure. REM_QUOTA shows a value of  1 but file creation 
> operation does not succeed.
> {code:java}
> hdfs@c3265-node3 root$ hdfs dfs -mkdir /dir1
> hdfs@c3265-node3 root$ hdfs dfsadmin -setQuota 2 /dir1
> hdfs@c3265-node3 root$ hdfs dfsadmin -allowSnapshot /dir1
> Allowing snaphot on /dir1 succeeded
> hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1
> hdfs@c3265-node3 root$ hdfs dfs -createSnapshot /dir1 snap1
> Created snapshot /dir1/.snapshot/snap1
> hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1
> QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE 
> PATHNAME
> 2 0 none inf 1 1 0 /dir1
> hdfs@c3265-node3 root$ hdfs dfs -rm /dir1/file1
> 19/03/26 11:20:25 INFO fs.TrashPolicyDefault: Moved: 
> 'hdfs://smajetinn/dir1/file1' to trash at: 
> hdfs://smajetinn/user/hdfs/.Trash/Current/dir1/file11553599225772
> hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1
> QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE 
> PATHNAME
> 2 1 none inf 1 0 0 /dir1
> hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1
> touchz: The NameSpace quota (directories and files) of directory /dir1 is 
> exceeded: quota=2 file count=3{code}
> The issue here is that the count command takes only files and directories 
> into account, not the inode references. When trash is enabled, deleting a file 
> inside a directory actually performs a rename, as a result of which an inode 
> reference is maintained in the deleted list of the snapshot diff. That 
> reference is taken into account while computing the namespace quota, but the 
> count command (getContentSummary()) considers only the files and directories, 
> not the referenced entity, when calculating REM_QUOTA. The referenced entity 
> is taken into account for the space quota only.
> InodeReference.java:
> ---
> {code:java}
>  @Override
> public final ContentSummaryComputationContext computeContentSummary(
> int snapshotId, ContentSummaryComputationContext summary) {
>   final int s = snapshotId < lastSnapshotId ? snapshotId : lastSnapshotId;
>   // only count storagespace for WithName
>   final QuotaCounts q = computeQuotaUsage(
>   summary.getBlockStoragePolicySuite(), getStoragePolicyID(), false, 
> s);
>   summary.getCounts().addContent(Content.DISKSPACE, q.getStorageSpace());
>   summary.getCounts().addTypeSpaces(q.getTypeSpaces());
>   return summary;
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14499) Misleading REM_QUOTA value with snapshot and trash feature enabled for a directory

2019-07-11 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16882717#comment-16882717
 ] 

Shashikant Banerjee commented on HDFS-14499:


Thanks [~szetszwo]. Patch v2 addresses the review comments.

> Misleading REM_QUOTA value with snapshot and trash feature enabled for a 
> directory
> --
>
> Key: HDFS-14499
> URL: https://issues.apache.org/jira/browse/HDFS-14499
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-14499.000.patch, HDFS-14499.001.patch, 
> HDFS-14499.002.patch
>
>
> This is the flow of steps where we see a discrepancy between REM_QUOTA and 
> new file operation failure. REM_QUOTA shows a value of  1 but file creation 
> operation does not succeed.
> {code:java}
> hdfs@c3265-node3 root$ hdfs dfs -mkdir /dir1
> hdfs@c3265-node3 root$ hdfs dfsadmin -setQuota 2 /dir1
> hdfs@c3265-node3 root$ hdfs dfsadmin -allowSnapshot /dir1
> Allowing snaphot on /dir1 succeeded
> hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1
> hdfs@c3265-node3 root$ hdfs dfs -createSnapshot /dir1 snap1
> Created snapshot /dir1/.snapshot/snap1
> hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1
> QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE 
> PATHNAME
> 2 0 none inf 1 1 0 /dir1
> hdfs@c3265-node3 root$ hdfs dfs -rm /dir1/file1
> 19/03/26 11:20:25 INFO fs.TrashPolicyDefault: Moved: 
> 'hdfs://smajetinn/dir1/file1' to trash at: 
> hdfs://smajetinn/user/hdfs/.Trash/Current/dir1/file11553599225772
> hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1
> QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE 
> PATHNAME
> 2 1 none inf 1 0 0 /dir1
> hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1
> touchz: The NameSpace quota (directories and files) of directory /dir1 is 
> exceeded: quota=2 file count=3{code}
> The issue here is that the count command takes only files and directories 
> into account, not the inode references. When trash is enabled, deleting a file 
> inside a directory actually performs a rename, as a result of which an inode 
> reference is maintained in the deleted list of the snapshot diff. That 
> reference is taken into account while computing the namespace quota, but the 
> count command (getContentSummary()) considers only the files and directories, 
> not the referenced entity, when calculating REM_QUOTA. The referenced entity 
> is taken into account for the space quota only.
> InodeReference.java:
> ---
> {code:java}
>  @Override
> public final ContentSummaryComputationContext computeContentSummary(
> int snapshotId, ContentSummaryComputationContext summary) {
>   final int s = snapshotId < lastSnapshotId ? snapshotId : lastSnapshotId;
>   // only count storagespace for WithName
>   final QuotaCounts q = computeQuotaUsage(
>   summary.getBlockStoragePolicySuite(), getStoragePolicyID(), false, 
> s);
>   summary.getCounts().addContent(Content.DISKSPACE, q.getStorageSpace());
>   summary.getCounts().addTypeSpaces(q.getTypeSpaces());
>   return summary;
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1780) TestFailureHandlingByClient tests are flaky

2019-07-10 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1780:
--
Status: Patch Available  (was: Open)

> TestFailureHandlingByClient tests are flaky
> ---
>
> Key: HDDS-1780
> URL: https://issues.apache.org/jira/browse/HDDS-1780
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The tests seem to fail because, when the datanode goes down with the stale 
> node interval set to a low value, containers may get closed early and client 
> writes might fail with a closed-container exception rather than the pipeline 
> failure/timeout exceptions expected in the tests. The fix made here is to tune 
> the stale node interval.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14499) Misleading REM_QUOTA value with snapshot and trash feature enabled for a directory

2019-07-10 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16882210#comment-16882210
 ] 

Shashikant Banerjee commented on HDFS-14499:


[~szetszwo], can you please have a look?

> Misleading REM_QUOTA value with snapshot and trash feature enabled for a 
> directory
> --
>
> Key: HDFS-14499
> URL: https://issues.apache.org/jira/browse/HDFS-14499
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-14499.000.patch, HDFS-14499.001.patch
>
>
> This is the flow of steps where we see a discrepancy between REM_QUOTA and 
> new file operation failure. REM_QUOTA shows a value of  1 but file creation 
> operation does not succeed.
> {code:java}
> hdfs@c3265-node3 root$ hdfs dfs -mkdir /dir1
> hdfs@c3265-node3 root$ hdfs dfsadmin -setQuota 2 /dir1
> hdfs@c3265-node3 root$ hdfs dfsadmin -allowSnapshot /dir1
> Allowing snaphot on /dir1 succeeded
> hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1
> hdfs@c3265-node3 root$ hdfs dfs -createSnapshot /dir1 snap1
> Created snapshot /dir1/.snapshot/snap1
> hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1
> QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE 
> PATHNAME
> 2 0 none inf 1 1 0 /dir1
> hdfs@c3265-node3 root$ hdfs dfs -rm /dir1/file1
> 19/03/26 11:20:25 INFO fs.TrashPolicyDefault: Moved: 
> 'hdfs://smajetinn/dir1/file1' to trash at: 
> hdfs://smajetinn/user/hdfs/.Trash/Current/dir1/file11553599225772
> hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1
> QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE 
> PATHNAME
> 2 1 none inf 1 0 0 /dir1
> hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1
> touchz: The NameSpace quota (directories and files) of directory /dir1 is 
> exceeded: quota=2 file count=3{code}
> The issue here is that the count command takes only files and directories 
> into account, not the inode references. When trash is enabled, deleting a file 
> inside a directory actually performs a rename, as a result of which an inode 
> reference is maintained in the deleted list of the snapshot diff. That 
> reference is taken into account while computing the namespace quota, but the 
> count command (getContentSummary()) considers only the files and directories, 
> not the referenced entity, when calculating REM_QUOTA. The referenced entity 
> is taken into account for the space quota only.
> InodeReference.java:
> ---
> {code:java}
>  @Override
> public final ContentSummaryComputationContext computeContentSummary(
> int snapshotId, ContentSummaryComputationContext summary) {
>   final int s = snapshotId < lastSnapshotId ? snapshotId : lastSnapshotId;
>   // only count storagespace for WithName
>   final QuotaCounts q = computeQuotaUsage(
>   summary.getBlockStoragePolicySuite(), getStoragePolicyID(), false, 
> s);
>   summary.getCounts().addContent(Content.DISKSPACE, q.getStorageSpace());
>   summary.getCounts().addTypeSpaces(q.getTypeSpaces());
>   return summary;
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1780) TestFailureHandlingByClient tests are flaky

2019-07-10 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created HDDS-1780:
-

 Summary: TestFailureHandlingByClient tests are flaky
 Key: HDDS-1780
 URL: https://issues.apache.org/jira/browse/HDDS-1780
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Client
Affects Versions: 0.5.0
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.5.0


The tests seem to fail because, when the datanode goes down with the stale node 
interval set to a low value, containers may get closed early and client writes 
might fail with a closed-container exception rather than the pipeline 
failure/timeout exceptions expected in the tests. The fix made here is to tune 
the stale node interval.
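
A minimal sketch of that tuning in a test setup (the configuration key and the 
chosen value are assumptions for illustration, not the exact change made here):
{code:java}
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.hdds.conf.OzoneConfiguration;

// Sketch: raise the stale-node interval so SCM does not mark a freshly killed
// datanode stale (and close its containers) before the client observes the
// intended pipeline failure.
OzoneConfiguration conf = new OzoneConfiguration();
conf.setTimeDuration("ozone.scm.stalenode.interval", 90, TimeUnit.SECONDS);
{code}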



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1779) TestWatchForCommit tests are flaky

2019-07-10 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1779:
--
Description: The tests have become flaky because once nodes are shut down in a 
Ratis pipeline, a watch request can either be received at a server and fail with 
NotReplicatedException, or it sometimes fails with StatusRuntimeExceptions from 
grpc; both need to be accounted for in the tests. Other than that, HDDS-1384 
also causes a bind exception to be thrown intermittently, which in turn shuts 
down the MiniOzoneCluster. To overcome this, the test class has been refactored 
as well.  (was: The tests have become flaky bcoz once  nodes are shutdown inn 
Ratis pipeline, a watch request can either be received at server at a server 
and fail with NotReplicatedException or soemtimes it fails with 
StatusRuntimeExceptions from grpc which both need to be accounted for in the 
tests. Other than that, HDDS-1384 also causes bind exception to e thrown 
intermittently which in turn shuts down the miniOzoneCluster. To overcome this, 
the test class has been refactored as well.)
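
A minimal sketch of how a test can accept either failure mode (a helper with 
illustrative naming; exception types are matched by simple class name so the 
snippet does not depend on a particular Ratis package layout):
{code:java}
// Sketch: treat either a Ratis NotReplicatedException or a gRPC
// StatusRuntimeException anywhere in the cause chain as an expected outcome
// once pipeline nodes have been shut down.
static void assertWatchFailureIsExpected(Throwable failure) {
  for (Throwable t = failure; t != null; t = t.getCause()) {
    String name = t.getClass().getSimpleName();
    if (name.equals("NotReplicatedException")
        || name.equals("StatusRuntimeException")) {
      return;
    }
  }
  throw new AssertionError("unexpected watchForCommit failure", failure);
}
{code}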

> TestWatchForCommit tests are flaky
> --
>
> Key: HDDS-1779
> URL: https://issues.apache.org/jira/browse/HDDS-1779
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The tests have become flaky because once nodes are shut down in a Ratis 
> pipeline, a watch request can either be received at a server and fail with 
> NotReplicatedException, or it sometimes fails with StatusRuntimeExceptions 
> from grpc; both need to be accounted for in the tests. Other than that, 
> HDDS-1384 also causes a bind exception to be thrown intermittently, which in 
> turn shuts down the MiniOzoneCluster. To overcome this, the test class has 
> been refactored as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1779) TestWatchForCommit tests are flaky

2019-07-10 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1779:
--
Description: The tests have become flaky because once nodes are shut down in a 
Ratis pipeline, a watch request can either be received at a server and fail with 
NotReplicatedException, or it sometimes fails with StatusRuntimeExceptions from 
grpc; both need to be accounted for in the tests. Other than that, HDDS-1384 
also causes a bind exception to be thrown intermittently, which in turn shuts 
down the MiniOzoneCluster. To overcome this, the test class has been refactored 
as well.

> TestWatchForCommit tests are flaky
> --
>
> Key: HDDS-1779
> URL: https://issues.apache.org/jira/browse/HDDS-1779
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The tests have become flaky because once nodes are shut down in a Ratis 
> pipeline, a watch request can either be received at a server and fail with 
> NotReplicatedException, or it sometimes fails with StatusRuntimeExceptions 
> from grpc; both need to be accounted for in the tests. Other than that, 
> HDDS-1384 also causes a bind exception to be thrown intermittently, which in 
> turn shuts down the MiniOzoneCluster. To overcome this, the test class has 
> been refactored as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1779) TestWatchForCommit tests are flaky

2019-07-10 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1779:
--
Status: Patch Available  (was: Open)

> TestWatchForCommit tests are flaky
> --
>
> Key: HDDS-1779
> URL: https://issues.apache.org/jira/browse/HDDS-1779
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1779) TestWatchForCommit tests are flaky

2019-07-10 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1779:
--
Target Version/s: 0.5.0  (was: 0.4.1)

> TestWatchForCommit tests are flaky
> --
>
> Key: HDDS-1779
> URL: https://issues.apache.org/jira/browse/HDDS-1779
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-1753) Datanode unable to find chunk while replication data using ratis.

2019-07-03 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned HDDS-1753:
-

Assignee: Shashikant Banerjee

> Datanode unable to find chunk while replication data using ratis.
> -
>
> Key: HDDS-1753
> URL: https://issues.apache.org/jira/browse/HDDS-1753
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: MiniOzoneChaosCluster
>
> Leader datanode is unable to read chunk from the datanode while replicating 
> data from leader to follower.
> Please note that deletion of keys is also happening while the data is being 
> replicated.
> {code}
> 2019-07-02 19:39:22,604 INFO  impl.RaftServerImpl 
> (RaftServerImpl.java:checkInconsistentAppendEntries(972)) - 
> 5ac88709-a3a2-4c8f-91de-5e54b617f05e: inconsistency entries. 
> Reply:76a3eb0f-d7cd-477b-8973-db1
> 014feb398<-5ac88709-a3a2-4c8f-91de-5e54b617f05e#70:FAIL,INCONSISTENCY,nextIndex:9771,term:2,followerCommit:9782
> 2019-07-02 19:39:22,605 ERROR impl.ChunkManagerImpl 
> (ChunkUtils.java:readData(161)) - Unable to find the chunk file. chunk info : 
> ChunkInfo{chunkName='76ec669ae2cb6e10dd9f08c0789c5fdf_stream_a2850dce-def3
> -4d64-93d8-fa2ebafee933_chunk_1, offset=0, len=2048}
> 2019-07-02 19:39:22,605 INFO  impl.RaftServerImpl 
> (RaftServerImpl.java:checkInconsistentAppendEntries(990)) - 
> 5ac88709-a3a2-4c8f-91de-5e54b617f05e: Failed appendEntries as latest snapshot 
> (9770) already h
> as the append entries (first index: 1)
> 2019-07-02 19:39:22,605 INFO  impl.RaftServerImpl 
> (RaftServerImpl.java:checkInconsistentAppendEntries(972)) - 
> 5ac88709-a3a2-4c8f-91de-5e54b617f05e: inconsistency entries. 
> Reply:76a3eb0f-d7cd-477b-8973-db1
> 014feb398<-5ac88709-a3a2-4c8f-91de-5e54b617f05e#71:FAIL,INCONSISTENCY,nextIndex:9771,term:2,followerCommit:9782
> 2019-07-02 19:39:22,605 INFO  keyvalue.KeyValueHandler 
> (ContainerUtils.java:logAndReturnError(146)) - Operation: ReadChunk : Trace 
> ID: 4216d461a4679e17:4216d461a4679e17:0:0 : Message: Unable to find the c
> hunk file. chunk info 
> ChunkInfo{chunkName='76ec669ae2cb6e10dd9f08c0789c5fdf_stream_a2850dce-def3-4d64-93d8-fa2ebafee933_chunk_1,
>  offset=0, len=2048} : Result: UNABLE_TO_FIND_CHUNK
> 2019-07-02 19:39:22,605 INFO  impl.RaftServerImpl 
> (RaftServerImpl.java:checkInconsistentAppendEntries(990)) - 
> 5ac88709-a3a2-4c8f-91de-5e54b617f05e: Failed appendEntries as latest snapshot 
> (9770) already h
> as the append entries (first index: 2)
> 2019-07-02 19:39:22,606 INFO  impl.RaftServerImpl 
> (RaftServerImpl.java:checkInconsistentAppendEntries(972)) - 
> 5ac88709-a3a2-4c8f-91de-5e54b617f05e: inconsistency entries. 
> Reply:76a3eb0f-d7cd-477b-8973-db1
> 014feb398<-5ac88709-a3a2-4c8f-91de-5e54b617f05e#72:FAIL,INCONSISTENCY,nextIndex:9771,term:2,followerCommit:9782
> 19:39:22.606 [pool-195-thread-19] ERROR DNAudit - user=null | ip=null | 
> op=READ_CHUNK {blockData=conID: 3 locID: 102372189549953034 bcsId: 0} | 
> ret=FAILURE
> java.lang.Exception: Unable to find the chunk file. chunk info 
> ChunkInfo{chunkName='76ec669ae2cb6e10dd9f08c0789c5fdf_stream_a2850dce-def3-4d64-93d8-fa2ebafee933_chunk_1,
>  offset=0, len=2048}
> at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:320)
>  ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
> at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:148)
>  ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
> at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:346)
>  ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
> at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.readStateMachineData(ContainerStateMachine.java:476)
>  ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
> at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$getCachedStateMachineData$2(ContainerStateMachine.java:495)
>  ~[hadoop-hdds-container-service-0.5.0-SN
> APSHOT.jar:?]
> at 
> com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4767)
>  ~[guava-11.0.2.jar:?]
> at 
> com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
>  ~[guava-11.0.2.jar:?]
> at 
> com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350) 
> ~[guava-11.0.2.jar:?]
> at 
> 

[jira] [Updated] (HDDS-1654) Ensure container state on datanode gets synced to disk whenever state change happens

2019-06-07 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1654:
--
Status: Patch Available  (was: Open)

> Ensure container state on datanode gets synced to disk whenever state change 
> happens
> 
>
> Key: HDDS-1654
> URL: https://issues.apache.org/jira/browse/HDDS-1654
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, whenever there is a container state change, it updates the 
> container but doesn't sync.
> The idea here is to force-sync the state to disk every time there is a 
> state change.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-1621) writeData in ChunkUtils should not use AsynchronousFileChannel

2019-06-06 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-1621.
---
   Resolution: Fixed
Fix Version/s: 0.4.1

Thanks [~sdeka] for working on this. I have committed this change to trunk.

> writeData in ChunkUtils should not use AsynchronousFileChannel
> --
>
> Key: HDDS-1621
> URL: https://issues.apache.org/jira/browse/HDDS-1621
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Shashikant Banerjee
>Assignee: Supratim Deka
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.4.1
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently, chunk writes are not synced to disk by default. When 
> flushStateMachineData gets invoked from Ratis, it should also ensure that all 
> pending chunk writes are flushed to disk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1654) Ensure container state on datanode gets synced to disk whenever state change happens

2019-06-05 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1654:
--
Priority: Blocker  (was: Major)

> Ensure container state on datanode gets synced to disk whenever state change 
> happens
> 
>
> Key: HDDS-1654
> URL: https://issues.apache.org/jira/browse/HDDS-1654
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Blocker
> Fix For: 0.5.0
>
>
> Currently, whenever there is a container state change, it updates the 
> container but doesn't sync.
> The idea here is to force-sync the state to disk every time there is a 
> state change.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1654) Ensure container state on datanode gets synced to disk whenever state change happens

2019-06-05 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1654:
--
Affects Version/s: 0.5.0

> Ensure container state on datanode gets synced to disk whenever state change 
> happens
> 
>
> Key: HDDS-1654
> URL: https://issues.apache.org/jira/browse/HDDS-1654
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Blocker
> Fix For: 0.5.0
>
>
> Currently, whenever there is a container state change, it updates the 
> container but doesn't sync.
> The idea here is to force-sync the state to disk every time there is a 
> state change.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1654) Ensure container state on datanode gets synced to disk whenever state change happens

2019-06-05 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created HDDS-1654:
-

 Summary: Ensure container state on datanode gets synced to disk 
whenever state change happens
 Key: HDDS-1654
 URL: https://issues.apache.org/jira/browse/HDDS-1654
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.5.0


Currently, whenever there is a container state change, it updates the container 
but doesn't sync.

The idea here is to force-sync the state to disk every time there is a 
state change.
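
A minimal sketch of the intended behaviour, assuming the per-container state is 
kept in a small metadata file; the file handling below is illustrative, not the 
actual datanode code.
{code:java}
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public final class ContainerStateSync {
  // Rewrite the container metadata and force it to disk so the new state
  // survives a datanode crash or restart.
  static void writeAndSync(File containerFile, String serializedState)
      throws Exception {
    try (RandomAccessFile raf = new RandomAccessFile(containerFile, "rw")) {
      raf.setLength(0);
      raf.write(serializedState.getBytes(StandardCharsets.UTF_8));
      raf.getChannel().force(true); // flush both data and file metadata
    }
  }
}
{code}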



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1621) flushStateMachineData should ensure the write chunks are flushed to disk

2019-05-31 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created HDDS-1621:
-

 Summary: flushStateMachineData should ensure the write chunks are 
flushed to disk
 Key: HDDS-1621
 URL: https://issues.apache.org/jira/browse/HDDS-1621
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Shashikant Banerjee
Assignee: Supratim Deka


Currently, chunk writes are not synced to disk by default. When 
flushStateMachineData gets invoked from Ratis, it should also ensure that all 
pending chunk writes are flushed to disk.
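
A hedged sketch of the idea; the future map and its key are assumptions about 
how pending chunk writes might be tracked, not the actual ContainerStateMachine 
fields.
{code:java}
// Sketch only: flushStateMachineData(index) should not complete until every
// chunk write at or below that log index has been written out.
private final ConcurrentHashMap<Long, CompletableFuture<?>> writeChunkFutureMap =
    new ConcurrentHashMap<>();

public CompletableFuture<Void> flushStateMachineData(long index) {
  CompletableFuture<?>[] pending = writeChunkFutureMap.entrySet().stream()
      .filter(entry -> entry.getKey() <= index)
      .map(Map.Entry::getValue)
      .toArray(CompletableFuture[]::new);
  return CompletableFuture.allOf(pending);
}
{code}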



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14499) Misleading REM_QUOTA value with snasphot and trash feature enabled for a directory

2019-05-31 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852818#comment-16852818
 ] 

Shashikant Banerjee commented on HDFS-14499:


Thanks [~szetszwo]. Patch v1 addresses your review comments.

> Misleading REM_QUOTA value with snasphot and trash feature enabled for a 
> directory
> --
>
> Key: HDFS-14499
> URL: https://issues.apache.org/jira/browse/HDFS-14499
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-14499.000.patch, HDFS-14499.001.patch
>
>
> This is the flow of steps where we see a discrepancy between REM_QUOTA and 
> new file operation failure. REM_QUOTA shows a value of  1 but file creation 
> operation does not succeed.
> {code:java}
> hdfs@c3265-node3 root$ hdfs dfs -mkdir /dir1
> hdfs@c3265-node3 root$ hdfs dfsadmin -setQuota 2 /dir1
> hdfs@c3265-node3 root$ hdfs dfsadmin -allowSnapshot /dir1
> Allowing snaphot on /dir1 succeeded
> hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1
> hdfs@c3265-node3 root$ hdfs dfs -createSnapshot /dir1 snap1
> Created snapshot /dir1/.snapshot/snap1
> hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1
> QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE 
> PATHNAME
> 2 0 none inf 1 1 0 /dir1
> hdfs@c3265-node3 root$ hdfs dfs -rm /dir1/file1
> 19/03/26 11:20:25 INFO fs.TrashPolicyDefault: Moved: 
> 'hdfs://smajetinn/dir1/file1' to trash at: 
> hdfs://smajetinn/user/hdfs/.Trash/Current/dir1/file11553599225772
> hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1
> QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE 
> PATHNAME
> 2 1 none inf 1 0 0 /dir1
> hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1
> touchz: The NameSpace quota (directories and files) of directory /dir1 is 
> exceeded: quota=2 file count=3{code}
> The issue here is that the count command takes only files and directories 
> into account, not the inode references. When trash is enabled, deleting a file 
> inside a directory actually performs a rename, as a result of which an inode 
> reference is maintained in the deleted list of the snapshot diff. That 
> reference is taken into account while computing the namespace quota, but the 
> count command (getContentSummary()) considers only the files and directories, 
> not the referenced entity, when calculating REM_QUOTA. The referenced entity 
> is taken into account for the space quota only.
> InodeReference.java:
> ---
> {code:java}
>  @Override
> public final ContentSummaryComputationContext computeContentSummary(
> int snapshotId, ContentSummaryComputationContext summary) {
>   final int s = snapshotId < lastSnapshotId ? snapshotId : lastSnapshotId;
>   // only count storagespace for WithName
>   final QuotaCounts q = computeQuotaUsage(
>   summary.getBlockStoragePolicySuite(), getStoragePolicyID(), false, 
> s);
>   summary.getCounts().addContent(Content.DISKSPACE, q.getStorageSpace());
>   summary.getCounts().addTypeSpaces(q.getTypeSpaces());
>   return summary;
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14499) Misleading REM_QUOTA value with snasphot and trash feature enabled for a directory

2019-05-31 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-14499:
---
Attachment: HDFS-14499.001.patch

> Misleading REM_QUOTA value with snasphot and trash feature enabled for a 
> directory
> --
>
> Key: HDFS-14499
> URL: https://issues.apache.org/jira/browse/HDFS-14499
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-14499.000.patch, HDFS-14499.001.patch
>
>
> This is the flow of steps where we see a discrepancy between REM_QUOTA and 
> new file operation failure. REM_QUOTA shows a value of  1 but file creation 
> operation does not succeed.
> {code:java}
> hdfs@c3265-node3 root$ hdfs dfs -mkdir /dir1
> hdfs@c3265-node3 root$ hdfs dfsadmin -setQuota 2 /dir1
> hdfs@c3265-node3 root$ hdfs dfsadmin -allowSnapshot /dir1
> Allowing snaphot on /dir1 succeeded
> hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1
> hdfs@c3265-node3 root$ hdfs dfs -createSnapshot /dir1 snap1
> Created snapshot /dir1/.snapshot/snap1
> hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1
> QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE 
> PATHNAME
> 2 0 none inf 1 1 0 /dir1
> hdfs@c3265-node3 root$ hdfs dfs -rm /dir1/file1
> 19/03/26 11:20:25 INFO fs.TrashPolicyDefault: Moved: 
> 'hdfs://smajetinn/dir1/file1' to trash at: 
> hdfs://smajetinn/user/hdfs/.Trash/Current/dir1/file11553599225772
> hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1
> QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE 
> PATHNAME
> 2 1 none inf 1 0 0 /dir1
> hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1
> touchz: The NameSpace quota (directories and files) of directory /dir1 is 
> exceeded: quota=2 file count=3{code}
> The issue here is that the count command takes only files and directories 
> into account, not the inode references. When trash is enabled, deleting a file 
> inside a directory actually performs a rename, as a result of which an inode 
> reference is maintained in the deleted list of the snapshot diff. That 
> reference is taken into account while computing the namespace quota, but the 
> count command (getContentSummary()) considers only the files and directories, 
> not the referenced entity, when calculating REM_QUOTA. The referenced entity 
> is taken into account for the space quota only.
> InodeReference.java:
> ---
> {code:java}
>  @Override
> public final ContentSummaryComputationContext computeContentSummary(
> int snapshotId, ContentSummaryComputationContext summary) {
>   final int s = snapshotId < lastSnapshotId ? snapshotId : lastSnapshotId;
>   // only count storagespace for WithName
>   final QuotaCounts q = computeQuotaUsage(
>   summary.getBlockStoragePolicySuite(), getStoragePolicyID(), false, 
> s);
>   summary.getCounts().addContent(Content.DISKSPACE, q.getStorageSpace());
>   summary.getCounts().addTypeSpaces(q.getTypeSpaces());
>   return summary;
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1502) Add metrics for Ozone Ratis performance

2019-05-30 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1502:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Add metrics for Ozone Ratis performance
> ---
>
> Key: HDDS-1502
> URL: https://issues.apache.org/jira/browse/HDDS-1502
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> This jira will add some metrics for Ratis pipeline performance:
> 1) number of bytes written
> 2) number of readStateMachine calls
> 3) number of readStateMachine failures



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-1614) Container Missing in the datanode after restart

2019-05-30 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned HDDS-1614:
-

Assignee: Shashikant Banerjee

> Container Missing in the datanode after restart
> ---
>
> Key: HDDS-1614
> URL: https://issues.apache.org/jira/browse/HDDS-1614
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: MiniOzoneChaosCluster
>
> Container missing on the datanode after a restart.
> {code}
> 08:10:44.308 [pool-2131-thread-1] ERROR DNAudit - user=null | ip=null | 
> op=WRITE_CHUNK {blockData=conID: 34 locID: 102182684750055212 bcsId: 6198} | 
> ret=FAILURE
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  ContainerID 34 has been lost and and cannot be recreated on this DataNode
> at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:207)
>  ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
> at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:149)
>  ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
> at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:347)
>  ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
> at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.runCommand(ContainerStateMachine.java:354)
>  ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
> at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$handleWriteChunk$0(ContainerStateMachine.java:385)
>  ~[hadoop-hdds-container-service-0.5.0-SNAPSHOT.jar:?]
> at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>  [?:1.8.0_171]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [?:1.8.0_171]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [?:1.8.0_171]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-1613) Read key fails with "Unable to find the block"

2019-05-30 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned HDDS-1613:
-

Assignee: Shashikant Banerjee

> Read key fails with "Unable to find the block"
> --
>
> Key: HDDS-1613
> URL: https://issues.apache.org/jira/browse/HDDS-1613
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: MiniOzoneChaosCluster
>
> Block read fails with 
> {code}
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  Unable to find the block with bcsID 11777 .Container 68 bcsId is 0.
> at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:573)
> at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.getBlock(ContainerProtocolCalls.java:120)
> at 
> org.apache.hadoop.ozone.client.io.KeyInputStream$ChunkInputStreamEntry.initializeBlockInputStream(KeyInputStream.java:295)
> at 
> org.apache.hadoop.ozone.client.io.KeyInputStream$ChunkInputStreamEntry.getStream(KeyInputStream.java:265)
> at 
> org.apache.hadoop.ozone.client.io.KeyInputStream$ChunkInputStreamEntry.access$000(KeyInputStream.java:229)
> at 
> org.apache.hadoop.ozone.client.io.KeyInputStream.getStreamEntry(KeyInputStream.java:107)
> at 
> org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:140)
> at 
> org.apache.hadoop.ozone.client.io.OzoneInputStream.read(OzoneInputStream.java:47)
> at java.io.InputStream.read(InputStream.java:101)
> at 
> org.apache.hadoop.ozone.MiniOzoneLoadGenerator.load(MiniOzoneLoadGenerator.java:114)
> at 
> org.apache.hadoop.ozone.MiniOzoneLoadGenerator.lambda$startIO$0(MiniOzoneLoadGenerator.java:147)
> at 
> java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> Looking at the 3 datanodes, the containers are in bcs id of 11748, 11748 and 
> 0.
> {code}
> 2019-05-30 08:28:05,348 INFO  keyvalue.KeyValueHandler 
> (ContainerUtils.java:logAndReturnError(146)) - Operation: GetBlock : Trace 
> ID: 93a2a596076d2ee4:93a2a596076d2ee4:0:0 : Message: Unable to find the block 
> with bcsID 11777 .Container 68 bcsId is 11748. : Result: UNKNOWN_BCSID
> 2019-05-30 08:28:05,363 INFO  keyvalue.KeyValueHandler 
> (ContainerUtils.java:logAndReturnError(146)) - Operation: GetBlock : Trace 
> ID: 93a2a596076d2ee4:93a2a596076d2ee4:0:0 : Message: Unable to find the block 
> with bcsID 11777 .Container 68 bcsId is 11748. : Result: UNKNOWN_BCSID
> 2019-05-30 08:28:05,377 INFO  keyvalue.KeyValueHandler 
> (ContainerUtils.java:logAndReturnError(146)) - Operation: GetBlock : Trace 
> ID: 93a2a596076d2ee4:93a2a596076d2ee4:0:0 : Message: Unable to find the block 
> with bcsID 11777 .Container 68 bcsId is 0. : Result: UNKNOWN_BCSID
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1610) ContainerStateMachine should not take snapshot if any of the applyTransactions fail

2019-05-29 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created HDDS-1610:
-

 Summary: ContainerStateMachine should not take snapshot if any of 
the applyTransactions fail
 Key: HDDS-1610
 URL: https://issues.apache.org/jira/browse/HDDS-1610
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee


If the applyTransaction fails in the ContainerStateMachine, all subsequent 
snapshots should be disallowed, so that in case it restarts, it always reapplies 
from the last successfully committed transaction.
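
A minimal sketch of the intent; the flag and the helper method are assumptions, 
not the actual patch.
{code:java}
// Sketch only: remember the first applyTransaction failure and refuse to take
// further snapshots, so a restart replays from the last good applied index.
private final AtomicBoolean stateMachineHealthy = new AtomicBoolean(true);

public void onApplyTransactionFailure() {
  stateMachineHealthy.set(false);
}

public long takeSnapshot() throws IOException {
  if (!stateMachineHealthy.get()) {
    throw new IOException("applyTransaction failed earlier; refusing to take a snapshot");
  }
  // ... existing snapshot logic would go here ...
  return getLastAppliedTermIndex().getIndex();
}
{code}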



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14499) Misleading REM_QUOTA value with snasphot and trash feature enabled for a directory

2019-05-27 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-14499:
---
Attachment: HDFS-14499.000.patch

> Misleading REM_QUOTA value with snasphot and trash feature enabled for a 
> directory
> --
>
> Key: HDFS-14499
> URL: https://issues.apache.org/jira/browse/HDFS-14499
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-14499.000.patch
>
>
> This is the flow of steps where we see a discrepancy between REM_QUOTA and 
> new file operation failure. REM_QUOTA shows a value of  1 but file creation 
> operation does not succeed.
> {code:java}
> hdfs@c3265-node3 root$ hdfs dfs -mkdir /dir1
> hdfs@c3265-node3 root$ hdfs dfsadmin -setQuota 2 /dir1
> hdfs@c3265-node3 root$ hdfs dfsadmin -allowSnapshot /dir1
> Allowing snaphot on /dir1 succeeded
> hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1
> hdfs@c3265-node3 root$ hdfs dfs -createSnapshot /dir1 snap1
> Created snapshot /dir1/.snapshot/snap1
> hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1
> QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE 
> PATHNAME
> 2 0 none inf 1 1 0 /dir1
> hdfs@c3265-node3 root$ hdfs dfs -rm /dir1/file1
> 19/03/26 11:20:25 INFO fs.TrashPolicyDefault: Moved: 
> 'hdfs://smajetinn/dir1/file1' to trash at: 
> hdfs://smajetinn/user/hdfs/.Trash/Current/dir1/file11553599225772
> hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1
> QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE 
> PATHNAME
> 2 1 none inf 1 0 0 /dir1
> hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1
> touchz: The NameSpace quota (directories and files) of directory /dir1 is 
> exceeded: quota=2 file count=3{code}
> The issue here is that the count command takes only files and directories 
> into account, not the inode references. When trash is enabled, deleting a file 
> inside a directory actually performs a rename, as a result of which an inode 
> reference is maintained in the deleted list of the snapshot diff. That 
> reference is taken into account while computing the namespace quota, but the 
> count command (getContentSummary()) considers only the files and directories, 
> not the referenced entity, when calculating REM_QUOTA. The referenced entity 
> is taken into account for the space quota only.
> InodeReference.java:
> ---
> {code:java}
>  @Override
> public final ContentSummaryComputationContext computeContentSummary(
> int snapshotId, ContentSummaryComputationContext summary) {
>   final int s = snapshotId < lastSnapshotId ? snapshotId : lastSnapshotId;
>   // only count storagespace for WithName
>   final QuotaCounts q = computeQuotaUsage(
>   summary.getBlockStoragePolicySuite(), getStoragePolicyID(), false, 
> s);
>   summary.getCounts().addContent(Content.DISKSPACE, q.getStorageSpace());
>   summary.getCounts().addTypeSpaces(q.getTypeSpaces());
>   return summary;
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14499) Misleading REM_QUOTA value with snasphot and trash feature enabled for a directory

2019-05-27 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-14499:
---
Status: Patch Available  (was: Open)

> Misleading REM_QUOTA value with snasphot and trash feature enabled for a 
> directory
> --
>
> Key: HDFS-14499
> URL: https://issues.apache.org/jira/browse/HDFS-14499
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-14499.000.patch
>
>
> This is the flow of steps where we see a discrepancy between REM_QUOTA and 
> new file operation failure. REM_QUOTA shows a value of  1 but file creation 
> operation does not succeed.
> {code:java}
> hdfs@c3265-node3 root$ hdfs dfs -mkdir /dir1
> hdfs@c3265-node3 root$ hdfs dfsadmin -setQuota 2 /dir1
> hdfs@c3265-node3 root$ hdfs dfsadmin -allowSnapshot /dir1
> Allowing snaphot on /dir1 succeeded
> hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1
> hdfs@c3265-node3 root$ hdfs dfs -createSnapshot /dir1 snap1
> Created snapshot /dir1/.snapshot/snap1
> hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1
> QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE 
> PATHNAME
> 2 0 none inf 1 1 0 /dir1
> hdfs@c3265-node3 root$ hdfs dfs -rm /dir1/file1
> 19/03/26 11:20:25 INFO fs.TrashPolicyDefault: Moved: 
> 'hdfs://smajetinn/dir1/file1' to trash at: 
> hdfs://smajetinn/user/hdfs/.Trash/Current/dir1/file11553599225772
> hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1
> QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE 
> PATHNAME
> 2 1 none inf 1 0 0 /dir1
> hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1
> touchz: The NameSpace quota (directories and files) of directory /dir1 is 
> exceeded: quota=2 file count=3{code}
> The issue here is that the count command takes only files and directories 
> into account, not the inode references. When trash is enabled, deleting a file 
> inside a directory actually performs a rename, as a result of which an inode 
> reference is maintained in the deleted list of the snapshot diff. That 
> reference is taken into account while computing the namespace quota, but the 
> count command (getContentSummary()) considers only the files and directories, 
> not the referenced entity, when calculating REM_QUOTA. The referenced entity 
> is taken into account for the space quota only.
> InodeReference.java:
> ---
> {code:java}
>  @Override
> public final ContentSummaryComputationContext computeContentSummary(
> int snapshotId, ContentSummaryComputationContext summary) {
>   final int s = snapshotId < lastSnapshotId ? snapshotId : lastSnapshotId;
>   // only count storagespace for WithName
>   final QuotaCounts q = computeQuotaUsage(
>   summary.getBlockStoragePolicySuite(), getStoragePolicyID(), false, 
> s);
>   summary.getCounts().addContent(Content.DISKSPACE, q.getStorageSpace());
>   summary.getCounts().addTypeSpaces(q.getTypeSpaces());
>   return summary;
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-14499) Misleading REM_QUOTA value with snasphot and trash feature enabled for a directory

2019-05-27 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned HDFS-14499:
--

Assignee: Shashikant Banerjee

> Misleading REM_QUOTA value with snasphot and trash feature enabled for a 
> directory
> --
>
> Key: HDFS-14499
> URL: https://issues.apache.org/jira/browse/HDFS-14499
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>
> This is the flow of steps where we see a discrepancy between REM_QUOTA and 
> new file operation failure. REM_QUOTA shows a value of  1 but file creation 
> operation does not succeed.
> {code:java}
> hdfs@c3265-node3 root$ hdfs dfs -mkdir /dir1
> hdfs@c3265-node3 root$ hdfs dfsadmin -setQuota 2 /dir1
> hdfs@c3265-node3 root$ hdfs dfsadmin -allowSnapshot /dir1
> Allowing snaphot on /dir1 succeeded
> hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1
> hdfs@c3265-node3 root$ hdfs dfs -createSnapshot /dir1 snap1
> Created snapshot /dir1/.snapshot/snap1
> hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1
> QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE 
> PATHNAME
> 2 0 none inf 1 1 0 /dir1
> hdfs@c3265-node3 root$ hdfs dfs -rm /dir1/file1
> 19/03/26 11:20:25 INFO fs.TrashPolicyDefault: Moved: 
> 'hdfs://smajetinn/dir1/file1' to trash at: 
> hdfs://smajetinn/user/hdfs/.Trash/Current/dir1/file11553599225772
> hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1
> QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE 
> PATHNAME
> 2 1 none inf 1 0 0 /dir1
> hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1
> touchz: The NameSpace quota (directories and files) of directory /dir1 is 
> exceeded: quota=2 file count=3{code}
> The issue here is that the count command takes only files and directories 
> into account, not the inode references. When trash is enabled, deleting a file 
> inside a directory actually performs a rename, as a result of which an inode 
> reference is maintained in the deleted list of the snapshot diff. That 
> reference is taken into account while computing the namespace quota, but the 
> count command (getContentSummary()) considers only the files and directories, 
> not the referenced entity, when calculating REM_QUOTA. The referenced entity 
> is taken into account for the space quota only.
> InodeReference.java:
> ---
> {code:java}
>  @Override
> public final ContentSummaryComputationContext computeContentSummary(
> int snapshotId, ContentSummaryComputationContext summary) {
>   final int s = snapshotId < lastSnapshotId ? snapshotId : lastSnapshotId;
>   // only count storagespace for WithName
>   final QuotaCounts q = computeQuotaUsage(
>   summary.getBlockStoragePolicySuite(), getStoragePolicyID(), false, 
> s);
>   summary.getCounts().addContent(Content.DISKSPACE, q.getStorageSpace());
>   summary.getCounts().addTypeSpaces(q.getTypeSpaces());
>   return summary;
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1509) TestBlockOutputStreamWithFailures#test2DatanodesFailure fails intermittently

2019-05-27 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1509:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> TestBlockOutputStreamWithFailures#test2DatanodesFailure fails intermittently
> 
>
> Key: HDDS-1509
> URL: https://issues.apache.org/jira/browse/HDDS-1509
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.4.1
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The test fails because it expects the exception after 2 datanode failures to 
> be of type RaftRetryFailureException. But it might happen that the pipeline 
> gets destroyed before the actual write executes over Ratis, in which case it 
> fails with GroupMismatchException instead.
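
A rough sketch of the corresponding test-side fix (the variable names are 
placeholders for whatever the test actually uses): accept either failure type 
instead of insisting on RaftRetryFailureException.
{code:java}
// Sketch only: the write failure after two datanode failures may surface as
// either exception, depending on whether the pipeline was torn down first.
Throwable cause = writeFailure;
while (cause.getCause() != null) {
  cause = cause.getCause();
}
Assert.assertTrue(cause instanceof RaftRetryFailureException
    || cause instanceof GroupMismatchException);
{code}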



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1584) Fix TestFailureHandlingByClient tests

2019-05-27 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1584:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Fix TestFailureHandlingByClient tests
> -
>
> Key: HDDS-1584
> URL: https://issues.apache.org/jira/browse/HDDS-1584
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.4.1
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.4.1
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The test failures are caused because the test relies on 
> KeyOutputStream#getLocationList() to validate the number of preallocated 
> blocks, but it has been changed recently to exclude the empty blocks. The fix 
> is mostly to use KeyOutputStream#getStreamEntries() to get the number of 
> preallocated blocks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1558) IllegalArgumentException while processing container Reports

2019-05-27 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1558:
--
Status: Patch Available  (was: Open)

> IllegalArgumentException while processing container Reports
> ---
>
> Key: HDDS-1558
> URL: https://issues.apache.org/jira/browse/HDDS-1558
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: MiniOzoneChaosCluster, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> IllegalArgumentException while processing container Reports
> {code}
> 2019-05-19 23:15:04,137 ERROR events.SingleThreadExecutor 
> (SingleThreadExecutor.java:lambda$onMessage$1(88)) - Error on execution 
> message 
> org.apache.hadoop.hdds.scm.server.SCMDatanodeHeartbeatDispatcher$ContainerReportFromDatanode@1a117ebc
> java.lang.IllegalArgumentException
> at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
> at 
> org.apache.hadoop.hdds.scm.container.AbstractContainerReportHandler.updateContainerState(AbstractContainerReportHandler.java:178)
> at 
> org.apache.hadoop.hdds.scm.container.AbstractContainerReportHandler.processContainerReplica(AbstractContainerReportHandler.java:85)
> at 
> org.apache.hadoop.hdds.scm.container.ContainerReportHandler.processContainerReplicas(ContainerReportHandler.java:124)
> at 
> org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:97)
> at 
> org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:46)
> at 
> org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:85)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1558) IllegalArgumentException while processing container Reports

2019-05-27 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848793#comment-16848793
 ] 

Shashikant Banerjee commented on HDDS-1558:
---

The issue seems to be happening because of the following sequence (a simplified 
sketch of the failing check follows the list):
 # 2 out of 3 container replicas are marked unhealthy, but 1 replica keeps 
receiving and applying transactions successfully, updating its BCSID.
 # SCM gets updated with the latest BCSID by the healthy replica.
 # One unhealthy and one healthy node get restarted and rejoin the ring; the 
close container command issued from SCM gets executed via Ratis on the unhealthy 
replica because, after the restart, the unhealthy state is not persisted.
 # When SCM processes the BCSID reported in this replica's container report, it 
hits the exception because the reported BCSID is lower than what SCM already 
has.
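
A simplified illustration of the check that trips; the numbers and the exact 
form of the check are assumptions, while the real check sits in 
AbstractContainerReportHandler#updateContainerState.
{code:java}
// Sketch only: SCM already recorded the healthy replica's higher BCSID, so a
// report from the restarted replica carrying a lower BCSID fails the check.
long bcsIdKnownToSCM  = 120;  // from the healthy replica's earlier report
long bcsIdReportedNow = 95;   // from the restarted, formerly-unhealthy replica
Preconditions.checkArgument(bcsIdReportedNow >= bcsIdKnownToSCM,
    "reported BCSID is lower than the BCSID already known to SCM");
{code}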

> IllegalArgumentException while processing container Reports
> ---
>
> Key: HDDS-1558
> URL: https://issues.apache.org/jira/browse/HDDS-1558
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: MiniOzoneChaosCluster, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> IllegalArgumentException while processing container Reports
> {code}
> 2019-05-19 23:15:04,137 ERROR events.SingleThreadExecutor 
> (SingleThreadExecutor.java:lambda$onMessage$1(88)) - Error on execution 
> message 
> org.apache.hadoop.hdds.scm.server.SCMDatanodeHeartbeatDispatcher$ContainerReportFromDatanode@1a117ebc
> java.lang.IllegalArgumentException
> at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
> at 
> org.apache.hadoop.hdds.scm.container.AbstractContainerReportHandler.updateContainerState(AbstractContainerReportHandler.java:178)
> at 
> org.apache.hadoop.hdds.scm.container.AbstractContainerReportHandler.processContainerReplica(AbstractContainerReportHandler.java:85)
> at 
> org.apache.hadoop.hdds.scm.container.ContainerReportHandler.processContainerReplicas(ContainerReportHandler.java:124)
> at 
> org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:97)
> at 
> org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:46)
> at 
> org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:85)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-1558) IllegalArgumentException while processing container Reports

2019-05-27 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned HDDS-1558:
-

Assignee: Shashikant Banerjee

> IllegalArgumentException while processing container Reports
> ---
>
> Key: HDDS-1558
> URL: https://issues.apache.org/jira/browse/HDDS-1558
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: MiniOzoneChaosCluster
>
> IllegalArgumentException while processing container Reports
> {code}
> 2019-05-19 23:15:04,137 ERROR events.SingleThreadExecutor 
> (SingleThreadExecutor.java:lambda$onMessage$1(88)) - Error on execution 
> message 
> org.apache.hadoop.hdds.scm.server.SCMDatanodeHeartbeatDispatcher$ContainerReportFromDatanode@1a117ebc
> java.lang.IllegalArgumentException
> at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
> at 
> org.apache.hadoop.hdds.scm.container.AbstractContainerReportHandler.updateContainerState(AbstractContainerReportHandler.java:178)
> at 
> org.apache.hadoop.hdds.scm.container.AbstractContainerReportHandler.processContainerReplica(AbstractContainerReportHandler.java:85)
> at 
> org.apache.hadoop.hdds.scm.container.ContainerReportHandler.processContainerReplicas(ContainerReportHandler.java:124)
> at 
> org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:97)
> at 
> org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:46)
> at 
> org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:85)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1584) Fix TestFailureHandlingByClient tests

2019-05-27 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1584:
--
Description: The test failures are caused because the test relies on 
KeyOutputStream#getLocationList() to validate the number of preallocated blocks, 
but it has been changed recently to exclude the empty blocks. The fix is mostly 
to use KeyOutputStream#getStreamEntries() to get the number of preallocated 
blocks.
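
A hedged sketch of the test-side change; the surrounding variables are 
placeholders and the committed patch may differ.
{code:java}
// Sketch only: count preallocated blocks via the stream entries, since the
// location list now excludes empty blocks.
KeyOutputStream keyOutputStream = (KeyOutputStream) key.getOutputStream();
// old: asserted on the size of the location list, which now skips empty blocks
Assert.assertEquals(expectedBlockCount,
    keyOutputStream.getStreamEntries().size());
{code}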

> Fix TestFailureHandlingByClient tests
> -
>
> Key: HDDS-1584
> URL: https://issues.apache.org/jira/browse/HDDS-1584
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.4.1
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.4.1
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The test failures are caused because the test relies on 
> KeyOutputStream#getLocationList() to validate the number of preallocated 
> blocks, but it has been changed recently to exclude the empty blocks. The fix 
> is mostly to use KeyOutputStream#getStreamEntries() to get the number of 
> preallocated blocks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1589) CloseContainer transaction on unhealthy replica should fail with CONTAINER_UNHEALTHY exception

2019-05-24 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1589:
--
Description: 
Currently, while trying to close an unhealthy container over Ratis, it fails 
with INTERNAL_ERROR, which leads to an exception as follows:
{code:java}
2019-05-19 22:00:48,386 ERROR commandhandler.CloseContainerCommandHandler 
(CloseContainerCommandHandler.java:handle(124)) - Can't close container #125
org.apache.ratis.protocol.StateMachineException: 
java.util.concurrent.CompletionException from Server 
faea26b0-9c60-4b4c-a0df-bf7c67cc5b48: java.lang.IllegalStateException
at 
org.apache.ratis.server.impl.RaftServerImpl.lambda$replyPendingRequest$24(RaftServerImpl.java:1221)
at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at 
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1595)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.CompletionException: 
java.lang.IllegalStateException
at 
java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
at 
java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
at 
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1592)
... 3 more
Caused by: java.lang.IllegalStateException
at 
com.google.common.base.Preconditions.checkState(Preconditions.java:129)
at 
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:300)
at 
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:149)
at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:347)
at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.runCommand(ContainerStateMachine.java:354)
at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$applyTransaction$5(ContainerStateMachine.java:613)
at 
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
{code}
This happens because, when the transaction fails, it tries to mark the container 
unhealthy but expects the container to be in OPEN or CLOSING state, and hence 
asserts. It should ideally fail with CONTAINER_UNHEALTHY so that the close is 
not retried and there is no further attempt to change the state to UNHEALTHY.

  was:
Currently, while trying to close an unhealthy container over Ratis, it fails 
with INTERNAL_ERROR which leads to exception as follow:

{code:java}
2019-05-19 22:00:48,386 ERROR commandhandler.CloseContainerCommandHandler 
(CloseContainerCommandHandler.java:handle(124)) - Can't close container #125
org.apache.ratis.protocol.StateMachineException: 
java.util.concurrent.CompletionException from Server 
faea26b0-9c60-4b4c-a0df-bf7c67cc5b48: java.lang.IllegalStateException
at 
org.apache.ratis.server.impl.RaftServerImpl.lambda$replyPendingRequest$24(RaftServerImpl.java:1221)
at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at 
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1595)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.CompletionException: 
java.lang.IllegalStateException
at 
java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
at 
java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
at 
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1592)
... 3 more
Caused by: java.lang.IllegalStateException
at 
com.google.common.base.Preconditions.checkState(Preconditions.java:129)
at 
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:300)
at 
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:149)

[jira] [Created] (HDDS-1589) CloseContainer transaction on unhealthy replica should fail with CONTAINER_UNHEALTHY exception

2019-05-24 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created HDDS-1589:
-

 Summary: CloseContainer transaction on unhealthy replica should 
fail with CONTAINER_UNHEALTHY exception
 Key: HDDS-1589
 URL: https://issues.apache.org/jira/browse/HDDS-1589
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee


Currently, while trying to close an unhealthy container over Ratis, it fails 
with INTERNAL_ERROR, which leads to an exception as follows:

{code:java}
2019-05-19 22:00:48,386 ERROR commandhandler.CloseContainerCommandHandler 
(CloseContainerCommandHandler.java:handle(124)) - Can't close container #125
org.apache.ratis.protocol.StateMachineException: 
java.util.concurrent.CompletionException from Server 
faea26b0-9c60-4b4c-a0df-bf7c67cc5b48: java.lang.IllegalStateException
at 
org.apache.ratis.server.impl.RaftServerImpl.lambda$replyPendingRequest$24(RaftServerImpl.java:1221)
at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at 
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1595)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.CompletionException: 
java.lang.IllegalStateException
at 
java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
at 
java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
at 
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1592)
... 3 more
Caused by: java.lang.IllegalStateException
at 
com.google.common.base.Preconditions.checkState(Preconditions.java:129)
at 
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:300)
at 
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:149)
at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:347)
at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.runCommand(ContainerStateMachine.java:354)
at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$applyTransaction$5(ContainerStateMachine.java:613)
at 
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
{code}

This happens because, once the transaction has failed, it tries to mark the container 
unhealthy, and that transition expects the container to be in OPEN or CLOSING state, 
hence the precondition check asserts. It should ideally fail with CONTAINER_UNHEALTHY 
so that the close is not retried and no further attempt is made to change the state to 
UNHEALTHY.
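
As an illustration of the proposed behaviour, here is a minimal, self-contained Java sketch 
(the enum, result names and checks below are illustrative stand-ins, not the actual 
HddsDispatcher/ContainerStateMachine code): an already-UNHEALTHY replica yields a 
CONTAINER_UNHEALTHY result instead of tripping a Preconditions.checkState and surfacing as 
INTERNAL_ERROR.
{code:java}
import com.google.common.base.Preconditions;

public class CloseContainerSketch {
  enum State { OPEN, CLOSING, CLOSED, UNHEALTHY }
  enum Result { SUCCESS, CONTAINER_UNHEALTHY }

  // Current behaviour (illustrative): the close path asserts the replica is
  // OPEN or CLOSING, so an UNHEALTHY replica throws IllegalStateException,
  // which Ratis reports back as INTERNAL_ERROR.
  static Result closeAsserting(State state) {
    Preconditions.checkState(state == State.OPEN || state == State.CLOSING);
    return Result.SUCCESS;
  }

  // Proposed behaviour: report CONTAINER_UNHEALTHY so the command handler
  // neither retries the close nor tries to flip the state again.
  static Result closeProposed(State state) {
    if (state == State.UNHEALTHY) {
      return Result.CONTAINER_UNHEALTHY;
    }
    Preconditions.checkState(state == State.OPEN || state == State.CLOSING);
    return Result.SUCCESS;
  }

  public static void main(String[] args) {
    System.out.println(closeProposed(State.UNHEALTHY)); // CONTAINER_UNHEALTHY
    try {
      closeAsserting(State.UNHEALTHY);
    } catch (IllegalStateException e) {
      System.out.println("IllegalStateException -> INTERNAL_ERROR");
    }
  }
}
{code}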




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1584) Fix TestFailureHandlingByClient tests

2019-05-22 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1584:
--
Status: Patch Available  (was: Open)

> Fix TestFailureHandlingByClient tests
> -
>
> Key: HDDS-1584
> URL: https://issues.apache.org/jira/browse/HDDS-1584
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.4.1
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.1
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1584) Fix TestFailureHandlingByClient tests

2019-05-22 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created HDDS-1584:
-

 Summary: Fix TestFailureHandlingByClient tests
 Key: HDDS-1584
 URL: https://issues.apache.org/jira/browse/HDDS-1584
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Client
Affects Versions: 0.4.1
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.4.1






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1449) JVM Exit in datanode while committing a key

2019-05-22 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1449:
--
Fix Version/s: (was: 0.5.0)
   0.4.1

> JVM Exit in datanode while committing a key
> ---
>
> Key: HDDS-1449
> URL: https://issues.apache.org/jira/browse/HDDS-1449
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
>  Labels: MiniOzoneChaosCluster, pull-request-available
> Fix For: 0.4.1
>
> Attachments: 2019-04-22--20-23-56-IST.MiniOzoneChaosCluster.log, 
> hs_err_pid67466.log
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Saw the following trace in MiniOzoneChaosCluster run.
> {code}
> C  [librocksdbjni17271331491728127.jnilib+0x9755c]  
> Java_org_rocksdb_RocksDB_write0+0x1c
> J 13917  org.rocksdb.RocksDB.write0(JJJ)V (0 bytes) @ 0x0001102ff62e 
> [0x0001102ff580+0xae]
> J 17167 C2 
> org.apache.hadoop.utils.RocksDBStore.writeBatch(Lorg/apache/hadoop/utils/BatchOperation;)V
>  (260 bytes) @ 0x000111bbd01c [0x000111bbcde0+0x23c]
> J 20434 C1 
> org.apache.hadoop.ozone.container.keyvalue.impl.BlockManagerImpl.putBlock(Lorg/apache/hadoop/ozone/container/common/interfaces/Container;Lorg/apache/hadoop/ozone/container/common/helpers/BlockData;)J
>  (261 bytes) @ 0x000111c267ac [0x000111c25640+0x116c]
> J 19262 C2 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandRequestProto;Lorg/apache/hadoop/ozone/container/common/transport/server/ratis/DispatcherContext;)Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandResponseProto;
>  (866 bytes) @ 0x0001125c5aa0 [0x0001125c1560+0x4540]
> J 15095 C2 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandRequestProto;Lorg/apache/hadoop/ozone/container/common/transport/server/ratis/DispatcherContext;)Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandResponseProto;
>  (142 bytes) @ 0x000110ffc940 [0x000110ffc0c0+0x880]
> J 19301 C2 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandRequestProto;Lorg/apache/hadoop/ozone/container/common/transport/server/ratis/DispatcherContext;)Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandResponseProto;
>  (146 bytes) @ 0x000111396144 [0x000111395e60+0x2e4]
> J 15997 C2 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine$$Lambda$776.get()Ljava/lang/Object;
>  (16 bytes) @ 0x000110138e54 [0x000110138d80+0xd4]
> J 15970 C2 java.util.concurrent.CompletableFuture$AsyncSupply.run()V (61 
> bytes) @ 0x00010fc80094 [0x00010fc8+0x94]
> J 17368 C2 
> java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V
>  (225 bytes) @ 0x000110b0a7a0 [0x000110b0a5a0+0x200]
> J 7389 C1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V (9 bytes) @ 
> 0x00011012a004 [0x000110129f00+0x104]
> J 6837 C1 java.lang.Thread.run()V (17 bytes) @ 0x00011002b144 
> [0x00011002b000+0x144]
> v  ~StubRoutines::call_stub
> V  [libjvm.dylib+0x2ef1f6]  JavaCalls::call_helper(JavaValue*, methodHandle*, 
> JavaCallArguments*, Thread*)+0x6ae
> V  [libjvm.dylib+0x2ef99a]  JavaCalls::call_virtual(JavaValue*, KlassHandle, 
> Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x164
> V  [libjvm.dylib+0x2efb46]  JavaCalls::call_virtual(JavaValue*, Handle, 
> KlassHandle, Symbol*, Symbol*, Thread*)+0x4a
> V  [libjvm.dylib+0x34a46d]  thread_entry(JavaThread*, Thread*)+0x7c
> V  [libjvm.dylib+0x56eb0f]  JavaThread::thread_main_inner()+0x9b
> V  [libjvm.dylib+0x57020a]  JavaThread::run()+0x1c2
> V  [libjvm.dylib+0x48d4a6]  java_start(Thread*)+0xf6
> C  [libsystem_pthread.dylib+0x3305]  _pthread_body+0x7e
> C  [libsystem_pthread.dylib+0x626f]  _pthread_start+0x46
> C  [libsystem_pthread.dylib+0x2415]  thread_start+0xd
> C  0x
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1517) AllocateBlock call fails with ContainerNotFoundException

2019-05-22 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845808#comment-16845808
 ] 

Shashikant Banerjee commented on HDDS-1517:
---

Thanks [~jnp] for the review. I have committed this change to trunk.

> AllocateBlock call fails with ContainerNotFoundException
> 
>
> Key: HDDS-1517
> URL: https://issues.apache.org/jira/browse/HDDS-1517
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.4.1
>
> Attachments: HDDS-1517.000.patch
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> In the allocateContainer call, the container is first added to the pipelineStateMap 
> and then added to the container cache. If two allocateBlock calls execute 
> concurrently, it might happen that one finds the container in the 
> pipelineStateMap while the container is yet to be updated in the container 
> cache, hence failing with a CONTAINER_NOT_FOUND exception.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1517) AllocateBlock call fails with ContainerNotFoundException

2019-05-22 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1517:
--
  Resolution: Fixed
Target Version/s: 0.4.1  (was: 0.5.0)
  Status: Resolved  (was: Patch Available)

> AllocateBlock call fails with ContainerNotFoundException
> 
>
> Key: HDDS-1517
> URL: https://issues.apache.org/jira/browse/HDDS-1517
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.4.1
>
> Attachments: HDDS-1517.000.patch
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> In the allocateContainer call, the container is first added to the pipelineStateMap 
> and then added to the container cache. If two allocateBlock calls execute 
> concurrently, it might happen that one finds the container in the 
> pipelineStateMap while the container is yet to be updated in the container 
> cache, hence failing with a CONTAINER_NOT_FOUND exception.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1449) JVM Exit in datanode while committing a key

2019-05-22 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1449:
--
   Resolution: Fixed
Fix Version/s: 0.5.0
   Status: Resolved  (was: Patch Available)

Thanks [~msingh] for working on this. I have committed this change to trunk.

> JVM Exit in datanode while committing a key
> ---
>
> Key: HDDS-1449
> URL: https://issues.apache.org/jira/browse/HDDS-1449
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
>  Labels: MiniOzoneChaosCluster, pull-request-available
> Fix For: 0.5.0
>
> Attachments: 2019-04-22--20-23-56-IST.MiniOzoneChaosCluster.log, 
> hs_err_pid67466.log
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Saw the following trace in MiniOzoneChaosCluster run.
> {code}
> C  [librocksdbjni17271331491728127.jnilib+0x9755c]  
> Java_org_rocksdb_RocksDB_write0+0x1c
> J 13917  org.rocksdb.RocksDB.write0(JJJ)V (0 bytes) @ 0x0001102ff62e 
> [0x0001102ff580+0xae]
> J 17167 C2 
> org.apache.hadoop.utils.RocksDBStore.writeBatch(Lorg/apache/hadoop/utils/BatchOperation;)V
>  (260 bytes) @ 0x000111bbd01c [0x000111bbcde0+0x23c]
> J 20434 C1 
> org.apache.hadoop.ozone.container.keyvalue.impl.BlockManagerImpl.putBlock(Lorg/apache/hadoop/ozone/container/common/interfaces/Container;Lorg/apache/hadoop/ozone/container/common/helpers/BlockData;)J
>  (261 bytes) @ 0x000111c267ac [0x000111c25640+0x116c]
> J 19262 C2 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandRequestProto;Lorg/apache/hadoop/ozone/container/common/transport/server/ratis/DispatcherContext;)Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandResponseProto;
>  (866 bytes) @ 0x0001125c5aa0 [0x0001125c1560+0x4540]
> J 15095 C2 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandRequestProto;Lorg/apache/hadoop/ozone/container/common/transport/server/ratis/DispatcherContext;)Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandResponseProto;
>  (142 bytes) @ 0x000110ffc940 [0x000110ffc0c0+0x880]
> J 19301 C2 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandRequestProto;Lorg/apache/hadoop/ozone/container/common/transport/server/ratis/DispatcherContext;)Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandResponseProto;
>  (146 bytes) @ 0x000111396144 [0x000111395e60+0x2e4]
> J 15997 C2 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine$$Lambda$776.get()Ljava/lang/Object;
>  (16 bytes) @ 0x000110138e54 [0x000110138d80+0xd4]
> J 15970 C2 java.util.concurrent.CompletableFuture$AsyncSupply.run()V (61 
> bytes) @ 0x00010fc80094 [0x00010fc8+0x94]
> J 17368 C2 
> java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V
>  (225 bytes) @ 0x000110b0a7a0 [0x000110b0a5a0+0x200]
> J 7389 C1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V (9 bytes) @ 
> 0x00011012a004 [0x000110129f00+0x104]
> J 6837 C1 java.lang.Thread.run()V (17 bytes) @ 0x00011002b144 
> [0x00011002b000+0x144]
> v  ~StubRoutines::call_stub
> V  [libjvm.dylib+0x2ef1f6]  JavaCalls::call_helper(JavaValue*, methodHandle*, 
> JavaCallArguments*, Thread*)+0x6ae
> V  [libjvm.dylib+0x2ef99a]  JavaCalls::call_virtual(JavaValue*, KlassHandle, 
> Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x164
> V  [libjvm.dylib+0x2efb46]  JavaCalls::call_virtual(JavaValue*, Handle, 
> KlassHandle, Symbol*, Symbol*, Thread*)+0x4a
> V  [libjvm.dylib+0x34a46d]  thread_entry(JavaThread*, Thread*)+0x7c
> V  [libjvm.dylib+0x56eb0f]  JavaThread::thread_main_inner()+0x9b
> V  [libjvm.dylib+0x57020a]  JavaThread::run()+0x1c2
> V  [libjvm.dylib+0x48d4a6]  java_start(Thread*)+0xf6
> C  [libsystem_pthread.dylib+0x3305]  _pthread_body+0x7e
> C  [libsystem_pthread.dylib+0x626f]  _pthread_start+0x46
> C  [libsystem_pthread.dylib+0x2415]  thread_start+0xd
> C  0x
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1497) Refactor blockade Tests

2019-05-21 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844870#comment-16844870
 ] 

Shashikant Banerjee commented on HDDS-1497:
---

Thanks [~nilotpalnandi] for working on this. Some comments inline:
1. Please update the comments for the property, getter and setter functions.
2. cluster.py:223-224 -> incorrect comments.
3. clusterUtils.py:324 -> "om_1" should be "om"?
4. cluster_utils.py:296 -> which file checksum is it supposed to compute? Can 
you please update the comments?


> Refactor blockade Tests
> ---
>
> Key: HDDS-1497
> URL: https://issues.apache.org/jira/browse/HDDS-1497
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Nilotpal Nandi
>Assignee: Nilotpal Nandi
>Priority: Major
> Attachments: HDDS-1497.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14505) "touchz" command should check quota limit before deleting an already existing file

2019-05-21 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created HDFS-14505:
--

 Summary: "touchz" command should check quota limit before deleting 
an already existing file
 Key: HDFS-14505
 URL: https://issues.apache.org/jira/browse/HDFS-14505
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Shashikant Banerjee


{code:java}
HW15685:bin sbanerjee$ ./hdfs dfs -ls /dir2

2019-05-21 15:14:01,080 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable

Found 1 items

-rw-r--r--   1 sbanerjee hadoop          0 2019-05-21 15:10 /dir2/file4


HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/file4

2019-05-21 15:14:12,247 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable

touchz: The NameSpace quota (directories and files) of directory /dir2 is 
exceeded: quota=3 file count=5

HW15685:bin sbanerjee$ ./hdfs dfs -ls /dir2

2019-05-21 15:14:20,607 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable
{code}
Here, the "touchz" command failed to create the file as the quota limit was 
hit, but ended up deleting the original file which existed. It should do the 
quota check before deleting the file so that after successful deletion, 
creation should succeed.
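
A minimal, runnable sketch of the suggested ordering (a toy quota model only, no HDFS 
classes): verify that the recreated file would fit within the quota before deleting the 
existing one, so a failed touchz never loses the original file.
{code:java}
import java.util.HashSet;
import java.util.Set;

public class TouchzQuotaSketch {
  private final int nsQuota;                     // namespace quota of the directory
  private final Set<String> files = new HashSet<>();
  private final int extraCharged;                // e.g. snapshot-diff entries still counted

  TouchzQuotaSketch(int nsQuota, int extraCharged) {
    this.nsQuota = nsQuota;
    this.extraCharged = extraCharged;
  }

  private int usage() {
    return 1 + files.size() + extraCharged;      // 1 for the directory itself
  }

  // Proposed touchz: run the quota check for the replacement file *before*
  // deleting the existing one; delete only once the create is known to fit.
  boolean touchz(String name) {
    boolean exists = files.contains(name);
    int usageAfterRecreate = usage() + (exists ? 0 : 1);
    if (usageAfterRecreate > nsQuota) {
      return false;                              // quota hit, original file kept
    }
    files.remove(name);
    files.add(name);
    return true;
  }

  public static void main(String[] args) {
    // Quota 3, one live file, three extra entries already charged (matching
    // the report: quota=3, file count=5).
    TouchzQuotaSketch dir2 = new TouchzQuotaSketch(3, 3);
    dir2.files.add("file4");
    System.out.println(dir2.touchz("file4"));    // false: quota exceeded
    System.out.println(dir2.files);              // [file4] - original file kept
  }
}
{code}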



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14504) Rename with Snapshots does not honor quota limit

2019-05-21 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-14504:
---
Description: 
Steps to Reproduce:


{code:java}
HW15685:bin sbanerjee$ ./hdfs dfs -mkdir /dir2

2019-05-21 15:08:41,615 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable

HW15685:bin sbanerjee$ ./hdfs dfsadmin -setQuota 3 /dir2
2019-05-21 15:08:57,326 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable

HW15685:bin sbanerjee$ ./hdfs dfsadmin -allowSnapshot /dir2
2019-05-21 15:09:47,239 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable

Allowing snapshot on /dir2 succeeded

HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/file1
2019-05-21 15:10:01,573 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable

HW15685:bin sbanerjee$ ./hdfs dfs -createSnapshot /dir2 snap1

2019-05-21 15:10:16,332 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable

Created snapshot /dir2/.snapshot/snap1
HW15685:bin sbanerjee$ ./hdfs dfs -mv /dir2/file1 /dir2/file2

2019-05-21 15:10:49,292 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable

HW15685:bin sbanerjee$ ./hdfs dfs -ls /dir2

2019-05-21 15:11:05,207 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable

Found 1 items

-rw-r--r--   1 sbanerjee hadoop          0 2019-05-21 15:10 /dir2/file2

HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/filex

2019-05-21 15:11:43,765 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable

touchz: The NameSpace quota (directories and files) of directory /dir2 is 
exceeded: quota=3 file count=4

HW15685:bin sbanerjee$ ./hdfs dfs -createSnapshot /dir2 snap2

2019-05-21 15:12:05,464 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable

Created snapshot /dir2/.snapshot/snap2


HW15685:bin sbanerjee$ ./hdfs dfs -ls /dir2

2019-05-21 15:12:25,072 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable

Found 1 items

-rw-r--r--   1 sbanerjee hadoop          0 2019-05-21 15:10 /dir2/file2

HW15685:bin sbanerjee$ ./hdfs dfs -mv /dir2/file2 /dir2/file3

2019-05-21 15:12:35,908 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable

HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/filey

2019-05-21 15:12:49,998 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable

touchz: The NameSpace quota (directories and files) of directory /dir2 is 
exceeded: quota=3 file count=5
{code}

// create operation fails here as it has already exceeded the quota limit

{code}
HW15685:bin sbanerjee$ ./hdfs dfs -createSnapshot /dir2 snap3

2019-05-21 15:13:07,656 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable

Created snapshot /dir2/.snapshot/snap3

HW15685:bin sbanerjee$ ./hdfs dfs -mv /dir2/file3 /dir2/file4

2019-05-21 15:13:20,715 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable
{code}

// Rename operation succeeds here adding on to the namespace quota

{code}
HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/filez

2019-05-21 15:13:30,486 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable

touchz: The NameSpace quota (directories and files) of directory /dir2 is 
exceeded: quota=3 file count=6
{code}
// File creation fails here, but the file count has been increased to 6 because of the 
previous rename operation{code}
The quota set here is 3. Each successive rename adds an entry to the 
deleted list of the snapshot diff, which gets accounted in the namespace quota, 
but the rename operation is allowed even when it exceeds the quota limit with 
snapshots. Once an attempt is made to create a file, it fails.

  was:
Steps to Reproduce:


{code:java}
HW15685:bin sbanerjee$ ./hdfs dfs -mkdir /dir2

2019-05-21 15:08:41,615 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes 

[jira] [Updated] (HDFS-14504) Rename with Snapshots does not honor quota limit

2019-05-21 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-14504:
---
Description: 
Steps to Reproduce:


{code:java}
HW15685:bin sbanerjee$ ./hdfs dfs -mkdir /dir2

2019-05-21 15:08:41,615 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable

HW15685:bin sbanerjee$ ./hdfs dfsadmin -setQuota 3 /dir2
2019-05-21 15:08:57,326 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable

HW15685:bin sbanerjee$ ./hdfs dfsadmin -allowSnapshot /dir2
2019-05-21 15:09:47,239 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable

Allowing snapshot on /dir2 succeeded

HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/file1
2019-05-21 15:10:01,573 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable

HW15685:bin sbanerjee$ ./hdfs dfs -createSnapshot /dir2 snap1

2019-05-21 15:10:16,332 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable

Created snapshot /dir2/.snapshot/snap1
HW15685:bin sbanerjee$ ./hdfs dfs -mv /dir2/file1 /dir2/file2

2019-05-21 15:10:49,292 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable

HW15685:bin sbanerjee$ ./hdfs dfs -ls /dir2

2019-05-21 15:11:05,207 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable

Found 1 items

-rw-r--r--   1 sbanerjee hadoop          0 2019-05-21 15:10 /dir2/file2

HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/filex

2019-05-21 15:11:43,765 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable

touchz: The NameSpace quota (directories and files) of directory /dir2 is 
exceeded: quota=3 file count=4

HW15685:bin sbanerjee$ ./hdfs dfs -createSnapshot /dir2 snap2

2019-05-21 15:12:05,464 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable

Created snapshot /dir2/.snapshot/snap2


HW15685:bin sbanerjee$ ./hdfs dfs -ls /dir2

2019-05-21 15:12:25,072 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable

Found 1 items

-rw-r--r--   1 sbanerjee hadoop          0 2019-05-21 15:10 /dir2/file2

HW15685:bin sbanerjee$ ./hdfs dfs -mv /dir2/file2 /dir2/file3

2019-05-21 15:12:35,908 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable

HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/filey

2019-05-21 15:12:49,998 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable

touchz: The NameSpace quota (directories and files) of directory /dir2 is 
exceeded: quota=3 file count=5

// create operation fails here as it has already exceeded the quota limit

HW15685:bin sbanerjee$ ./hdfs dfs -createSnapshot /dir2 snap3

2019-05-21 15:13:07,656 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable

Created snapshot /dir2/.snapshot/snap3

HW15685:bin sbanerjee$ ./hdfs dfs -mv /dir2/file3 /dir2/file4

2019-05-21 15:13:20,715 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable
// Rename operation succeeds here adding on to the namespace quota

HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/filez

2019-05-21 15:13:30,486 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable

touchz: The NameSpace quota (directories and files) of directory /dir2 is 
exceeded: quota=3 file count=6

// File creation fails here, but the file count has been increased to 6 because of the 
previous rename operation{code}
The quota set here is 3. Each successive rename adds an entry to the 
deleted list of the snapshot diff, which gets accounted in the namespace quota, 
but the rename operation is allowed even when it exceeds the quota limit with 
snapshots. Once an attempt is made to create a file, it fails.

  was:
Steps to Reproduce:


{code:java}
HW15685:bin sbanerjee$ ./hdfs dfs -mkdir /dir2

2019-05-21 15:08:41,615 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable

HW15685:bin 

[jira] [Created] (HDFS-14504) Rename with Snapshots does not honor quota limit

2019-05-21 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created HDFS-14504:
--

 Summary: Rename with Snapshots does not honor quota limit
 Key: HDFS-14504
 URL: https://issues.apache.org/jira/browse/HDFS-14504
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Shashikant Banerjee


Steps to Reproduce:


{code:java}
HW15685:bin sbanerjee$ ./hdfs dfs -mkdir /dir2

2019-05-21 15:08:41,615 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable

HW15685:bin sbanerjee$ ./hdfs dfsadmin -setQuota 3 /dir2
2019-05-21 15:08:57,326 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable

HW15685:bin sbanerjee$ ./hdfs dfsadmin -allowSnapshot /dir2
2019-05-21 15:09:47,239 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable

Allowing snapshot on /dir2 succeeded

HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/file1
2019-05-21 15:10:01,573 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable

HW15685:bin sbanerjee$ ./hdfs dfs -createSnapshot /dir2 snap1

2019-05-21 15:10:16,332 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable

Created snapshot /dir2/.snapshot/snap1
HW15685:bin sbanerjee$ ./hdfs dfs -mv /dir2/file1 /dir2/file2

2019-05-21 15:10:49,292 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable

HW15685:bin sbanerjee$ ./hdfs dfs -ls /dir2

2019-05-21 15:11:05,207 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable

Found 1 items

-rw-r--r--   1 sbanerjee hadoop          0 2019-05-21 15:10 /dir2/file2

HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/filex

2019-05-21 15:11:43,765 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable

touchz: The NameSpace quota (directories and files) of directory /dir2 is 
exceeded: quota=3 file count=4

HW15685:bin sbanerjee$ ./hdfs dfs -createSnapshot /dir2 snap2

2019-05-21 15:12:05,464 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable

Created snapshot /dir2/.snapshot/snap2


HW15685:bin sbanerjee$ ./hdfs dfs -ls /dir2

2019-05-21 15:12:25,072 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable

Found 1 items

-rw-r--r--   1 sbanerjee hadoop          0 2019-05-21 15:10 /dir2/file2

HW15685:bin sbanerjee$ ./hdfs dfs -mv /dir2/file2 /dir2/file3

2019-05-21 15:12:35,908 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable

HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/filey

2019-05-21 15:12:49,998 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable

touchz: The NameSpace quota (directories and files) of directory /dir2 is 
exceeded: quota=3 file count=5

// create operation fails here as it has already exceeded the quota limit

HW15685:bin sbanerjee$ ./hdfs dfs -createSnapshot /dir2 snap3

2019-05-21 15:13:07,656 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable

Created snapshot /dir2/.snapshot/snap3

HW15685:bin sbanerjee$ ./hdfs dfs -mv /dir2/file3 /dir2/file4

2019-05-21 15:13:20,715 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable
// Rename operation succeeds here adding on to the namespace quota

HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/filez

2019-05-21 15:13:30,486 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable

touchz: The NameSpace quota (directories and files) of directory /dir2 is 
exceeded: quota=3 file count=6

// File creation fails here, but the file count has been increased to 6 because of the 
previous rename operation{code}
The quota set here is 3. Each successive rename adds an entry to the 
deleted list of the snapshot diff, which gets accounted in the namespace quota, 
but the rename operation is allowed even when it exceeds the quota limit with 
snapshots. Once an attempt is made to create a file, it fails.
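
A minimal, runnable sketch of the accounting described above (a toy model, not the actual 
FSDirectory/snapshot code): each rename of a file that exists in a snapshot leaves an entry 
in the snapshot diff's deleted list, which keeps counting against the namespace quota, while 
the rename itself performs no quota check, so only the next create fails.
{code:java}
public class RenameQuotaSketch {
  static final int QUOTA = 3;        // namespace quota of /dir2
  static final int DIR_COUNT = 1;    // /dir2 itself
  static int liveFiles = 1;          // file1
  static int snapshotDiffEntries;    // renamed-away names kept in deleted lists

  static int namespaceUsage() {
    return DIR_COUNT + liveFiles + snapshotDiffEntries;
  }

  // Rename of a snapshotted file: the old name stays in the snapshot diff's
  // deleted list (usage grows), but no quota verification is performed.
  static void renameSnapshottedFile() {
    snapshotDiffEntries++;
  }

  // Create does verify the quota, so it is the operation that finally fails.
  static boolean create() {
    if (namespaceUsage() + 1 > QUOTA) {
      return false;
    }
    liveFiles++;
    return true;
  }

  public static void main(String[] args) {
    renameSnapshottedFile();                 // file1 -> file2, allowed
    System.out.println(create());            // false (quota=3, count would be 4)
    renameSnapshottedFile();                 // file2 -> file3, still allowed
    System.out.println(namespaceUsage());    // 4: already past the quota
    System.out.println(create());            // false (count would be 5)
  }
}
{code}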



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: 

[jira] [Updated] (HDDS-1502) Add metrics for Ozone Ratis performance

2019-05-20 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1502:
--
Description: 
This jira will add some metrics for Ratis pipeline performance:
1) number of bytes written
2) number of Read StateMachine calls
3) number of Read StateMachine failures

  was:
This jira will add some metrics for Ratis pipeline performance:

a) number of chunks written per second
b) number of bytes written per second
c) number of chunks/bytes missed during read StateMachine data.


> Add metrics for Ozone Ratis performance
> ---
>
> Key: HDDS-1502
> URL: https://issues.apache.org/jira/browse/HDDS-1502
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This jira will add some metrics for Ratis pipeline performance:
> 1) number of bytes written
> 2) number of Read StateMachine calls
> 3) number of Read StateMachine failures
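
A minimal sketch of what such a metrics holder could look like using Hadoop's metrics2 API 
(the class and counter names below are illustrative, not the actual Ozone implementation):
{code:java}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;

@Metrics(about = "Ratis pipeline performance metrics", context = "dfs")
public class RatisPipelineMetricsSketch {

  @Metric private MutableCounterLong numBytesWritten;
  @Metric private MutableCounterLong numReadStateMachineOps;
  @Metric private MutableCounterLong numReadStateMachineFails;

  // Register the source with the default metrics system so the counters
  // get published along with the other datanode metrics.
  public static RatisPipelineMetricsSketch create() {
    return DefaultMetricsSystem.instance().register(
        "RatisPipelineMetricsSketch",
        "Ratis pipeline performance metrics",
        new RatisPipelineMetricsSketch());
  }

  public void incrBytesWritten(long bytes) {
    numBytesWritten.incr(bytes);
  }

  public void incrReadStateMachineOps() {
    numReadStateMachineOps.incr();
  }

  public void incrReadStateMachineFails() {
    numReadStateMachineFails.incr();
  }
}
{code}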



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1502) Add metrics for Ozone Ratis performance

2019-05-20 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1502:
--
Status: Patch Available  (was: Open)

> Add metrics for Ozone Ratis performance
> ---
>
> Key: HDDS-1502
> URL: https://issues.apache.org/jira/browse/HDDS-1502
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This jira will add some metrics for Ratis pipeline performance:
> 1) number of bytes written
> 2) number of Read StateMachine calls
> 3) number of Read StateMachine failures



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1517) AllocateBlock call fails with ContainerNotFoundException

2019-05-17 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16842325#comment-16842325
 ] 

Shashikant Banerjee commented on HDDS-1517:
---

Thanks [~jnp], as discussed I have updated the patch in the pull request.

> AllocateBlock call fails with ContainerNotFoundException
> 
>
> Key: HDDS-1517
> URL: https://issues.apache.org/jira/browse/HDDS-1517
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.4.1
>
> Attachments: HDDS-1517.000.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In the allocateContainer call, the container is first added to the pipelineStateMap 
> and then added to the container cache. If two allocateBlock calls execute 
> concurrently, it might happen that one finds the container in the 
> pipelineStateMap while the container is yet to be updated in the container 
> cache, hence failing with a CONTAINER_NOT_FOUND exception.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14499) Misleading REM_QUOTA value with snapshot and trash feature enabled for a directory

2019-05-17 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created HDFS-14499:
--

 Summary: Misleading REM_QUOTA value with snapshot and trash 
feature enabled for a directory
 Key: HDFS-14499
 URL: https://issues.apache.org/jira/browse/HDFS-14499
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Reporter: Shashikant Banerjee


This is the flow of steps where we see a discrepancy between REM_QUOTA and a new 
file operation's failure: REM_QUOTA shows a value of 1, but the file creation 
operation does not succeed.
{code:java}
hdfs@c3265-node3 root$ hdfs dfs -mkdir /dir1
hdfs@c3265-node3 root$ hdfs dfsadmin -setQuota 2 /dir1
hdfs@c3265-node3 root$ hdfs dfsadmin -allowSnapshot /dir1
Allowing snaphot on /dir1 succeeded
hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1
hdfs@c3265-node3 root$ hdfs dfs -createSnapshot /dir1 snap1
Created snapshot /dir1/.snapshot/snap1
hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1
QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE 
PATHNAME
2 0 none inf 1 1 0 /dir1
hdfs@c3265-node3 root$ hdfs dfs -rm /dir1/file1
19/03/26 11:20:25 INFO fs.TrashPolicyDefault: Moved: 
'hdfs://smajetinn/dir1/file1' to trash at: 
hdfs://smajetinn/user/hdfs/.Trash/Current/dir1/file11553599225772
hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1
QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE 
PATHNAME
2 1 none inf 1 0 0 /dir1
hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1
touchz: The NameSpace quota (directories and files) of directory /dir1 is 
exceeded: quota=2 file count=3{code}
The issue here is that the count command takes only files and directories into 
account, not the inode references. When trash is enabled, deleting a file 
inside a directory actually performs a rename operation, as a result of which an 
inode reference is maintained in the deleted list of the snapshot diff. That 
reference is taken into account while computing the namespace quota, but the count 
command (getContentSummary()) considers just the files and directories, 
not the referenced entity, when calculating REM_QUOTA. The referenced entity 
is taken into account for the space quota only.

InodeReference.java:
---
{code:java}
 @Override
public final ContentSummaryComputationContext computeContentSummary(
int snapshotId, ContentSummaryComputationContext summary) {
  final int s = snapshotId < lastSnapshotId ? snapshotId : lastSnapshotId;
  // only count storagespace for WithName
  final QuotaCounts q = computeQuotaUsage(
  summary.getBlockStoragePolicySuite(), getStoragePolicyID(), false, s);
  summary.getCounts().addContent(Content.DISKSPACE, q.getStorageSpace());
  summary.getCounts().addTypeSpaces(q.getTypeSpaces());
  return summary;
}
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-1531) Disable the sync flag by default during chunk writes in Datanode

2019-05-16 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-1531.
---
Resolution: Fixed

> Disable the sync flag by default during chunk writes in Datanode
> 
>
> Key: HDDS-1531
> URL: https://issues.apache.org/jira/browse/HDDS-1531
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently, while doing chunk writes on datanodes, the sync flag is ON by 
> default. It needs to be turned off by default.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1509) TestBlockOutputStreamWithFailures#test2DatanodesFailure fails intermittently

2019-05-16 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1509:
--
Status: Patch Available  (was: Open)

> TestBlockOutputStreamWithFailures#test2DatanodesFailure fails intermittently
> 
>
> Key: HDDS-1509
> URL: https://issues.apache.org/jira/browse/HDDS-1509
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The test fails because it expects the exception after 2 datanode failures to 
> be of type RaftRetryFailureException. But it might happen that the pipeline 
> gets destroyed before the actual write executes over Ratis, in which case it 
> fails with GroupMismatchException instead.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work stopped] (HDDS-1531) Disable the sync flag by default during chunk writes in Datanode

2019-05-14 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDDS-1531 stopped by Shashikant Banerjee.
-
> Disable the sync flag by default during chunk writes in Datanode
> 
>
> Key: HDDS-1531
> URL: https://issues.apache.org/jira/browse/HDDS-1531
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.5.0
>
>
> Currently, while doing chunk writes on datanodes, the sync flag is ON by 
> default. It needs to be turned off by default.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work started] (HDDS-1531) Disable the sync flag by default during chunk writes in Datanode

2019-05-14 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDDS-1531 started by Shashikant Banerjee.
-
> Disable the sync flag by default during chunk writes in Datanode
> 
>
> Key: HDDS-1531
> URL: https://issues.apache.org/jira/browse/HDDS-1531
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.5.0
>
>
> Currently, while doing chunk writes on datanodes, the sync flag is ON by 
> default. It needs to be turned off by default.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1531) Disable the sync flag by default during chunk writes in Datanode

2019-05-14 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created HDDS-1531:
-

 Summary: Disable the sync flag by default during chunk writes in 
Datanode
 Key: HDDS-1531
 URL: https://issues.apache.org/jira/browse/HDDS-1531
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Affects Versions: 0.5.0
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.5.0


Currently, while doing chunk writes on datanodes, the sync flag is ON by 
default. It needs to be turned off by default.
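
In substance, the sync flag controls whether the datanode forces each chunk file to disk 
right after writing it. A minimal, runnable sketch of that trade-off in plain Java NIO 
(illustrative only; the real chunk-write path and the configuration key differ):
{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class ChunkWriteSyncSketch {
  // Writes one chunk; when sync is true the channel is forced to the device,
  // which is the costly step this issue proposes to turn off by default.
  static void writeChunk(Path chunkFile, ByteBuffer data, boolean sync)
      throws IOException {
    try (FileChannel channel = FileChannel.open(chunkFile,
        StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
      channel.write(data);
      if (sync) {
        channel.force(false);   // flush file data (not metadata) to disk
      }
    }
  }

  public static void main(String[] args) throws IOException {
    ByteBuffer chunk = ByteBuffer.wrap("chunk-data".getBytes());
    // Proposed default: sync=false, leaving durability to the OS page cache.
    writeChunk(Paths.get("chunk_1.tmp"), chunk, false);
  }
}
{code}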



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1517) AllocateBlock call fails with ContainerNotFoundException

2019-05-14 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16840044#comment-16840044
 ] 

Shashikant Banerjee commented on HDDS-1517:
---

Patch v0 adds the fix. I will open a pull request and add a new patch which 
also adds a test to verify the fix.

> AllocateBlock call fails with ContainerNotFoundException
> 
>
> Key: HDDS-1517
> URL: https://issues.apache.org/jira/browse/HDDS-1517
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.5.0
>
> Attachments: HDDS-1517.000.patch
>
>
> In the allocateContainer call, the container is first added to the pipelineStateMap 
> and then added to the container cache. If two allocateBlock calls execute 
> concurrently, it might happen that one finds the container in the 
> pipelineStateMap while the container is yet to be updated in the container 
> cache, hence failing with a CONTAINER_NOT_FOUND exception.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1517) AllocateBlock call fails with ContainerNotFoundException

2019-05-14 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1517:
--
Status: Patch Available  (was: Open)

> AllocateBlock call fails with ContainerNotFoundException
> 
>
> Key: HDDS-1517
> URL: https://issues.apache.org/jira/browse/HDDS-1517
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.5.0
>
> Attachments: HDDS-1517.000.patch
>
>
> In the allocateContainer call, the container is first added to the pipelineStateMap 
> and then added to the container cache. If two allocateBlock calls execute 
> concurrently, it might happen that one finds the container in the 
> pipelineStateMap while the container is yet to be updated in the container 
> cache, hence failing with a CONTAINER_NOT_FOUND exception.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1517) AllocateBlock call fails with ContainerNotFoundException

2019-05-14 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1517:
--
Attachment: HDDS-1517.000.patch

> AllocateBlock call fails with ContainerNotFoundException
> 
>
> Key: HDDS-1517
> URL: https://issues.apache.org/jira/browse/HDDS-1517
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.5.0
>
> Attachments: HDDS-1517.000.patch
>
>
> In the allocateContainer call, the container is first added to the pipelineStateMap 
> and then added to the container cache. If two allocateBlock calls execute 
> concurrently, it might happen that one finds the container in the 
> pipelineStateMap while the container is yet to be updated in the container 
> cache, hence failing with a CONTAINER_NOT_FOUND exception.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1517) AllocateBlock call fails with ContainerNotFoundException

2019-05-10 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1517:
--
Description: In the allocateContainer call, the container is first added to 
the pipelineStateMap and then added to the container cache. If two allocateBlock 
calls execute concurrently, it might happen that one finds the container in the 
pipelineStateMap while the container is yet to be updated in the container 
cache, hence failing with a CONTAINER_NOT_FOUND exception.

> AllocateBlock call fails with ContainerNotFoundException
> 
>
> Key: HDDS-1517
> URL: https://issues.apache.org/jira/browse/HDDS-1517
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.5.0
>
>
> In the allocateContainer call, the container is first added to the pipelineStateMap 
> and then added to the container cache. If two allocateBlock calls execute 
> concurrently, it might happen that one finds the container in the 
> pipelineStateMap while the container is yet to be updated in the container 
> cache, hence failing with a CONTAINER_NOT_FOUND exception.
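
A minimal, runnable sketch of the check-then-act race described above (toy maps, not the 
actual SCM code): if the container only becomes visible in the container cache after the 
pipeline map, a concurrent allocateBlock can see the pipeline yet miss the container; 
registering the container in the cache before publishing the pipeline mapping closes that 
window.
{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class AllocateRaceSketch {
  static final ConcurrentMap<Long, String> pipelineStateMap = new ConcurrentHashMap<>();
  static final ConcurrentMap<Long, String> containerCache = new ConcurrentHashMap<>();

  // Buggy ordering: the container is visible through the pipeline map before
  // it exists in the container cache.
  static void allocateContainerBuggy(long id) {
    pipelineStateMap.put(id, "pipeline-" + id);
    containerCache.put(id, "container-" + id);
  }

  // Safer ordering: populate the container cache first, then publish the
  // pipeline mapping, so any id visible via the pipeline can be resolved.
  static void allocateContainerFixed(long id) {
    containerCache.put(id, "container-" + id);
    pipelineStateMap.put(id, "pipeline-" + id);
  }

  // Concurrent allocateBlock: finds the container via the pipeline map, then
  // looks it up in the cache; a null here is the CONTAINER_NOT_FOUND case.
  static String allocateBlock(long id) {
    return pipelineStateMap.containsKey(id)
        ? containerCache.get(id)
        : "no pipeline yet";
  }

  public static void main(String[] args) throws InterruptedException {
    long id = 1L;
    Thread writer = new Thread(() -> allocateContainerFixed(id));
    Thread reader = new Thread(() ->
        System.out.println("allocateBlock saw: " + allocateBlock(id)));
    writer.start();
    reader.start();
    writer.join();
    reader.join();
  }
}
{code}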



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1517) AllocateBlock call fails with ContainerNotFoundException

2019-05-10 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created HDDS-1517:
-

 Summary: AllocateBlock call fails with ContainerNotFoundException
 Key: HDDS-1517
 URL: https://issues.apache.org/jira/browse/HDDS-1517
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: SCM
Affects Versions: 0.5.0
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.5.0






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1509) TestBlockOutputStreamWithFailures#test2DatanodesFailure fails intermittently

2019-05-09 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created HDDS-1509:
-

 Summary: TestBlockOutputStreamWithFailures#test2DatanodesFailure 
fails intermittently
 Key: HDDS-1509
 URL: https://issues.apache.org/jira/browse/HDDS-1509
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Client
Affects Versions: 0.5.0
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.5.0


The test fails because it expects the exception after 2 datanode failures 
to be of type RaftRetryFailureException. But it might happen that the pipeline 
gets destroyed before the actual write executes over Ratis, in which case it 
fails with GroupMismatchException instead.
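
A minimal sketch of an assertion helper that tolerates both failure modes (plain Java; 
GroupMismatchException below is a local stand-in class, not the Ratis type, and the helper 
is illustrative rather than the actual test code):
{code:java}
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class FailureModeCheckSketch {
  // Either outcome is acceptable after 2 datanode failures: the retry policy
  // may give up first (RaftRetryFailureException), or the pipeline may already
  // be destroyed when the write reaches Ratis (GroupMismatchException).
  static final Set<String> ACCEPTED = new HashSet<>(Arrays.asList(
      "RaftRetryFailureException", "GroupMismatchException"));

  // Walk the cause chain and accept the failure if any cause matches.
  static boolean isExpectedFailure(Throwable t) {
    for (Throwable c = t; c != null; c = c.getCause()) {
      if (ACCEPTED.contains(c.getClass().getSimpleName())) {
        return true;
      }
    }
    return false;
  }

  // Local stand-in so the sketch runs without the Ratis jars on the classpath.
  static class GroupMismatchException extends Exception {
    GroupMismatchException(String message) {
      super(message);
    }
  }

  public static void main(String[] args) {
    Exception e = new Exception("write failed",
        new GroupMismatchException("group already removed"));
    System.out.println(isExpectedFailure(e));   // true
  }
}
{code}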



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-1502) Add metrics for Ozone Ratis performance

2019-05-09 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned HDDS-1502:
-

Assignee: Shashikant Banerjee  (was: Mukul Kumar Singh)

> Add metrics for Ozone Ratis performance
> ---
>
> Key: HDDS-1502
> URL: https://issues.apache.org/jira/browse/HDDS-1502
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>
> This jira will add some metrics for Ratis pipeline performance:
> a) number of chunks written per second
> b) number of bytes written per second
> c) number of chunks/bytes missed during read StateMachine data.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1504) Watch Request should use retry policy with higher timeouts for RaftClient

2019-05-08 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created HDDS-1504:
-

 Summary: Watch Request should use retry policy with higher 
timeouts for RaftClient
 Key: HDDS-1504
 URL: https://issues.apache.org/jira/browse/HDDS-1504
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Client
Affects Versions: 0.5.0
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.5.0


Currently, a RaftClient request times out with a default of 3s, but watch requests 
may need longer timeouts since some followers can be really slow. It would be good 
to enforce a retry policy with higher timeouts while submitting watch requests 
over the Raft client in Ozone.
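
A minimal sketch, assuming Ratis' RetryPolicies and TimeDuration utilities, of how a 
separate, more patient retry policy could be constructed for watch requests (the counts 
and durations are illustrative, and the wiring into the Ozone client is only hinted at):
{code:java}
import java.util.concurrent.TimeUnit;

import org.apache.ratis.retry.RetryPolicies;
import org.apache.ratis.retry.RetryPolicy;
import org.apache.ratis.util.TimeDuration;

public class WatchRetryPolicySketch {
  // Ordinary requests: a few attempts with a short sleep, close to the 3s default.
  static RetryPolicy requestPolicy() {
    return RetryPolicies.retryUpToMaximumCountWithFixedSleep(
        5, TimeDuration.valueOf(3, TimeUnit.SECONDS));
  }

  // Watch requests: allow far longer waits, since a slow follower may need
  // much more than the default request timeout to catch up.
  static RetryPolicy watchPolicy() {
    return RetryPolicies.retryUpToMaximumCountWithFixedSleep(
        10, TimeDuration.valueOf(15, TimeUnit.SECONDS));
  }

  public static void main(String[] args) {
    // The watch policy would be handed to the RaftClient used for
    // watchForCommit, e.g. via RaftClient.newBuilder().setRetryPolicy(...).
    System.out.println(requestPolicy());
    System.out.println(watchPolicy());
  }
}
{code}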



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-1362) Append all chunk writes for a block to a single file in datanode

2019-05-08 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-1362.
---
Resolution: Duplicate

> Append all chunk writes for a block to a single file in datanode
> 
>
> Key: HDDS-1362
> URL: https://issues.apache.org/jira/browse/HDDS-1362
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.5.0
>
>
> Currently, for each chunk, data is written to an individual chunk file. The 
> idea here is to maintain one file per block on the datanode.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-1437) TestBlockOutputStreamWithFailures#testWatchForCommitDatanodeFailure fails with assertion error

2019-05-08 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-1437.
---
  Resolution: Fixed
   Fix Version/s: 0.5.0
Target Version/s: 0.5.0  (was: 0.4.0)

This should have been addressed with HDDS-1395. It's not reproducible in the latest 
runs. Resolving it for now.

> TestBlockOutputStreamWithFailures#testWatchForCommitDatanodeFailure fails 
> with assertion error
> --
>
> Key: HDDS-1437
> URL: https://issues.apache.org/jira/browse/HDDS-1437
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.5.0
>
>
> The test is failing with the following assertion
> {code}
> java.lang.AssertionError: expected:<2> but was:<3>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures.testWatchForCommitDatanodeFailure(TestBlockOutputStreamWithFailures.java:373)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {code}
> https://ci.anzix.net//job/ozone-nightly/62//testReport/junit/org.apache.hadoop.ozone.client.rpc/TestBlockOutputStreamWithFailures/testWatchForCommitDatanodeFailure/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-1485) Ozone writes fail when single threaded client writes 100MB files repeatedly.

2019-05-08 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned HDDS-1485:
-

Assignee: Shashikant Banerjee

> Ozone writes fail when single threaded client writes 100MB files repeatedly. 
> -
>
> Key: HDDS-1485
> URL: https://issues.apache.org/jira/browse/HDDS-1485
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Aravindan Vijayan
>Assignee: Shashikant Banerjee
>Priority: Blocker
>
> *Environment*
> 26 node physical cluster.
> All Datanodes are up and running.
> Client attempting to write 1600 x 100MB files using the FsStress utility 
> (https://github.com/arp7/FsPerfTest) fails with the following error. 
> {code}
> 19/05/02 09:58:49 ERROR storage.BlockOutputStream: Unexpected Storage 
> Container Exception:
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  ContainerID 424 does not exist
> at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:573)
> at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.validateResponse(BlockOutputStream.java:539)
> at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$writeChunkToContainer$2(BlockOutputStream.java:616)
> at 
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
> at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
> at 
> java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> It looks like corruption in the container metadata.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-1436) TestCommitWatcher#testReleaseBuffersOnException fails with IllegalStateException

2019-05-08 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-1436.
---
   Resolution: Fixed
Fix Version/s: 0.5.0

> TestCommitWatcher#testReleaseBuffersOnException fails with 
> IllegalStateException
> 
>
> Key: HDDS-1436
> URL: https://issues.apache.org/jira/browse/HDDS-1436
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.5.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone-flaky-test
> Fix For: 0.5.0
>
>
> the test is failing with the following exception
> {code}
> java.lang.IllegalStateException
>   at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:129)
>   at 
> org.apache.hadoop.hdds.scm.storage.CommitWatcher.watchForCommit(CommitWatcher.java:191)
>   at 
> org.apache.hadoop.ozone.client.rpc.TestCommitWatcher.testReleaseBuffersOnException(TestCommitWatcher.java:277)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {code}
> https://ci.anzix.net/job/ozone-nightly/63/testReport/org.apache.hadoop.ozone.client.rpc/TestCommitWatcher/testReleaseBuffersOnException/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1395) Key write fails with BlockOutputStream has been closed exception

2019-05-07 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1395:
--
   Resolution: Fixed
Fix Version/s: 0.5.0
   Status: Resolved  (was: Patch Available)

Thanks [~jnp] for the review. I have committed this change to trunk.

> Key write fails with BlockOutputStream has been closed exception
> 
>
> Key: HDDS-1395
> URL: https://issues.apache.org/jira/browse/HDDS-1395
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: MiniOzoneChaosCluster, pull-request-available
> Fix For: 0.5.0
>
> Attachments: HDDS-1395.000.patch, HDDS-1395.001.patch, 
> HDDS-1395.003.patch
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> Key write fails with BlockOutputStream has been closed
> {code}
> 2019-04-05 11:24:47,770 ERROR ozone.MiniOzoneLoadGenerator 
> (MiniOzoneLoadGenerator.java:load(102)) - LOADGEN: Create 
> key:pool-431-thread-9-2092651262 failed with exception, but skipping
> java.io.IOException: BlockOutputStream has been closed.
> at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.checkOpen(BlockOutputStream.java:662)
> at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.write(BlockOutputStream.java:245)
> at 
> org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.write(BlockOutputStreamEntry.java:131)
> at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:325)
> at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:287)
> at 
> org.apache.hadoop.ozone.client.io.OzoneOutputStream.write(OzoneOutputStream.java:49)
> at java.io.OutputStream.write(OutputStream.java:75)
> at 
> org.apache.hadoop.ozone.MiniOzoneLoadGenerator.load(MiniOzoneLoadGenerator.java:100)
> at 
> org.apache.hadoop.ozone.MiniOzoneLoadGenerator.lambda$startIO$0(MiniOzoneLoadGenerator.java:143)
> at 
> java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1224) Restructure code to validate the response from server in the Read path

2019-05-07 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16834575#comment-16834575
 ] 

Shashikant Banerjee commented on HDDS-1224:
---

Attached v0 patch for initial review. Will generate a pull request soon.
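
A rough, self-contained sketch of the validator-function shape described in 
this Jira; the interface, method names, and the String request/response 
stand-ins are hypothetical, not the committed API. The point is only that the 
read call accepts a list of checks, so server status validation and checksum 
verification run in one place:

{code:java}
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public final class ReadValidatorSketch {

  // Hypothetical validator hook passed into the read call.
  @FunctionalInterface
  interface Validator<REQ, RESP> {
    void validate(REQ request, RESP response) throws IOException;
  }

  // Run every registered check before the response is handed back to the caller.
  static <REQ, RESP> RESP readWithValidation(
      REQ request, RESP response, List<Validator<REQ, RESP>> validators) throws IOException {
    for (Validator<REQ, RESP> validator : validators) {
      validator.validate(request, response);
    }
    return response;
  }

  public static void main(String[] args) throws IOException {
    Validator<String, String> statusCheck = (req, resp) -> {
      if (resp.isEmpty()) {
        throw new IOException("empty response for " + req);
      }
    };
    Validator<String, String> checksumCheck = (req, resp) -> {
      if (resp.length() % 2 != 0) { // stand-in for real checksum verification
        throw new IOException("checksum mismatch for " + req);
      }
    };
    System.out.println(readWithValidation(
        "readChunk", "chunk-data", Arrays.asList(statusCheck, checksumCheck)));
  }
}
{code}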

> Restructure code to validate the response from server in the Read path
> --
>
> Key: HDDS-1224
> URL: https://issues.apache.org/jira/browse/HDDS-1224
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Client
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
> Attachments: HDDS-1224.000.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In the read path, validation of the response while reading data from the 
> datanodes happens in XceiverClientGrpc, and additional checksum verification 
> of the read chunk response happens in the Ozone client. The aim of this Jira 
> is to modify the read call to take a validator function, so that all 
> validation can happen in a single unified place.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1224) Restructure code to validate the response from server in the Read path

2019-05-07 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1224:
--
Attachment: HDDS-1224.000.patch

> Restructure code to validate the response from server in the Read path
> --
>
> Key: HDDS-1224
> URL: https://issues.apache.org/jira/browse/HDDS-1224
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Client
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.4.0
>
> Attachments: HDDS-1224.000.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In the read path, validation of the response while reading data from the 
> datanodes happens in XceiverClientGrpc, and additional checksum verification 
> of the read chunk response happens in the Ozone client. The aim of this Jira 
> is to modify the read call to take a validator function, so that all 
> validation can happen in a single unified place.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1224) Restructure code to validate the response from server in the Read path

2019-05-07 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1224:
--
Status: Patch Available  (was: Open)

> Restructure code to validate the response from server in the Read path
> --
>
> Key: HDDS-1224
> URL: https://issues.apache.org/jira/browse/HDDS-1224
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Client
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.4.0
>
> Attachments: HDDS-1224.000.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In the read path, validation of the response while reading data from the 
> datanodes happens in XceiverClientGrpc, and additional checksum verification 
> of the read chunk response happens in the Ozone client. The aim of this Jira 
> is to modify the read call to take a validator function, so that all 
> validation can happen in a single unified place.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1224) Restructure code to validate the response from server in the Read path

2019-05-07 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1224:
--
Fix Version/s: (was: 0.4.0)
   0.5.0

> Restructure code to validate the response from server in the Read path
> --
>
> Key: HDDS-1224
> URL: https://issues.apache.org/jira/browse/HDDS-1224
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Client
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
> Attachments: HDDS-1224.000.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In the read path, validation of the response while reading data from the 
> datanodes happens in XceiverClientGrpc, and additional checksum verification 
> of the read chunk response happens in the Ozone client. The aim of this Jira 
> is to modify the read call to take a validator function, so that all 
> validation can happen in a single unified place.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-1437) TestBlockOutputStreamWithFailures#testWatchForCommitDatanodeFailure fails with assertion error

2019-05-06 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned HDDS-1437:
-

Assignee: Shashikant Banerjee

> TestBlockOutputStreamWithFailures#testWatchForCommitDatanodeFailure fails 
> with assertion error
> --
>
> Key: HDDS-1437
> URL: https://issues.apache.org/jira/browse/HDDS-1437
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>
> The test is failing with the following assertion
> {code}
> java.lang.AssertionError: expected:<2> but was:<3>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures.testWatchForCommitDatanodeFailure(TestBlockOutputStreamWithFailures.java:373)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {code}
> https://ci.anzix.net//job/ozone-nightly/62//testReport/junit/org.apache.hadoop.ozone.client.rpc/TestBlockOutputStreamWithFailures/testWatchForCommitDatanodeFailure/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1395) Key write fails with BlockOutputStream has been closed exception

2019-05-06 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1395:
--
Status: Open  (was: Patch Available)

> Key write fails with BlockOutputStream has been closed exception
> 
>
> Key: HDDS-1395
> URL: https://issues.apache.org/jira/browse/HDDS-1395
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: MiniOzoneChaosCluster, pull-request-available
> Attachments: HDDS-1395.000.patch, HDDS-1395.001.patch, 
> HDDS-1395.003.patch
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Key write fails with BlockOutputStream has been closed
> {code}
> 2019-04-05 11:24:47,770 ERROR ozone.MiniOzoneLoadGenerator 
> (MiniOzoneLoadGenerator.java:load(102)) - LOADGEN: Create 
> key:pool-431-thread-9-2092651262 failed with exception, but skipping
> java.io.IOException: BlockOutputStream has been closed.
> at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.checkOpen(BlockOutputStream.java:662)
> at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.write(BlockOutputStream.java:245)
> at 
> org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.write(BlockOutputStreamEntry.java:131)
> at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:325)
> at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:287)
> at 
> org.apache.hadoop.ozone.client.io.OzoneOutputStream.write(OzoneOutputStream.java:49)
> at java.io.OutputStream.write(OutputStream.java:75)
> at 
> org.apache.hadoop.ozone.MiniOzoneLoadGenerator.load(MiniOzoneLoadGenerator.java:100)
> at 
> org.apache.hadoop.ozone.MiniOzoneLoadGenerator.lambda$startIO$0(MiniOzoneLoadGenerator.java:143)
> at 
> java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1395) Key write fails with BlockOutputStream has been closed exception

2019-05-06 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1395:
--
Status: Patch Available  (was: Open)

> Key write fails with BlockOutputStream has been closed exception
> 
>
> Key: HDDS-1395
> URL: https://issues.apache.org/jira/browse/HDDS-1395
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: MiniOzoneChaosCluster, pull-request-available
> Attachments: HDDS-1395.000.patch, HDDS-1395.001.patch, 
> HDDS-1395.003.patch
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Key write fails with BlockOutputStream has been closed
> {code}
> 2019-04-05 11:24:47,770 ERROR ozone.MiniOzoneLoadGenerator 
> (MiniOzoneLoadGenerator.java:load(102)) - LOADGEN: Create 
> key:pool-431-thread-9-2092651262 failed with exception, but skipping
> java.io.IOException: BlockOutputStream has been closed.
> at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.checkOpen(BlockOutputStream.java:662)
> at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.write(BlockOutputStream.java:245)
> at 
> org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.write(BlockOutputStreamEntry.java:131)
> at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:325)
> at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:287)
> at 
> org.apache.hadoop.ozone.client.io.OzoneOutputStream.write(OzoneOutputStream.java:49)
> at java.io.OutputStream.write(OutputStream.java:75)
> at 
> org.apache.hadoop.ozone.MiniOzoneLoadGenerator.load(MiniOzoneLoadGenerator.java:100)
> at 
> org.apache.hadoop.ozone.MiniOzoneLoadGenerator.lambda$startIO$0(MiniOzoneLoadGenerator.java:143)
> at 
> java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1395) Key write fails with BlockOutputStream has been closed exception

2019-05-06 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1395:
--
Attachment: HDDS-1395.003.patch

> Key write fails with BlockOutputStream has been closed exception
> 
>
> Key: HDDS-1395
> URL: https://issues.apache.org/jira/browse/HDDS-1395
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: MiniOzoneChaosCluster, pull-request-available
> Attachments: HDDS-1395.000.patch, HDDS-1395.001.patch, 
> HDDS-1395.003.patch
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Key write fails with BlockOutputStream has been closed
> {code}
> 2019-04-05 11:24:47,770 ERROR ozone.MiniOzoneLoadGenerator 
> (MiniOzoneLoadGenerator.java:load(102)) - LOADGEN: Create 
> key:pool-431-thread-9-2092651262 failed with exception, but skipping
> java.io.IOException: BlockOutputStream has been closed.
> at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.checkOpen(BlockOutputStream.java:662)
> at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.write(BlockOutputStream.java:245)
> at 
> org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.write(BlockOutputStreamEntry.java:131)
> at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:325)
> at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:287)
> at 
> org.apache.hadoop.ozone.client.io.OzoneOutputStream.write(OzoneOutputStream.java:49)
> at java.io.OutputStream.write(OutputStream.java:75)
> at 
> org.apache.hadoop.ozone.MiniOzoneLoadGenerator.load(MiniOzoneLoadGenerator.java:100)
> at 
> org.apache.hadoop.ozone.MiniOzoneLoadGenerator.lambda$startIO$0(MiniOzoneLoadGenerator.java:143)
> at 
> java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1484) Add unit tests for writing concurrently on different type of pipelines by multiple threads

2019-05-02 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created HDDS-1484:
-

 Summary: Add unit tests for writing concurrently on different type 
of pipelines by multiple threads
 Key: HDDS-1484
 URL: https://issues.apache.org/jira/browse/HDDS-1484
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Shashikant Banerjee


This Jira aims to add unit tests that write concurrently, from multiple 
threads, with different data sizes, on both single-node and 3-node pipelines.
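
A minimal sketch of what such a test body could look like, assuming an 
OzoneBucket already created against a MiniOzoneCluster and reusing the 
OzoneBucket#createKey call already used by the existing client tests; the 
thread count and data sizes are arbitrary, and the committed test may be 
structured differently:

{code:java}
import java.util.HashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.commons.lang3.RandomUtils;
import org.apache.hadoop.hdds.client.ReplicationFactor;
import org.apache.hadoop.hdds.client.ReplicationType;
import org.apache.hadoop.ozone.client.OzoneBucket;
import org.apache.hadoop.ozone.client.io.OzoneOutputStream;

public final class ConcurrentWriteSketch {

  static void writeConcurrently(OzoneBucket bucket) throws Exception {
    int[] sizes = {1024, 4 * 1024 * 1024, 16 * 1024 * 1024}; // different data sizes
    ExecutorService executor = Executors.newFixedThreadPool(10);
    for (int i = 0; i < 10; i++) {
      final int size = sizes[i % sizes.length];
      final String keyName = "key-" + i;
      // Alternate between 3-node and single-node pipelines.
      final ReplicationFactor factor =
          (i % 2 == 0) ? ReplicationFactor.THREE : ReplicationFactor.ONE;
      executor.submit(() -> {
        byte[] data = RandomUtils.nextBytes(size);
        try (OzoneOutputStream out = bucket.createKey(
            keyName, size, ReplicationType.RATIS, factor, new HashMap<>())) {
          out.write(data);
        } catch (Exception e) {
          throw new RuntimeException("write of " + keyName + " failed", e);
        }
      });
    }
    executor.shutdown();
    executor.awaitTermination(5, TimeUnit.MINUTES);
  }
}
{code}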



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-1282) TestFailureHandlingByClient causes a jvm exit

2019-04-30 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-1282.
---
   Resolution: Fixed
Fix Version/s: 0.5.0

As [~elek] explained, this issue does not exist any more and the other issue is 
tracked by HDDS-1384. Resolving this.

> TestFailureHandlingByClient causes a jvm exit
> -
>
> Key: HDDS-1282
> URL: https://issues.apache.org/jira/browse/HDDS-1282
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.5.0
>
> Attachments: HDDS-1282.001.patch, 
> org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient-output.txt
>
>
> The test causes a JVM exit because the test exits prematurely.
> {code}
> [ERROR] org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient
> [ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: 
> ExecutionException The forked VM terminated without properly saying goodbye. 
> VM crash or System.exit called?
> [ERROR] Command was /bin/sh -c cd 
> /Users/msingh/code/apache/ozone/oz_new1/hadoop-ozone/integration-test && 
> /Library/Java/JavaVirtualMachines/jdk1.8.0_171.jdk/Contents/Home/jre/bin/java 
> -Xmx2048m -XX:+HeapDumpOnOutOfMemoryError -jar 
> /Users/msingh/code/apache/ozone/oz_new1/hadoop-ozone/integration-test/target/surefire/surefirebooter5405606309417840457.jar
>  
> /Users/msingh/code/apache/ozone/oz_new1/hadoop-ozone/integration-test/target/surefire
>  2019-03-13T23-31-09_018-jvmRun1 surefire5934599060460829594tmp 
> surefire_1202723709650989744795tmp
> [ERROR] Error occurred in starting fork, check output in log
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1452) All chunk writes should happen to a single file for a block in datanode

2019-04-24 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1452:
--
Summary: All chunk writes should happen to a single file for a block in 
datanode  (was: All chunks should happen to a single file for a block in 
datanode)

> All chunk writes should happen to a single file for a block in datanode
> ---
>
> Key: HDDS-1452
> URL: https://issues.apache.org/jira/browse/HDDS-1452
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.5.0
>
>
> Currently, each chunk of a block is written to its own chunk file on the 
> datanode. The idea here is to write all the chunks of a block to a single 
> file on the datanode.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-1449) JVM Exit in datanode while committing a key

2019-04-22 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned HDDS-1449:
-

Assignee: Shashikant Banerjee

> JVM Exit in datanode while committing a key
> ---
>
> Key: HDDS-1449
> URL: https://issues.apache.org/jira/browse/HDDS-1449
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: MiniOzoneChaosCluster
> Attachments: 2019-04-22--20-23-56-IST.MiniOzoneChaosCluster.log, 
> hs_err_pid67466.log
>
>
> Saw the following trace in MiniOzoneChaosCluster run.
> {code}
> C  [librocksdbjni17271331491728127.jnilib+0x9755c]  
> Java_org_rocksdb_RocksDB_write0+0x1c
> J 13917  org.rocksdb.RocksDB.write0(JJJ)V (0 bytes) @ 0x0001102ff62e 
> [0x0001102ff580+0xae]
> J 17167 C2 
> org.apache.hadoop.utils.RocksDBStore.writeBatch(Lorg/apache/hadoop/utils/BatchOperation;)V
>  (260 bytes) @ 0x000111bbd01c [0x000111bbcde0+0x23c]
> J 20434 C1 
> org.apache.hadoop.ozone.container.keyvalue.impl.BlockManagerImpl.putBlock(Lorg/apache/hadoop/ozone/container/common/interfaces/Container;Lorg/apache/hadoop/ozone/container/common/helpers/BlockData;)J
>  (261 bytes) @ 0x000111c267ac [0x000111c25640+0x116c]
> J 19262 C2 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandRequestProto;Lorg/apache/hadoop/ozone/container/common/transport/server/ratis/DispatcherContext;)Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandResponseProto;
>  (866 bytes) @ 0x0001125c5aa0 [0x0001125c1560+0x4540]
> J 15095 C2 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandRequestProto;Lorg/apache/hadoop/ozone/container/common/transport/server/ratis/DispatcherContext;)Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandResponseProto;
>  (142 bytes) @ 0x000110ffc940 [0x000110ffc0c0+0x880]
> J 19301 C2 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandRequestProto;Lorg/apache/hadoop/ozone/container/common/transport/server/ratis/DispatcherContext;)Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandResponseProto;
>  (146 bytes) @ 0x000111396144 [0x000111395e60+0x2e4]
> J 15997 C2 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine$$Lambda$776.get()Ljava/lang/Object;
>  (16 bytes) @ 0x000110138e54 [0x000110138d80+0xd4]
> J 15970 C2 java.util.concurrent.CompletableFuture$AsyncSupply.run()V (61 
> bytes) @ 0x00010fc80094 [0x00010fc8+0x94]
> J 17368 C2 
> java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V
>  (225 bytes) @ 0x000110b0a7a0 [0x000110b0a5a0+0x200]
> J 7389 C1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V (9 bytes) @ 
> 0x00011012a004 [0x000110129f00+0x104]
> J 6837 C1 java.lang.Thread.run()V (17 bytes) @ 0x00011002b144 
> [0x00011002b000+0x144]
> v  ~StubRoutines::call_stub
> V  [libjvm.dylib+0x2ef1f6]  JavaCalls::call_helper(JavaValue*, methodHandle*, 
> JavaCallArguments*, Thread*)+0x6ae
> V  [libjvm.dylib+0x2ef99a]  JavaCalls::call_virtual(JavaValue*, KlassHandle, 
> Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x164
> V  [libjvm.dylib+0x2efb46]  JavaCalls::call_virtual(JavaValue*, Handle, 
> KlassHandle, Symbol*, Symbol*, Thread*)+0x4a
> V  [libjvm.dylib+0x34a46d]  thread_entry(JavaThread*, Thread*)+0x7c
> V  [libjvm.dylib+0x56eb0f]  JavaThread::thread_main_inner()+0x9b
> V  [libjvm.dylib+0x57020a]  JavaThread::run()+0x1c2
> V  [libjvm.dylib+0x48d4a6]  java_start(Thread*)+0xf6
> C  [libsystem_pthread.dylib+0x3305]  _pthread_body+0x7e
> C  [libsystem_pthread.dylib+0x626f]  _pthread_start+0x46
> C  [libsystem_pthread.dylib+0x2415]  thread_start+0xd
> C  0x
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1452) All chunks should happen to a single file for a block in datanode

2019-04-22 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created HDDS-1452:
-

 Summary: All chunks should happen to a single file for a block in 
datanode
 Key: HDDS-1452
 URL: https://issues.apache.org/jira/browse/HDDS-1452
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Affects Versions: 0.5.0
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.5.0


Currently, each chunk of a block is written to its own chunk file on the 
datanode. The idea here is to write all the chunks of a block to a single file 
on the datanode.
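
A minimal sketch of that idea using plain java.nio, purely for illustration; 
the class and method names are hypothetical and this is not the datanode 
implementation. Each chunk is position-written into one per-block file at its 
offset instead of getting its own file:

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public final class SingleBlockFileSketch implements AutoCloseable {

  private final FileChannel channel;

  SingleBlockFileSketch(Path blockFile) throws IOException {
    // One file for the whole block; chunks are addressed by their offsets.
    this.channel = FileChannel.open(blockFile,
        StandardOpenOption.CREATE, StandardOpenOption.WRITE);
  }

  // Write a chunk at its offset within the block file.
  void writeChunk(long offsetInBlock, ByteBuffer chunkData) throws IOException {
    long position = offsetInBlock;
    while (chunkData.hasRemaining()) {
      position += channel.write(chunkData, position);
    }
  }

  @Override
  public void close() throws IOException {
    channel.force(true); // flush data and metadata before the block is committed
    channel.close();
  }
}
{code}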



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1445) Add handling of NotReplicatedException in OzoneClient

2019-04-17 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819910#comment-16819910
 ] 

Shashikant Banerjee commented on HDDS-1445:
---

This will be handled as a part of HDDS-1395.

> Add handling of NotReplicatedException in OzoneClient
> -
>
> Key: HDDS-1445
> URL: https://issues.apache.org/jira/browse/HDDS-1445
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>
> In MiniOzoneChaosCluster, some of the calls fail with NotReplicatedException. 
> This exception needs to be handled in OzoneClient.
> {code}
> 2019-04-17 10:13:47,254 INFO  client.GrpcClientProtocolService 
> (GrpcClientProtocolService.java:lambda$processClientRequest$0(264)) - Failed 
> RaftClientRequest:client-43B95E0E3BE0->1ebec547-8cf8-4466-bf43-ea9f19fb546b@group-1B28E0BF6CBC,
>  cid=800, seq=0, Watch-ALL_COMMITTED(234), Message:, 
> reply=RaftClientReply:client-43B95E0E3BE0->1ebec547-8cf8-4466-bf43-ea9f19fb546b@group-1B28E0BF6CBC,
>  cid=800, FAILED org.apache.ratis.protocol.NotReplicatedException: Request 
> with call Id 800 and log index 234 is not yet replicated to ALL_COMMITTED, 
> logIndex=234, commits[1ebec547-8cf8-4466-bf43-ea9f19fb546b:c267, 
> 7b200ef5-7711-437d-a9bc-ad0e18fdf6bb:c267, 
> ffbfb65f-a622-466d-b6e8-47038cc15e0b:c226]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1395) Key write fails with BlockOutputStream has been closed exception

2019-04-17 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1395:
--
Summary: Key write fails with BlockOutputStream has been closed exception  
(was: Key write fails with "BlockOutputStream has been closed")

> Key write fails with BlockOutputStream has been closed exception
> 
>
> Key: HDDS-1395
> URL: https://issues.apache.org/jira/browse/HDDS-1395
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: MiniOzoneChaosCluster
> Attachments: HDDS-1395.000.patch, HDDS-1395.001.patch
>
>
> Key write fails with BlockOutputStream has been closed
> {code}
> 2019-04-05 11:24:47,770 ERROR ozone.MiniOzoneLoadGenerator 
> (MiniOzoneLoadGenerator.java:load(102)) - LOADGEN: Create 
> key:pool-431-thread-9-2092651262 failed with exception, but skipping
> java.io.IOException: BlockOutputStream has been closed.
> at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.checkOpen(BlockOutputStream.java:662)
> at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.write(BlockOutputStream.java:245)
> at 
> org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.write(BlockOutputStreamEntry.java:131)
> at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:325)
> at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:287)
> at 
> org.apache.hadoop.ozone.client.io.OzoneOutputStream.write(OzoneOutputStream.java:49)
> at java.io.OutputStream.write(OutputStream.java:75)
> at 
> org.apache.hadoop.ozone.MiniOzoneLoadGenerator.load(MiniOzoneLoadGenerator.java:100)
> at 
> org.apache.hadoop.ozone.MiniOzoneLoadGenerator.lambda$startIO$0(MiniOzoneLoadGenerator.java:143)
> at 
> java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-1445) Add handling of NotReplicatedException in OzoneClient

2019-04-17 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned HDDS-1445:
-

Assignee: Shashikant Banerjee

> Add handling of NotReplicatedException in OzoneClient
> -
>
> Key: HDDS-1445
> URL: https://issues.apache.org/jira/browse/HDDS-1445
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>
> In MiniOzoneChaosCluster, some of the calls fail with NotReplicatedException. 
> This exception needs to be handled in OzoneClient.
> {code}
> 2019-04-17 10:13:47,254 INFO  client.GrpcClientProtocolService 
> (GrpcClientProtocolService.java:lambda$processClientRequest$0(264)) - Failed 
> RaftClientRequest:client-43B95E0E3BE0->1ebec547-8cf8-4466-bf43-ea9f19fb546b@group-1B28E0BF6CBC,
>  cid=800, seq=0, Watch-ALL_COMMITTED(234), Message:, 
> reply=RaftClientReply:client-43B95E0E3BE0->1ebec547-8cf8-4466-bf43-ea9f19fb546b@group-1B28E0BF6CBC,
>  cid=800, FAILED org.apache.ratis.protocol.NotReplicatedException: Request 
> with call Id 800 and log index 234 is not yet replicated to ALL_COMMITTED, 
> logIndex=234, commits[1ebec547-8cf8-4466-bf43-ea9f19fb546b:c267, 
> 7b200ef5-7711-437d-a9bc-ad0e18fdf6bb:c267, 
> ffbfb65f-a622-466d-b6e8-47038cc15e0b:c226]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1373) KeyOutputStream, close after write request fails after retries, runs into IllegalArgumentException

2019-04-17 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1373:
--
   Resolution: Fixed
Fix Version/s: 0.5.0
   Status: Resolved  (was: Patch Available)

Thanks [~msingh] and [~jnp] for the review. I have committed this change to 
trunk.

> KeyOutputStream, close after write request fails after retries, runs into 
> IllegalArgumentException
> --
>
> Key: HDDS-1373
> URL: https://issues.apache.org/jira/browse/HDDS-1373
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.4.0, 0.5.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: MiniOzoneChaosCluster, pull-request-available
> Fix For: 0.5.0
>
> Attachments: HDDS-1373.000.patch, HDDS-1373.001.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In this code, the stream is closed via try-with-resources.
> {code}
>   try (OzoneOutputStream stream = ozoneBucket.createKey(keyName,
>   bufferCapacity, ReplicationType.RATIS, ReplicationFactor.THREE,
>   new HashMap<>())) {
> stream.write(buffer.array());
>   } catch (Exception e) {
> LOG.error("LOADGEN: Create key:{} failed with exception", keyName, e);
> break;
>   }
> {code}
> Here, the write call fails as expected; however, the close does not fail 
> with the same exception.
> The exception stack stack is as following
> {code}
> 2019-04-03 00:52:54,116 ERROR ozone.MiniOzoneLoadGenerator 
> (MiniOzoneLoadGenerator.java:load(101)) - LOADGEN: Create 
> key:pool-431-thread-9-8126 failed with exception
> java.io.IOException: Retry request failed. retries get failed due to exceeded 
> maximum allowed retries number: 5
> at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleRetry(KeyOutputStream.java:492)
> at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleException(KeyOutputStream.java:468)
> at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:344)
> at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleRetry(KeyOutputStream.java:514)
> at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleException(KeyOutputStream.java:468)
> at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:344)
> at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleRetry(KeyOutputStream.java:514)
> at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleException(KeyOutputStream.java:468)
> at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:344)
> at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleRetry(KeyOutputStream.java:514)
> at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleException(KeyOutputStream.java:468)
> at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:344)
> at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleRetry(KeyOutputStream.java:514)
> at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleException(KeyOutputStream.java:468)
> at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:344)
> at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleRetry(KeyOutputStream.java:514)
> at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleException(KeyOutputStream.java:468)
> at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:344)
> at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:287)
> at 
> org.apache.hadoop.ozone.client.io.OzoneOutputStream.write(OzoneOutputStream.java:49)
> at java.io.OutputStream.write(OutputStream.java:75)
> at 
> org.apache.hadoop.ozone.MiniOzoneLoadGenerator.load(MiniOzoneLoadGenerator.java:99)
> at 
> org.apache.hadoop.ozone.MiniOzoneLoadGenerator.lambda$startIO$0(MiniOzoneLoadGenerator.java:137)
> at 
> java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Suppressed: java.lang.IllegalArgumentException
> at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
> at 
> 

[jira] [Updated] (HDDS-1380) Add functionality to write from multiple clients in MiniOzoneChaosCluster

2019-04-16 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1380:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks [~msingh] for the review. I have committed this change to trunk.

> Add functionality to write from multiple clients in MiniOzoneChaosCluster
> 
>
> Key: HDDS-1380
> URL: https://issues.apache.org/jira/browse/HDDS-1380
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.5.0
>
> Attachments: HDDS-1380.000.patch
>
>
> Currently, MiniOzoneChaosCluster writes multiple keys in parallel using only 
> one OzoneClient instance. This Jira aims to add functionality to write 
> multiple keys with multiple OzoneClient instances.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDDS-1436) TestCommitWatcher#testReleaseBuffersOnException fails with IllegalStateException

2019-04-16 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819000#comment-16819000
 ] 

Shashikant Banerjee edited comment on HDDS-1436 at 4/16/19 1:18 PM:


This has been addressed with HDDS-1395. The test fails because of a 
Precondition check in CommitWatcher#watchForCommit which assumes that 
commitIndex2FlushedDataMap is not empty when the function is called. The test 
makes two putBlock calls, executes watchForCommit on the log index of one of 
them, and waits for it to complete. It is possible that, while waiting for the 
first putBlock, the second putBlock also completes and cleans up 
commitIndex2FlushedDataMap. Hence, when watchForCommit is called on the next 
index, the map may already be empty. The fix is to remove the Precondition 
check in watchForCommit.
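
A minimal, generic illustration of that race in plain Java (this is not the 
CommitWatcher code): the waiter blocks only on the first completion, and by the 
time it looks at the shared map the second completion may already have emptied 
it, so a "map must not be empty" precondition cannot be relied upon.

{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.TimeUnit;

public final class WatchRaceSketch {

  public static void main(String[] args) throws Exception {
    ConcurrentSkipListMap<Long, String> pending = new ConcurrentSkipListMap<>();
    pending.put(1L, "putBlock-1");
    pending.put(2L, "putBlock-2");

    // Each "putBlock" completion drops every entry up to its own log index.
    CompletableFuture<Void> first =
        CompletableFuture.runAsync(() -> pending.headMap(1L, true).clear());
    CompletableFuture<Void> second =
        CompletableFuture.runAsync(() -> pending.headMap(2L, true).clear());

    // The test only waits for the first putBlock ...
    first.get(10, TimeUnit.SECONDS);
    // ... but the second may already have completed and emptied the map, so
    // asserting "pending is not empty" here is inherently flaky.
    System.out.println("entries left before the next watch: " + pending.size());

    second.get(10, TimeUnit.SECONDS);
  }
}
{code}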


was (Author: shashikant):
This has been addressed with HDDS-1395.

> TestCommitWatcher#testReleaseBuffersOnException fails with 
> IllegalStateException
> 
>
> Key: HDDS-1436
> URL: https://issues.apache.org/jira/browse/HDDS-1436
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.5.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone-flaky-test
>
> the test is failing with the following exception
> {code}
> java.lang.IllegalStateException
>   at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:129)
>   at 
> org.apache.hadoop.hdds.scm.storage.CommitWatcher.watchForCommit(CommitWatcher.java:191)
>   at 
> org.apache.hadoop.ozone.client.rpc.TestCommitWatcher.testReleaseBuffersOnException(TestCommitWatcher.java:277)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {code}
> https://ci.anzix.net/job/ozone-nightly/63/testReport/org.apache.hadoop.ozone.client.rpc/TestCommitWatcher/testReleaseBuffersOnException/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1436) TestCommitWatcher#testReleaseBuffersOnException fails with IllegalStateException

2019-04-16 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819000#comment-16819000
 ] 

Shashikant Banerjee commented on HDDS-1436:
---

This has been addressed with HDDS-1395.

> TestCommitWatcher#testReleaseBuffersOnException fails with 
> IllegalStateException
> 
>
> Key: HDDS-1436
> URL: https://issues.apache.org/jira/browse/HDDS-1436
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.5.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone-flaky-test
>
> the test is failing with the following exception
> {code}
> java.lang.IllegalStateException
>   at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:129)
>   at 
> org.apache.hadoop.hdds.scm.storage.CommitWatcher.watchForCommit(CommitWatcher.java:191)
>   at 
> org.apache.hadoop.ozone.client.rpc.TestCommitWatcher.testReleaseBuffersOnException(TestCommitWatcher.java:277)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {code}
> https://ci.anzix.net/job/ozone-nightly/63/testReport/org.apache.hadoop.ozone.client.rpc/TestCommitWatcher/testReleaseBuffersOnException/






[jira] [Assigned] (HDDS-1436) TestCommitWatcher#testReleaseBuffersOnException fails with IllegalStateException

2019-04-15 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned HDDS-1436:
-

Assignee: Shashikant Banerjee

> TestCommitWatcher#testReleaseBuffersOnException fails with 
> IllegalStateException
> 
>
> Key: HDDS-1436
> URL: https://issues.apache.org/jira/browse/HDDS-1436
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.5.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone-flaky-test
>
> The test is failing with the following exception:
> {code}
> java.lang.IllegalStateException
>   at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:129)
>   at 
> org.apache.hadoop.hdds.scm.storage.CommitWatcher.watchForCommit(CommitWatcher.java:191)
>   at 
> org.apache.hadoop.ozone.client.rpc.TestCommitWatcher.testReleaseBuffersOnException(TestCommitWatcher.java:277)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {code}
> https://ci.anzix.net/job/ozone-nightly/63/testReport/org.apache.hadoop.ozone.client.rpc/TestCommitWatcher/testReleaseBuffersOnException/






[jira] [Comment Edited] (HDDS-1282) TestFailureHandlingByClient causes a jvm exit

2019-04-12 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16816277#comment-16816277
 ] 

Shashikant Banerjee edited comment on HDDS-1282 at 4/12/19 1:48 PM:


Thanks [~elek]. In the latest code, the test fails because of a datanode crash 
during MiniOzoneCluster startup.
{code:java}
2019-04-12 19:13:26,593 INFO ratis.XceiverServerRatis 
(XceiverServerRatis.java:start(416)) - Starting XceiverServerRatis 
3ab53731-d087-494c-9378-ee35abffb271 at port 53578
2019-04-12 19:13:26,593 INFO ozone.HddsDatanodeService 
(HddsDatanodeService.java:start(174)) - HddsDatanodeService host:192.168.0.64 
ip:192.168.0.64
2019-04-12 19:13:26,600 INFO impl.RaftServerProxy 
(RaftServerProxy.java:lambda$start$3(299)) - 
3ab53731-d087-494c-9378-ee35abffb271: start RPC server
2019-04-12 19:13:26,605 ERROR server.GrpcService 
(ExitUtils.java:terminate(133)) - Terminating with exit status 1: Failed to 
start Grpc server
java.io.IOException: Failed to bind
at 
org.apache.ratis.thirdparty.io.grpc.netty.NettyServer.start(NettyServer.java:253)
at 
org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:166)
at 
org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:81)
at org.apache.ratis.grpc.server.GrpcService.startImpl(GrpcService.java:144)
at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:202)
at 
org.apache.ratis.server.impl.RaftServerRpcWithProxy.start(RaftServerRpcWithProxy.java:69)
at 
org.apache.ratis.server.impl.RaftServerProxy.lambda$start$3(RaftServerProxy.java:300)
at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:202)
at org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:298)
at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.start(XceiverServerRatis.java:419)
at 
org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.start(OzoneContainer.java:186)
at 
org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.start(DatanodeStateMachine.java:169)
at 
org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.lambda$startDaemon$0(DatanodeStateMachine.java:338)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
at sun.nio.ch.Net.bind(Net.java:425)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at 
org.apache.ratis.thirdparty.io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:130)
at 
org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:558)
at 
org.apache.ratis.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1358)
at 
org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:501)
at 
org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:486)
at 
org.apache.ratis.thirdparty.io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:1019)
at 
org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel.bind(AbstractChannel.java:254)
at 
org.apache.ratis.thirdparty.io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:366)
at 
org.apache.ratis.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at 
org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
at 
org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:462)
at 
org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
at 
org.apache.ratis.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30){code}
The issue happens because we use random ports for the datanodes in 
MiniOzoneCluster: we find a free port during setup, but the Ratis server binds 
it only at a later time. In the meantime, another datanode may pick up the same 
port, and the original datanode crashes.

The patch does not address this issue and is outdated.
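
To illustrate the race, here is a minimal, self-contained Java sketch 
(hypothetical class and method names, not the actual MiniOzoneCluster code): 
the port is probed and released during setup, and nothing keeps it reserved 
until the real server binds it.
{code:java}
import java.net.BindException;
import java.net.ServerSocket;

public class PortRaceSketch {

  // Probe for a free port by binding to port 0 and releasing it immediately.
  // Once the probing socket is closed, the port is no longer reserved.
  static int findFreePort() throws Exception {
    try (ServerSocket probe = new ServerSocket(0)) {
      return probe.getLocalPort();
    }
  }

  public static void main(String[] args) throws Exception {
    int port = findFreePort();          // setup time: port looks free
    // ... window: another datanode may probe and pick the same port here ...
    try (ServerSocket ratisLikeServer = new ServerSocket(port)) { // start time
      System.out.println("bound to " + ratisLikeServer.getLocalPort());
    } catch (BindException e) {
      // Corresponds to the "Address already in use" failure in the log above.
      System.err.println("port " + port + " was taken during the window: " + e);
    }
  }
}
{code}
The longer the gap between the probe and the real bind, the more likely the 
collision, which is consistent with the failure showing up only intermittently.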


was (Author: shashikant):
Thanks [~elek]. In the latest code, the test fails because of a datanode crash 
during MiniOzoneCluster startup.
{code:java}
2019-04-12 19:13:26,593 INFO ratis.XceiverServerRatis 
(XceiverServerRatis.java:start(416)) - Starting XceiverServerRatis 
3ab53731-d087-494c-9378-ee35abffb271 at port 53578
2019-04-12 19:13:26,593 INFO ozone.HddsDatanodeService 
(HddsDatanodeService.java:start(174)) - HddsDatanodeService host:192.168.0.64 
ip:192.168.0.64
2019-04-12 19:13:26,600 INFO impl.RaftServerProxy 
(RaftServerProxy.java:lambda$start$3(299)) - 
{code}

[jira] [Commented] (HDDS-1282) TestFailureHandlingByClient causes a jvm exit

2019-04-12 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16816277#comment-16816277
 ] 

Shashikant Banerjee commented on HDDS-1282:
---

Thanks [~elek]. In the latest code, the test fails because of a datanode crash 
during MiniOzoneCluster startup.
{code:java}
2019-04-12 19:13:26,593 INFO ratis.XceiverServerRatis 
(XceiverServerRatis.java:start(416)) - Starting XceiverServerRatis 
3ab53731-d087-494c-9378-ee35abffb271 at port 53578
2019-04-12 19:13:26,593 INFO ozone.HddsDatanodeService 
(HddsDatanodeService.java:start(174)) - HddsDatanodeService host:192.168.0.64 
ip:192.168.0.64
2019-04-12 19:13:26,600 INFO impl.RaftServerProxy 
(RaftServerProxy.java:lambda$start$3(299)) - 
3ab53731-d087-494c-9378-ee35abffb271: start RPC server
2019-04-12 19:13:26,605 ERROR server.GrpcService 
(ExitUtils.java:terminate(133)) - Terminating with exit status 1: Failed to 
start Grpc server
java.io.IOException: Failed to bind
at 
org.apache.ratis.thirdparty.io.grpc.netty.NettyServer.start(NettyServer.java:253)
at 
org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:166)
at 
org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:81)
at org.apache.ratis.grpc.server.GrpcService.startImpl(GrpcService.java:144)
at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:202)
at 
org.apache.ratis.server.impl.RaftServerRpcWithProxy.start(RaftServerRpcWithProxy.java:69)
at 
org.apache.ratis.server.impl.RaftServerProxy.lambda$start$3(RaftServerProxy.java:300)
at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:202)
at org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:298)
at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.start(XceiverServerRatis.java:419)
at 
org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.start(OzoneContainer.java:186)
at 
org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.start(DatanodeStateMachine.java:169)
at 
org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.lambda$startDaemon$0(DatanodeStateMachine.java:338)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
at sun.nio.ch.Net.bind(Net.java:425)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at 
org.apache.ratis.thirdparty.io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:130)
at 
org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:558)
at 
org.apache.ratis.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1358)
at 
org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:501)
at 
org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:486)
at 
org.apache.ratis.thirdparty.io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:1019)
at 
org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel.bind(AbstractChannel.java:254)
at 
org.apache.ratis.thirdparty.io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:366)
at 
org.apache.ratis.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at 
org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
at 
org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:462)
at 
org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
at 
org.apache.ratis.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30){code}
The issue happens because we use random ports for the datanodes in 
MiniOzoneCluster: we find a free port during setup, but the Ratis server binds 
it only at a later time. In the meantime, another datanode may pick up the same 
port, and the original datanode crashes.
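
One generic way to close the window, sketched here only as an assumption and 
not as the MiniOzoneCluster or Ratis API, is to let the server socket bind 
port 0 itself and report the assigned port back, so the port is never released 
between selection and use:
{code:java}
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class EphemeralBindSketch {
  public static void main(String[] args) throws Exception {
    // Bind to port 0 and keep the socket open: the OS picks a free port and
    // it stays reserved for this process, so no other datanode can grab it.
    ServerSocket server = new ServerSocket();
    server.bind(new InetSocketAddress(0));
    System.out.println("listening on port " + server.getLocalPort());
    // A test harness would query the running server for its port instead of
    // choosing the port ahead of time.
    server.close();
  }
}
{code}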

> TestFailureHandlingByClient causes a jvm exit
> -
>
> Key: HDDS-1282
> URL: https://issues.apache.org/jira/browse/HDDS-1282
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDDS-1282.001.patch, 
> org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient-output.txt
>
>
> The test causes a JVM exit because the test exits prematurely.
> {code}
> [ERROR] 
