[jira] [Updated] (HDDS-3022) Datanode unable to close Pipeline after disk out of space

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-3022:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Datanode unable to close Pipeline after disk out of space
> -
>
> Key: HDDS-3022
> URL: https://issues.apache.org/jira/browse/HDDS-3022
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.5.0
>Reporter: Vivek Ratnavel Subramanian
>Assignee: Shashikant Banerjee
>Priority: Critical
>  Labels: TriagePending
> Attachments: ozone_logs.zip
>
>
> Datanode gets into a loop and keeps throwing errors while trying to close 
> the pipeline:
> {code:java}
> 2020-02-14 00:25:10,208 INFO org.apache.ratis.server.impl.RaftServerImpl: 
> 285cac09-7622-45e6-be02-b3c68ebf8b10@group-701265EC9F07: changes role from  
> FOLLOWER to CANDIDATE at term 6240 for changeToCandidate
> 2020-02-14 00:25:10,208 ERROR 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
>  pipeline Action CLOSE  on pipeline 
> PipelineID=02e7e10e-2d50-4ace-a18b-701265ec9f07.Reason : 
> 285cac09-7622-45e6-be02-b3c68ebf8b10 is in candidate state for 31898494ms
> 2020-02-14 00:25:10,208 INFO org.apache.ratis.server.impl.RoleInfo: 
> 285cac09-7622-45e6-be02-b3c68ebf8b10: start LeaderElection
> 2020-02-14 00:25:10,223 INFO org.apache.ratis.server.impl.LeaderElection: 
> 285cac09-7622-45e6-be02-b3c68ebf8b10@group-701265EC9F07-LeaderElection37032: 
> begin an election at term 6241 for 0: 
> [d432c890-5ec4-4cf1-9078-28497a08ab85:10.65.6.227:9858, 
> 285cac09-7622-45e6-be02-b3c68ebf8b10:10.65.24.80:9858, 
> cabbdef8-ed6c-4fc7-b7b2-d1ddd07da47e:10.65.8.165:9858], old=null
> 2020-02-14 00:25:10,259 INFO org.apache.ratis.server.impl.LeaderElection: 
> 285cac09-7622-45e6-be02-b3c68ebf8b10@group-701265EC9F07-LeaderElection37032 
> got exception when requesting votes: java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: 
> d432c890-5ec4-4cf1-9078-28497a08ab85: group-701265EC9F07 not found.
> 2020-02-14 00:25:10,270 INFO org.apache.ratis.server.impl.LeaderElection: 
> 285cac09-7622-45e6-be02-b3c68ebf8b10@group-701265EC9F07-LeaderElection37032 
> got exception when requesting votes: java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: 
> cabbdef8-ed6c-4fc7-b7b2-d1ddd07da47e: group-701265EC9F07 not found.
> 2020-02-14 00:25:10,270 INFO org.apache.ratis.server.impl.LeaderElection: 
> 285cac09-7622-45e6-be02-b3c68ebf8b10@group-701265EC9F07-LeaderElection37032: 
> Election REJECTED; received 0 response(s) [] and 2 exception(s); 
> 285cac09-7622-45e6-be02-b3c68ebf8b10@group-701265EC9F07:t6241, leader=null, 
> voted=285cac09-7622-45e6-be02-b3c68ebf8b10, 
> raftlog=285cac09-7622-45e6-be02-b3c68ebf8b10@group-701265EC9F07-SegmentedRaftLog:OPENED:c4,f4,i14,
>  conf=0: [d432c890-5ec4-4cf1-9078-28497a08ab85:10.65.6.227:9858, 
> 285cac09-7622-45e6-be02-b3c68ebf8b10:10.65.24.80:9858, 
> cabbdef8-ed6c-4fc7-b7b2-d1ddd07da47e:10.65.8.165:9858], old=null
> 2020-02-14 00:25:10,270 INFO org.apache.ratis.server.impl.LeaderElection:   
> Exception 0: java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: 
> d432c890-5ec4-4cf1-9078-28497a08ab85: group-701265EC9F07 not found.
> 2020-02-14 00:25:10,270 INFO org.apache.ratis.server.impl.LeaderElection:   
> Exception 1: java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: 
> cabbdef8-ed6c-4fc7-b7b2-d1ddd07da47e: group-701265EC9F07 not found.
> 2020-02-14 00:25:10,270 INFO org.apache.ratis.server.impl.RaftServerImpl: 
> 285cac09-7622-45e6-be02-b3c68ebf8b10@group-701265EC9F07: changes role from 
> CANDIDATE to FOLLOWER at term 6241 for DISCOVERED_A_NEW_TERM
> 2020-02-14 00:25:10,270 INFO org.apache.ratis.server.impl.RoleInfo: 
> 285cac09-7622-45e6-be02-b3c68ebf8b10: shutdown LeaderElection
> 2020-02-14 00:25:10,270 INFO org.apache.ratis.server.impl.RoleInfo: 
> 285cac09-7622-45e6-be02-b3c68ebf8b10: start FollowerState
> 2020-02-14 00:25:10,680 WARN org.apache.ratis.grpc.server.GrpcLogAppender: 
> 285cac09-7622-45e6-be02-b3c68ebf8b10@group-DD847EC75388->d432c890-5ec4-4cf1-9078-28497a08ab85-GrpcLogAppender:
>  HEARTBEAT appendEntries Timeout, 
> request=AppendEntriesRequest:cid=12669,entriesCount=0,lastEntry=null
> 2020-02-14 00:25:10,752 ERROR 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
>  pipeline Action CLOSE  on pipeline 
> PipelineID=7ad5ce51-d3fa-4e71-99f2-dd847ec75388.Reason : 
> 285cac09-7622-45e6-be02-b3c68ebf8b10 has not seen follower/s 
> 

[jira] [Updated] (HDDS-2476) Share more code between metadata and data scanners

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-2476:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Share more code between metadata and data scanners
> --
>
> Key: HDDS-2476
> URL: https://issues.apache.org/jira/browse/HDDS-2476
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Datanode
>Reporter: Attila Doroszlai
>Assignee: YiSheng Lien
>Priority: Major
>
> There are several duplicated or similar pieces of code in the metadata and 
> data scanners.  More code should be reused.
> Examples:
> # ContainerDataScrubberMetrics and ContainerMetadataScrubberMetrics have 3 
> common metrics
> # the lifecycle of ContainerMetadataScanner and ContainerDataScanner (main 
> loop, iteration, metrics processing, shutdown); see the sketch below
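
As a purely illustrative sketch (the class and method names below are invented, not the actual Ozone scanner API), the shared lifecycle could be pulled into one abstract base class:

{code:java}
// Illustrative only: shared main loop, iteration accounting and shutdown in
// one base class; only the per-container check differs between the scanners.
public abstract class AbstractContainerScanner extends Thread {

  private volatile boolean stopping = false;
  private long iterationCount = 0;  // stands in for the shared metrics

  @Override
  public void run() {
    while (!stopping) {
      runIteration();
      iterationCount++;  // metrics processing common to both scanners
    }
  }

  public void shutdown() {
    stopping = true;
    interrupt();  // wake the scanner if it is sleeping between iterations
  }

  /** The only piece that differs: metadata-only check vs. full data scan. */
  protected abstract void scanContainer(Object container);

  private void runIteration() {
    // In the real code this would iterate the container set and call
    // scanContainer() for each container, honoring the stop flag.
  }
}
{code}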



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-3163) write Key is hung when write delay is injected in datanode dir

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-3163.
---
Fix Version/s: 0.6.0
   Resolution: Fixed

> write Key is hung when write delay is injected in datanode dir
> --
>
> Key: HDDS-3163
> URL: https://issues.apache.org/jira/browse/HDDS-3163
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Nilotpal Nandi
>Assignee: Lokesh Jain
>Priority: Major
>  Labels: TriagePending, fault_injection
> Fix For: 0.6.0
>
>
> Steps taken:
> -
> 1. Mounted noise injection FUSE on all datanodes.
> 2. Selected one datanode from each open pipeline.
> 3. Injected a delay of 120 seconds on the chunk file path of the selected 
> datanodes.
> 4. Started a PUT key operation.
> The PUT key operation is stuck and does not return any success/error.






[jira] [Updated] (HDDS-3498) Address already in use Should shutdown the datanode with FATAL log and point out the port and configure key

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-3498:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Address already in use Should shutdown the datanode with FATAL log and point 
> out the port and configure key
> ---
>
> Key: HDDS-3498
> URL: https://issues.apache.org/jira/browse/HDDS-3498
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.6.0
>Reporter: maobaolong
>Priority: Minor
>  Labels: Triaged
>
> Now, the datanode process cannot work because the port is in use, but the 
> process stays alive.
> Furthermore, I first guessed the port in use was 9861, but it isn't; after 
> reading the source code, I found it is `dfs.container.ipc`, whose default 
> port is 9859. That port should be reported along with the following 
> exception. I think this error should be logged at FATAL level, and we can 
> then terminate the datanode process.
> {code:java}
> 2020-04-21 15:53:05,436 [Datanode State Machine Thread - 0] WARN 
> org.apache.hadoop.ozone.container.common.statemachine.EndpointStateMachine: 
> Unable to communicate to SCM server at 127.0.0.1:9861 for past 300 seconds.
> java.io.IOException: Failed to bind
> at 
> org.apache.ratis.thirdparty.io.grpc.netty.NettyServer.start(NettyServer.java:246)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:184)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:90)
> at 
> org.apache.hadoop.ozone.container.common.transport.server.XceiverServerGrpc.start(XceiverServerGrpc.java:141)
> at 
> org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.start(OzoneContainer.java:235)
> at 
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:113)
> at 
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.net.BindException: Address already in use
> at sun.nio.ch.Net.bind0(Native Method)
> at sun.nio.ch.Net.bind(Net.java:433)
> at sun.nio.ch.Net.bind(Net.java:425)
> at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
> at 
> org.apache.ratis.thirdparty.io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:132)
> at 
> org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:551)
> at 
> org.apache.ratis.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1345)
> at 
> org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:503)
> at 
> org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:488)
> at 
> org.apache.ratis.thirdparty.io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:984)
> at 
> org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel.bind(AbstractChannel.java:247)
> at 
> org.apache.ratis.thirdparty.io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:355)
> at 
> org.apache.ratis.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
> at 
> org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:416)
> at 
> org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:515)
> at 
> org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918)
> at 
> org.apache.ratis.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> at 
> org.apache.ratis.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> ... 1 more
> {code}
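
As an illustration of the proposal, a minimal sketch assuming the startup path can unwrap the cause chain; the class and helper names are hypothetical, with the port and configuration key taken from the description above:

{code:java}
// Hypothetical sketch: detect the BindException, log a FATAL-style message
// naming the port and its configuration key, then terminate the process
// instead of leaving it half-alive.
import java.net.BindException;

public final class BindFailureHandler {

  static final String PORT_KEY = "dfs.container.ipc";  // key from this issue
  static final int DEFAULT_PORT = 9859;                // its default port

  public static void startOrDie(Runnable serverStart) {
    try {
      serverStart.run();
    } catch (RuntimeException e) {
      if (rootCause(e) instanceof BindException) {
        System.err.printf(
            "FATAL: port %d (%s) is already in use; shutting down datanode%n",
            DEFAULT_PORT, PORT_KEY);
        System.exit(1);  // terminate rather than keep retrying forever
      }
      throw e;  // anything else is handled as before
    }
  }

  private static Throwable rootCause(Throwable t) {
    while (t.getCause() != null) {
      t = t.getCause();
    }
    return t;
  }
}
{code}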




[jira] [Updated] (HDDS-2696) Document recovery from RATIS-677

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-2696:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Document recovery from RATIS-677
> 
>
> Key: HDDS-2696
> URL: https://issues.apache.org/jira/browse/HDDS-2696
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Reporter: Istvan Fajth
>Priority: Critical
>  Labels: Triaged
>
> RATIS-677 is solved in a way that requires a setting to be changed so that 
> the RatisServer implementation ignores the corruption, and at the moment, 
> due to HDDS-2647, we do not have a clear recovery path from a Ratis 
> corruption in the pipeline data.
> We should document how this can be recovered. I have an idea which involves 
> closing the pipeline in SCM and removing the Ratis metadata for the pipeline 
> on the DataNodes, which effectively clears out the corrupted pipeline from 
> the system.
> There are two problems I have with finding a recovery path and documenting 
> it:
> - I am not sure we have strong enough guarantees that the writes happened 
> properly if the Ratis metadata could become corrupt, so this needs to be 
> investigated.
> - At the moment I cannot validate this approach: if I do the steps (stop the 
> 3 DNs, move out the Ratis data for the pipeline, close the pipeline with 
> scmcli, then restart the DNs), the pipeline is not closed properly, and SCM 
> fails as described in HDDS-2695.






[jira] [Resolved] (HDDS-3514) Fix Memory leak of RaftServerImpl

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-3514.
---
Fix Version/s: 0.6.0
   Resolution: Fixed

> Fix Memory leak of RaftServerImpl
> -
>
> Key: HDDS-3514
> URL: https://issues.apache.org/jira/browse/HDDS-3514
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
>  Labels: Triaged, pull-request-available
> Fix For: 0.6.0
>
>
> This depends on [RATIS-845|https://issues.apache.org/jira/browse/RATIS-845], 
> find the details in RATIS-845.






[jira] [Updated] (HDDS-2701) Avoid read from temporary chunk file in datanode

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-2701:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Avoid read from temporary chunk file in datanode
> 
>
> Key: HDDS-2701
> URL: https://issues.apache.org/jira/browse/HDDS-2701
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
>  Labels: TriagePending
>
> Currently we try reading chunk data from the temp file if the chunk file does 
> not exist. The fix was added in HDDS-2372 due to a race condition between 
> readStateMachineData and writeStateMachineData in ContainerStateMachine. 
> Once HDDS-2542 is fixed, the read from the temp file can be avoided by making 
> sure that chunk data remains in the cache until the chunk file is generated, 
> as sketched below.
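
A minimal sketch of that idea, with invented names: chunk bytes stay pinned in a write cache until the final chunk file exists, so reads never need the temporary file.

{code:java}
// Sketch only, not the actual Ozone cache: entries are removed only after
// the temp file has been renamed to the final chunk file.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public final class ChunkWriteCache {

  private final Map<Long, byte[]> pending = new ConcurrentHashMap<>();

  /** Called from writeStateMachineData: cache before writing the tmp file. */
  public void put(long logIndex, byte[] chunkData) {
    pending.put(logIndex, chunkData);
  }

  /** Called from readStateMachineData: served from cache, never the tmp file. */
  public byte[] get(long logIndex) {
    return pending.get(logIndex);
  }

  /** Called only after the tmp file is renamed to the final chunk file. */
  public void committed(long logIndex) {
    pending.remove(logIndex);  // safe to evict: the real file now exists
  }
}
{code}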






[jira] [Updated] (HDDS-1560) RejectedExecutionException on datanode after shutting it down

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1560:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> RejectedExecutionException on datanode after shutting it down
> -
>
> Key: HDDS-1560
> URL: https://issues.apache.org/jira/browse/HDDS-1560
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Priority: Major
>  Labels: MiniOzoneChaosCluster, Triaged
>
> RejectedExecutionException on datanode after shutting it down
> {code}
> 2019-05-20 00:38:52,757 ERROR statemachine.DatanodeStateMachine 
> (DatanodeStateMachine.java:start(199)) - Unable to finish the execution.
> java.util.concurrent.RejectedExecutionException: Task 
> java.util.concurrent.ExecutorCompletionService$QueueingFuture@74b926e9 
> rejected from 
> org.apache.hadoop.util.concurrent.HadoopThreadPoolExecutor@15e1f6
> 9d[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed 
> tasks = 90]
> at 
> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
> at 
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
> at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
> at 
> java.util.concurrent.ExecutorCompletionService.submit(ExecutorCompletionService.java:181)
> at 
> org.apache.hadoop.ozone.container.common.states.datanode.RunningDatanodeState.execute(RunningDatanodeState.java:90)
> at 
> org.apache.hadoop.ozone.container.common.statemachine.StateContext.execute(StateContext.java:375)
> at 
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.start(DatanodeStateMachine.java:186)
> at 
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.lambda$startDaemon$0(DatanodeStateMachine.java:349)
> at java.lang.Thread.run(Thread.java:748)
> {code}
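
A small self-contained demonstration of the failure mode: submitting to an ExecutorCompletionService whose executor has terminated throws RejectedExecutionException, so the state machine loop needs a shutdown guard (or a catch). This is not the actual DatanodeStateMachine code:

{code:java}
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

public final class ShutdownRace {
  public static void main(String[] args) {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    ExecutorCompletionService<Void> ecs =
        new ExecutorCompletionService<>(executor);
    executor.shutdownNow();           // the datanode is shutting down

    if (!executor.isShutdown()) {     // the missing guard
      ecs.submit(() -> null);
    } else {
      System.out.println("executor already shut down, skipping task");
    }

    try {
      ecs.submit(() -> null);         // what the current code effectively does
    } catch (RejectedExecutionException e) {
      System.out.println("rejected as in the stack trace above: " + e);
    }
  }
}
{code}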






[jira] [Updated] (HDDS-3227) Ensure eviction of stateMachineData from cache only when both followers catch up

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-3227:
--
Fix Version/s: (was: 0.6.0)

> Ensure eviction of stateMachineData from cache only when both followers catch 
> up
> 
>
> Key: HDDS-3227
> URL: https://issues.apache.org/jira/browse/HDDS-3227
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: Triaged
>
> Currently, the data in the StateMachineCache is evicted as soon as the 
> applyTransaction call is issued for a transaction in Ratis. In our testing 
> with keys a few KB in size, we observed that the data is evicted from the 
> cache before the append requests can be processed by a slightly slow 
> follower, making the leader read the chunk data from the underlying fs/disk 
> very frequently. This slows down the leader as well as the overall 
> throughput of the pipeline.
> The idea here is to ensure the data is evicted from the cache only when both 
> followers have caught up with the match index, as sketched below. If a 
> follower is really slow, it will eventually be marked slow after 
> nodeFailureTimeout and the pipeline will be destroyed.
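
A minimal sketch of the proposed eviction rule, with invented names: an entry at raft log index i may leave the cache only once the slowest follower's match index has reached i:

{code:java}
import java.util.NavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

public final class StateMachineDataCache {

  // Keyed by raft log index so eviction can be expressed as a range drop.
  private final NavigableMap<Long, byte[]> cache = new ConcurrentSkipListMap<>();

  public void put(long logIndex, byte[] chunkData) {
    cache.put(logIndex, chunkData);
  }

  public byte[] get(long logIndex) {
    return cache.get(logIndex);
  }

  /** Called whenever the followers' match indexes advance. */
  public void onFollowerMatchIndexUpdate(long[] followerMatchIndexes) {
    if (followerMatchIndexes.length == 0) {
      return;  // no followers known yet; evict nothing
    }
    long minMatch = Long.MAX_VALUE;
    for (long m : followerMatchIndexes) {
      minMatch = Math.min(minMatch, m);  // slowest follower bounds eviction
    }
    cache.headMap(minMatch, true).clear();  // both followers have these
  }
}
{code}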






[jira] [Updated] (HDDS-3018) Fix TestContainerStateMachineFailures.java

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-3018:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Fix TestContainerStateMachineFailures.java
> --
>
> Key: HDDS-3018
> URL: https://issues.apache.org/jira/browse/HDDS-3018
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Datanode
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
> Attachments: 482610870.zip, 494640486.zip, 
> crashed-org.apache.hadoop.ozone.client.rpc.TestContainerStateMachineFailures-output.txt,
>  
> failure-org.apache.hadoop.ozone.client.rpc.TestContainerStateMachineFailures-output.txt
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The unit tests are written with single-node Ratis in mind. The expectation 
> is that when the datanode fails, the client should see an exception on the 
> next IO, as there is no new datanode for a new pipeline to form; this does 
> not happen because the cluster is created with multiple datanodes.






[jira] [Updated] (HDDS-3227) Ensure eviction of stateMachineData from cache only when both followers catch up

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-3227:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Ensure eviction of stateMachineData from cache only when both followers catch 
> up
> 
>
> Key: HDDS-3227
> URL: https://issues.apache.org/jira/browse/HDDS-3227
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: Triaged
> Fix For: 0.6.0
>
>
> Currently, the data in the StateMachineCache is evicted as soon as the 
> applyTransaction call is issued for a transaction in Ratis. In our testing 
> with keys a few KB in size, we observed that the data is evicted from the 
> cache before the append requests can be processed by a slightly slow 
> follower, making the leader read the chunk data from the underlying fs/disk 
> very frequently. This slows down the leader as well as the overall 
> throughput of the pipeline.
> The idea here is to ensure the data is evicted from the cache only when both 
> followers have caught up with the match index. If a follower is really slow, 
> it will eventually be marked slow after nodeFailureTimeout and the pipeline 
> will be destroyed.






[jira] [Updated] (HDDS-3136) retry timeout is large while writing key

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-3136:
--
Parent: HDDS-3350
Issue Type: Sub-task  (was: Bug)

> retry timeout is large while writing key
> 
>
> Key: HDDS-3136
> URL: https://issues.apache.org/jira/browse/HDDS-3136
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Client
>Reporter: Nilotpal Nandi
>Priority: Major
>  Labels: TriagePending, fault_injection
>
> Steps:
>  # Mounted noise injection FUSE on all datanodes.
>  # Injected a WRITE delay of 5 seconds on one of the datanodes from each 
> open pipeline.
>  # Wrote a key of 180 MB.
> The write operation took more than 10 minutes to complete.






[jira] [Updated] (HDDS-3594) ManagedChannels are leaked in XceiverClientGrpc manager

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-3594:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> ManagedChannels are leaked in XceiverClientGrpc manager
> ---
>
> Key: HDDS-3594
> URL: https://issues.apache.org/jira/browse/HDDS-3594
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.6.0
>Reporter: Rakesh Radhakrishnan
>Priority: Major
>  Labels: TriagePending
>
> XceiverClientGrpc#ManagedChannels are leaked when running {{Hadoop Synthetic 
> Load Generator}} pointing to OzoneFS.
> *Stacktrace:*
> {code:java}
> SEVERE: *~*~*~ Channel ManagedChannelImpl{logId=99, target=10.17.248.31:9859} 
> was not shutdown properly!!! ~*~*~*
> Make sure to call shutdown()/shutdownNow() and wait until 
> awaitTermination() returns true.
> java.lang.RuntimeException: ManagedChannel allocation site
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference.(ManagedChannelOrphanWrapper.java:94)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.(ManagedChannelOrphanWrapper.java:52)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.(ManagedChannelOrphanWrapper.java:43)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.AbstractManagedChannelImplBuilder.build(AbstractManagedChannelImplBuilder.java:518)
> at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.connectToDatanode(XceiverClientGrpc.java:191)
> at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.connect(XceiverClientGrpc.java:140)
> at 
> org.apache.hadoop.hdds.scm.XceiverClientManager$2.call(XceiverClientManager.java:244)
> at 
> org.apache.hadoop.hdds.scm.XceiverClientManager$2.call(XceiverClientManager.java:228)
> at 
> org.apache.hadoop.ozone.shaded.com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4876)
> at 
> org.apache.hadoop.ozone.shaded.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3529)
> at 
> org.apache.hadoop.ozone.shaded.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2278)
> at 
> org.apache.hadoop.ozone.shaded.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2155)
> at 
> org.apache.hadoop.ozone.shaded.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2045)
> at 
> org.apache.hadoop.ozone.shaded.com.google.common.cache.LocalCache.get(LocalCache.java:3951)
> at 
> org.apache.hadoop.ozone.shaded.com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4871)
> at 
> org.apache.hadoop.hdds.scm.XceiverClientManager.getClient(XceiverClientManager.java:228)
> at 
> org.apache.hadoop.hdds.scm.XceiverClientManager.acquireClient(XceiverClientManager.java:174)
> at 
> org.apache.hadoop.hdds.scm.XceiverClientManager.acquireClientForReadData(XceiverClientManager.java:164)
> at 
> org.apache.hadoop.hdds.scm.storage.BlockInputStream.getChunkInfos(BlockInputStream.java:184)
> at 
> org.apache.hadoop.hdds.scm.storage.BlockInputStream.initialize(BlockInputStream.java:133)
> at 
> org.apache.hadoop.hdds.scm.storage.BlockInputStream.read(BlockInputStream.java:254)
> at 
> org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:199)
> at 
> org.apache.hadoop.fs.ozone.OzoneFSInputStream.read(OzoneFSInputStream.java:63)
> at java.io.DataInputStream.read(DataInputStream.java:100)
> at 
> org.apache.hadoop.fs.loadGenerator.LoadGenerator$DFSClientThread.read(LoadGenerator.java:284)
> at 
> org.apache.hadoop.fs.loadGenerator.LoadGenerator$DFSClientThread.nextOp(LoadGenerator.java:268)
> at 
> org.apache.hadoop.fs.loadGenerator.LoadGenerator$DFSClientThread.run(LoadGenerator.java:235)
> {code}
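
The warning above is gRPC's built-in leak detector: a ManagedChannel was garbage collected without ever being shut down. A sketch of the close path the detector asks for, applied to whatever channel the client manager evicts (not the actual Ozone fix):

{code:java}
import java.util.concurrent.TimeUnit;
import org.apache.ratis.thirdparty.io.grpc.ManagedChannel;

public final class ChannelCloser {

  /** Shut the channel down and wait briefly, as the gRPC message asks. */
  public static void close(ManagedChannel channel) throws InterruptedException {
    channel.shutdown();
    if (!channel.awaitTermination(5, TimeUnit.SECONDS)) {
      channel.shutdownNow();  // force-close stragglers
    }
  }
}
{code}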






[jira] [Updated] (HDDS-3600) ManagedChannels leaked on ratis pipeline when there are many connection retries

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-3600:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> ManagedChannels leaked on ratis pipeline when there are many connection 
> retries
> ---
>
> Key: HDDS-3600
> URL: https://issues.apache.org/jira/browse/HDDS-3600
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.6.0
>Reporter: Rakesh Radhakrishnan
>Priority: Critical
>  Labels: TriagePending
> Attachments: HeapHistogram-Snapshot-ManagedChannel-Leaked-001.png, 
> outloggenerator-ozonefs-003.log
>
>
> ManagedChannels are leaked on the Ratis pipeline when there are many 
> connection retries.
> Observed that too many ManagedChannels were opened while running the 
> Synthetic Hadoop load generator.
>  Ran the benchmark with only one pipeline in the cluster, and also with only 
> two pipelines in the cluster. 
>  Both runs failed with too many open files, and many TCP connections stayed 
> open for a long time, suggesting channel leaks.
> More details below:
>  *1)* Execute NNloadGenerator
> {code:java}
> [rakeshr@ve1320 loadOutput]$ ps -ef | grep load
> hdfs 362822  1 19 05:24 pts/000:03:16 
> /usr/java/jdk1.8.0_232-cloudera/bin/java -Dproc_jar -Xmx825955249 
> -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true 
> -Dyarn.log.dir=/var/log/hadoop-yarn -Dyarn.log.file=hadoop.log 
> -Dyarn.home.dir=/opt/cloudera/parcels/CDH-7.2.0-1.cdh7.2.0.p0.2982244/lib/hadoop/libexec/../../hadoop-yarn
>  -Dyarn.root.logger=INFO,console 
> -Djava.library.path=/opt/cloudera/parcels/CDH-7.2.0-1.cdh7.2.0.p0.2982244/lib/hadoop/lib/native
>  -Dhadoop.log.dir=/var/log/hadoop-yarn -Dhadoop.log.file=hadoop.log 
> -Dhadoop.home.dir=/opt/cloudera/parcels/CDH-7.2.0-1.cdh7.2.0.p0.2982244/lib/hadoop
>  -Dhadoop.id.str=hdfs -Dhadoop.root.logger=INFO,console 
> -Dhadoop.policy.file=hadoop-policy.xml 
> -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar 
> /opt/cloudera/parcels/CDH-7.2.0-1.cdh7.2.0.p0.2982244/jars/hadoop-mapreduce-client-jobclient-3.1.1.7.2.0.0-141-tests.jar
>  NNloadGenerator -root o3fs://bucket2.vol2/
> rakeshr  368739 354174  0 05:41 pts/000:00:00 grep --color=auto load
> {code}
> *2)* Active TCP connections to port 9858 during the run, which is the 
> default Ratis pipeline port.
> {code:java}
> [rakeshr@ve1320 loadOutput]$ sudo lsof -a -p 362822 | grep "9858" | wc
>3229   32290  494080
> [rakeshr@ve1320 loadOutput]$ vi tcp_log
> 
> java440633 hdfs 4090u IPv4  271141987   0t0TCP 
> ve1320.halxg.cloudera.com:35190->ve1323.halxg.cloudera.com:9858 (ESTABLISHED)
> java440633 hdfs 4091u IPv4  271127918   0t0TCP 
> ve1320.halxg.cloudera.com:35192->ve1323.halxg.cloudera.com:9858 (ESTABLISHED)
> java440633 hdfs 4092u IPv4  271038583   0t0TCP 
> ve1320.halxg.cloudera.com:59116->ve1323.halxg.cloudera.com:9858 (ESTABLISHED)
> java440633 hdfs 4093u IPv4  271038584   0t0TCP 
> ve1320.halxg.cloudera.com:59118->ve1323.halxg.cloudera.com:9858 (ESTABLISHED)
> java440633 hdfs 4095u IPv4  271127920   0t0TCP 
> ve1320.halxg.cloudera.com:35196->ve1323.halxg.cloudera.com:9858 (ESTABLISHED)
> [rakeshr@ve1320 loadOutput]$ ^C
>  {code}
> *3)* The heap dump shows there are 9571 ManagedChannel objects. The heap 
> dump is quite large, so a snapshot is attached to this Jira.
> *4)* Attached the output and thread dump of the SyntheticLoadGenerator 
> benchmark client process to show the exceptions printed to the console. FYI, 
> this file was quite large and I have trimmed a few repeated exception traces.






[GitHub] [hadoop-ozone] Simon0806 commented on a change in pull request #1041: HDDS-3725. Ozone sh volume client support quota option

2020-06-09 Thread GitBox


Simon0806 commented on a change in pull request #1041:
URL: https://github.com/apache/hadoop-ozone/pull/1041#discussion_r437873358



##
File path: hadoop-hdds/docs/content/shell/VolumeCommands.md
##
@@ -42,7 +42,7 @@ assign it to a user.
 |  Uri   | The name of the volume. 
   |
 
 {{< highlight bash >}}
-ozone sh volume create --quota=1TB --user=bilbo /hive
+ozone sh volume create -ssq=1TB --user=bilbo /hive

Review comment:
   The design doc also provides a full name, ssQuota. Not sure if there are 
better names, or whether we could use 'spaceQuota' instead, but I think this 
can be explained in the documentation. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org






[jira] [Updated] (HDDS-3611) Ozone client should not consider closed container error as failure

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-3611:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Ozone client should not consider closed container error as failure
> --
>
> Key: HDDS-3611
> URL: https://issues.apache.org/jira/browse/HDDS-3611
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Critical
>  Labels: TriagePending
>
> A ContainerNotOpenException is thrown by the datanode when a client is 
> writing to a non-open container. Currently the Ozone client sees this as a 
> failure and increments the retry count. If the client reaches the configured 
> retry count, it fails the write. MapReduce jobs were seen failing due to this 
> error with the default retry count of 5.
> The idea is to not count errors due to a closed container toward the retry 
> count, as sketched after the stack trace below. This would make sure that 
> Ozone client writes do not fail due to closed-container exceptions.
> {code:java}
> 2020-05-15 02:20:28,375 ERROR [main] 
> org.apache.hadoop.ozone.client.io.KeyOutputStream: Retry request failed. 
> retries get failed due to exceeded maximum allowed retries number: 5
> java.io.IOException: Unexpected Storage Container Exception: 
> java.util.concurrent.CompletionException: 
> java.util.concurrent.CompletionException: 
> org.apache.ratis.protocol.StateMachineException: 
> org.apache.hadoop.hdds.scm.container.common.helpers.ContainerNotOpenException 
> from Server e2eec12f-02c5-46e2-9c23-14d6445db219@group-A3BF3ABDC307: 
> Container 15 in CLOSED state
> at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.setIoException(BlockOutputStream.java:551)
> at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$writeChunkToContainer$3(BlockOutputStream.java:638)
> at 
> java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:884)
> at 
> java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:866)
> at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
> at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
> at 
> org.apache.ratis.client.impl.OrderedAsync$PendingOrderedRequest.setReply(OrderedAsync.java:99)
> at 
> org.apache.ratis.client.impl.OrderedAsync$PendingOrderedRequest.setReply(OrderedAsync.java:60)
> at 
> org.apache.ratis.util.SlidingWindow$RequestMap.setReply(SlidingWindow.java:143)
> at 
> org.apache.ratis.util.SlidingWindow$Client.receiveReply(SlidingWindow.java:314)
> at 
> org.apache.ratis.client.impl.OrderedAsync.lambda$sendRequest$9(OrderedAsync.java:242)
> at 
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
> at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
> at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
> at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
> at 
> org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.lambda$onNext$0(GrpcClientProtocolClient.java:284)
> at java.util.Optional.ifPresent(Optional.java:159)
> at 
> org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.handleReplyFuture(GrpcClientProtocolClient.java:340)
> at 
> org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.access$100(GrpcClientProtocolClient.java:264)
> at 
> org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:284)
> at 
> org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:267)
> at 
> org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:436)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInternal(ClientCallImpl.java:658)
> ...{code}
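
A sketch of the proposed retry policy with simplified names: closed-container failures still trigger a retry (against a new block and container), but they do not consume the bounded retry budget:

{code:java}
import java.io.IOException;

public final class RetryPolicySketch {

  private static final int MAX_RETRIES = 5;  // default mentioned above
  private int retryCount = 0;

  /** Decide whether to retry after a write failure. */
  boolean shouldRetry(IOException failure) {
    if (isClosedContainer(failure)) {
      return true;  // retry on a new container, without counting it
    }
    retryCount++;
    return retryCount <= MAX_RETRIES;
  }

  private boolean isClosedContainer(Throwable t) {
    // The real client would look for ContainerNotOpenException anywhere in
    // the cause chain; simplified here to a message check for the sketch.
    for (; t != null; t = t.getCause()) {
      if (t.getMessage() != null && t.getMessage().contains("CLOSED state")) {
        return true;
      }
    }
    return false;
  }
}
{code}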






[jira] [Commented] (HDDS-3481) SCM ask too many datanodes to replicate the same container

2020-06-09 Thread runzhiwang (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130090#comment-17130090
 ] 

runzhiwang commented on HDDS-3481:
--

bq. Should we consider balancing the replication source among datanodes or 
throttle the replication per datanode?
Yeah, [~nanda] said he will track this; maybe he will open another Jira and 
share a design. [~xyao]

> SCM ask too many datanodes to replicate the same container
> --
>
> Key: HDDS-3481
> URL: https://issues.apache.org/jira/browse/HDDS-3481
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Blocker
>  Labels: TriagePending, pull-request-available
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png
>
>
> *What's the problem?*
> As the image shows, SCM asked 31 datanodes to replicate container 2037 every 
> 10 minutes starting from 2020-04-17 23:38:51. Then at 2020-04-18 08:58:52 
> SCM found the replica count of container 2037 was 12, so it asked 11 
> datanodes to delete container 2037. 
>  !screenshot-1.png! 
>  !screenshot-2.png! 
> *What's the reason?*
> SCM checks whether (container replica count + 
> inflightReplication.get(containerId).size() - 
> inflightDeletion.get(containerId).size()) is less than 3. If it is less than 
> 3, SCM asks some datanode to replicate the container and adds the action to 
> inflightReplication.get(containerId). The replicate action timeout is 10 
> minutes; if the action times out, SCM deletes the action from 
> inflightReplication.get(containerId), as the image shows. Then (container 
> replica count + inflightReplication.get(containerId).size() - 
> inflightDeletion.get(containerId).size()) is less than 3 again, and SCM asks 
> another datanode to replicate the container (see the sketch below).
> Because replicating a container takes a long time, sometimes it cannot 
> finish in 10 minutes, so 31 datanodes ended up replicating the container 
> every 10 minutes. 19 of the 31 datanodes replicated the container from the 
> same source datanode, which also puts heavy pressure on the source datanode 
> and makes replication even slower. It actually took 4 hours to finish the 
> first replication. 
>  !screenshot-4.png! 
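
A self-contained sketch of the check described above, with invented names; the comment marks where the 10-minute timeout re-opens the gap:

{code:java}
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class ReplicationCheckSketch {

  static final int REPLICATION_FACTOR = 3;

  final Map<Long, List<Long>> inflightReplication = new HashMap<>();
  final Map<Long, List<Long>> inflightDeletion = new HashMap<>();

  /** The under-replication test described in this Jira. */
  boolean needsMoreReplicas(long containerId, int replicaCount) {
    int inflightAdd = inflightReplication
        .getOrDefault(containerId, Collections.emptyList()).size();
    int inflightDel = inflightDeletion
        .getOrDefault(containerId, Collections.emptyList()).size();
    // When a replicate action times out after 10 minutes it is removed from
    // inflightReplication, so this sum drops below 3 again and SCM schedules
    // one more datanode even though the first copy may still be running.
    return replicaCount + inflightAdd - inflightDel < REPLICATION_FACTOR;
  }
}
{code}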






[jira] [Updated] (HDDS-3701) Change Close container exception logging to debug in ContainerUtils#logAndReturnError

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-3701:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Change Close container exception logging to debug in 
> ContainerUtils#logAndReturnError
> -
>
> Key: HDDS-3701
> URL: https://issues.apache.org/jira/browse/HDDS-3701
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Shashikant Banerjee
>Priority: Major
>  Labels: Performance, Triaged
>
> The logging in ContainerUtils should be changed to debug in the case of a 
> close container exception:
> {code:java}
> 2020-06-01 20:39:19,705 INFO 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler: Operation: 
> PutBlock , Trace ID:  , Message: Requested operation not allowed as 
> ContainerState is CLOSED , Result: CLOSED_CONTAINER_IO , 
> StorageContainerException Occurred.
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  Requested operation not allowed as ContainerState is CLOSED
> at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.checkContainerOpen(KeyValueHandler.java:902)
> at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handlePutBlock(KeyValueHandler.java:417)
> at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.dispatchRequest(KeyValueHandler.java:179)
> at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:155)
> at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:297)
> at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:164)
> at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:395)
> at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.runCommand(ContainerStateMachine.java:405)
> at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$applyTransaction$6(ContainerStateMachine.java:746)
> at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}
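
A sketch of the requested change with simplified names (not the actual ContainerUtils code): closed-container results drop to DEBUG without a stack trace, while unexpected errors keep the louder logging:

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public final class LogAndReturnErrorSketch {

  private static final Logger LOG =
      LoggerFactory.getLogger(LogAndReturnErrorSketch.class);

  static void logError(String result, Exception ex) {
    if ("CLOSED_CONTAINER_IO".equals(result)) {
      // Expected during normal pipeline close; keep it quiet.
      LOG.debug("Operation on closed container: {}", ex.getMessage());
    } else {
      // Genuinely unexpected: keep the full exception.
      LOG.info("StorageContainerException occurred", ex);
    }
  }
}
{code}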






[jira] [Updated] (HDDS-3702) Consider avoiding stream/map/sum in IO path

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-3702:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Consider avoiding stream/map/sum in IO path
> ---
>
> Key: HDDS-3702
> URL: https://issues.apache.org/jira/browse/HDDS-3702
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Shashikant Banerjee
>Priority: Major
>  Labels: Performance, Triaged
>
> In our internal testing, it was found that Java streams are more CPU 
> intensive than plain loops, degrading performance. The aim of this Jira is 
> to avoid using streams in the IO path; see the illustration below.
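
An illustration of the kind of rewrite this Jira proposes; the stream pipeline allocates and dispatches per element on a hot path, while the plain loop does not:

{code:java}
import java.util.Arrays;
import java.util.List;

public final class StreamVsLoop {
  public static void main(String[] args) {
    List<Long> chunkSizes = Arrays.asList(4L, 8L, 16L);

    // Before: stream/map/sum on a hot IO path.
    long total = chunkSizes.stream().mapToLong(Long::longValue).sum();

    // After: an equivalent plain loop, cheaper per call.
    long totalLoop = 0;
    for (long size : chunkSizes) {
      totalLoop += size;
    }
    System.out.println(total + " == " + totalLoop);
  }
}
{code}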






[jira] [Updated] (HDDS-3559) Datanode doesn't handle java heap OutOfMemory exception

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-3559:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Datanode doesn't handle java heap OutOfMemory exception 
> 
>
> Key: HDDS-3559
> URL: https://issues.apache.org/jira/browse/HDDS-3559
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.5.0
>Reporter: Li Cheng
>Priority: Major
>  Labels: Triaged, pull-request-available
>
> 2020-05-05 15:47:41,568 [Datanode State Machine Thread - 167] WARN 
> org.apache.hadoop.ozone.container.common.statemachine.Endpoi
> ntStateMachine: Unable to communicate to SCM server at host-10-51-87-181:9861 
> for past 0 seconds.
> java.io.IOException: com.google.protobuf.ServiceException: 
> java.lang.OutOfMemoryError: Java heap space
>         at 
> org.apache.hadoop.ipc.ProtobufHelper.getRemoteException(ProtobufHelper.java:47)
>         at 
> org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.submitRequest(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:118)
>         at 
> org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.sendHeartbeat(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:148)
>         at 
> org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask.call(HeartbeatEndpointTask.java:145)
>         at 
> org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask.call(HeartbeatEndpointTask.java:76)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: com.google.protobuf.ServiceException: java.lang.OutOfMemoryError: 
> Java heap space
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.getReturnMessage(ProtobufRpcEngine.java:293)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:270)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>         at com.sun.proxy.$Proxy38.submitRequest(Unknown Source)
>         at 
> org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.submitRequest(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:116)
>  
> On a cluster, one datanode stopped reporting to SCM and stayed in an unknown 
> state. The datanode process was still running. The log shows a Java heap OOM 
> while serializing protobuf for an RPC message. However, the datanode 
> silently stops reporting to SCM and the process becomes stale.
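
Two common ways to avoid a half-alive process after an OutOfMemoryError: start the JVM with -XX:+ExitOnOutOfMemoryError, or install a last-resort handler as sketched below (a minimal sketch, not the actual fix in the linked pull request):

{code:java}
public final class OomHandler {
  public static void install() {
    Thread.setDefaultUncaughtExceptionHandler((thread, throwable) -> {
      if (throwable instanceof Error) {
        // Minimal work only: allocating here may itself fail under OOM.
        // halt() skips shutdown hooks so the datanode dies visibly instead
        // of silently going stale.
        Runtime.getRuntime().halt(1);
      }
    });
  }
}
{code}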






[jira] [Updated] (HDDS-1890) Add metrics for data pipeline performance monitoring in ContainerStateMachine

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1890:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Add metrics for data pipeline performance monitoring in ContainerStateMachine
> -
>
> Key: HDDS-1890
> URL: https://issues.apache.org/jira/browse/HDDS-1890
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Priority: Major
>  Labels: Triaged
>
> The following metrics have been identified to be incorporated inside 
> ContainerStateMachine in order to measure and monitor the data pipeline 
> performance in the datanode:
>  
> |containerDbAvgLoadLatency|Avg time to load a container db|
> |avgWriteStateMachineDataLatency|Avg time to write stateMachine data|
> |stateMachineCacheMissRatio|Ratio of stateMachine cache misses to the total 
> no of readStateMachine calls|
> |avgTransactionalLatency|Avg transactional time for a client request inside 
> the stateMachine (i.e., time from the startTransaction call till 
> applyTransaction of the request completes)|
> |avgContainerDbReadLatency|Avg RocksDB read latency during getBlock|
> |avgContainerDbWriteLatency|Avg RocksDB write latency during putBlock|
> |avgCommitChunkLatency|Avg time taken to rename a chunk file|
> |numNotifyNoLeaderCallCount|Number of times the stateMachine has been 
> notified of no leader for an extended period of time|
> |numNotifySlownessCallCount|Number of times a follower is reported to the 
> stateMachine as slow|
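
For illustration, one way such metrics are commonly declared with Hadoop's metrics2 library; this is an assumption for the sketch, as the Jira does not prescribe an implementation:

{code:java}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;
import org.apache.hadoop.metrics2.lib.MutableRate;

@Metrics(about = "ContainerStateMachine pipeline metrics", context = "dfs")
public class ContainerStateMachineMetricsSketch {

  // MutableRate tracks count and average, giving the "avg latency" metrics.
  @Metric private MutableRate writeStateMachineDataLatency;
  @Metric private MutableRate applyTransactionLatency;
  @Metric private MutableCounterLong numNotifyNoLeaderCallCount;

  public void addWriteStateMachineDataLatency(long nanos) {
    writeStateMachineDataLatency.add(nanos);
  }

  public void incrNumNotifyNoLeaderCallCount() {
    numNotifyNoLeaderCallCount.incr();
  }
}
{code}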






[jira] [Updated] (HDDS-1866) Enable purging of raft logs in ContainerStateMachine

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1866:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Enable purging of raft logs in ContainerStateMachine
> 
>
> Key: HDDS-1866
> URL: https://issues.apache.org/jira/browse/HDDS-1866
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Priority: Major
>  Labels: Triaged
>
> The current purge gap for the raft logs is set to 1 billion for 
> ContainerStateMachine; this should be set to 100,000 or a similar value to 
> re-enable purging.
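
A sketch of the proposed tuning, assuming Ratis' RaftServerConfigKeys.Log purge-gap setter (verify the exact API against the Ratis version in use):

{code:java}
import org.apache.ratis.conf.RaftProperties;
import org.apache.ratis.server.RaftServerConfigKeys;

public final class PurgeGapConfig {
  public static RaftProperties newProperties() {
    RaftProperties properties = new RaftProperties();
    // Lower the purge gap from ~1 billion to 100,000 so Ratis actually
    // purges applied log segments for the ContainerStateMachine.
    RaftServerConfigKeys.Log.setPurgeGap(properties, 100_000);
    return properties;
  }
}
{code}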






[jira] [Updated] (HDDS-1980) Datanode should sync the db for open containers in the pipeline during stateMachine close

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1980:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Datanode should sync the db for open containers in the pipeline during 
> stateMachine close
> -
>
> Key: HDDS-1980
> URL: https://issues.apache.org/jira/browse/HDDS-1980
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
>  Labels: TriagePending
>
> Datanode should sync the db for open containers in the pipeline during 
> stateMachine close to ensure metadata for these containers is persisted.






[jira] [Updated] (HDDS-1480) Ip address should not be a part of the DatanodeID since it can change

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1480:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Ip address should not be a part of the DatanodeID since it can change
> -
>
> Key: HDDS-1480
> URL: https://issues.apache.org/jira/browse/HDDS-1480
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Affects Versions: 0.3.0
>Reporter: Siddharth Wagle
>Assignee: Siddharth Wagle
>Priority: Major
>  Labels: Triaged, newbie
>
> The DatanodeID identified by the DatanodeDetails object is persisted to disk 
> and read back on restart. The following fields are currently being 
> serialized, and we should omit the IP address from this set.
> {quote}
> UUID uuid;
> String ipAddress;
> String hostName;
> List<Port> ports;
> String certSerialId;
> {quote}
> cc: [~arpaga] this is follow-up from HDDS-1473
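
An illustrative sketch (not the actual Ozone persistence format) of the proposed shape: drop ipAddress from the persisted fields and resolve it from hostName on load, so a changed IP can never go stale on disk:

{code:java}
import java.net.InetAddress;
import java.net.UnknownHostException;

public final class PersistedDatanodeId {

  // Persisted: uuid, hostName, ports, certSerialId -- no ipAddress.
  String uuid;
  String hostName;
  String certSerialId;

  /** The IP is derived at load time instead of being read from disk. */
  String resolveIpAddress() throws UnknownHostException {
    return InetAddress.getByName(hostName).getHostAddress();
  }
}
{code}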






[jira] [Updated] (HDDS-1537) TestContainerPersistence#testDeleteBlockTwice is failing

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1537:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> TestContainerPersistence#testDeleteBlockTwice is failing 
> -
>
> Key: HDDS-1537
> URL: https://issues.apache.org/jira/browse/HDDS-1537
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Reporter: Mukul Kumar Singh
>Priority: Major
>  Labels: TriagePending
>
> The test is failing with the following exception.
> {code}
> [ERROR] Tests run: 18, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 
> 4.132 s <<< FAILURE! - in 
> org.apache.hadoop.ozone.container.common.impl.TestContainerPersistence
> [ERROR] 
> testDeleteBlockTwice(org.apache.hadoop.ozone.container.common.impl.TestContainerPersistence)
>   Time elapsed: 0.058 s  <<< FAILURE!
> java.lang.AssertionError: Expected test to throw (an instance of 
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException 
> and exception with message a string containing "Unable to find the block.")
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.junit.rules.ExpectedException.failDueToMissingException(ExpectedException.java:184)
>   at 
> org.junit.rules.ExpectedException.access$100(ExpectedException.java:85)
>   at 
> org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:170)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> {code}






[jira] [Updated] (HDDS-1844) Tune the stateMachineDataCache to a reasonable fraction of Datanode Heap

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1844:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Tune the stateMachineDataCache to a reasonable fraction of Datanode Heap
> 
>
> Key: HDDS-1844
> URL: https://issues.apache.org/jira/browse/HDDS-1844
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: Triaged
>
> Currently, the stateMachineData, which is the actual chunk data, is 
> maintained in the stateMachineCache inside ContainerStateMachine. Right now, 
> the cache expiry is time based, and the cache is sized according to the 
> number of parallel write chunks possible in the datanode. For optimal 
> throughput, we may need to tune it to a fraction of the heap configured for 
> the datanode process, as sketched below.
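
A minimal sketch of sizing the cache from the heap instead of from the parallel-write-chunk count; the fraction and names are illustrative:

{code:java}
public final class CacheSizing {

  /** Bytes to give the stateMachineData cache for a given heap fraction. */
  static long cacheCapacityBytes(double heapFraction) {
    long maxHeap = Runtime.getRuntime().maxMemory();  // -Xmx for this JVM
    return (long) (maxHeap * heapFraction);
  }

  public static void main(String[] args) {
    // e.g. 10% of the datanode heap, instead of a fixed entry count.
    System.out.println(cacheCapacityBytes(0.10) + " bytes");
  }
}
{code}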






[jira] [Resolved] (HDDS-1408) TestMiniOzoneCluster.testDNstartAfterSCM fails with OverlappingFileLockException

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-1408.
---
Resolution: Implemented

> TestMiniOzoneCluster.testDNstartAfterSCM fails with 
> OverlappingFileLockException
> 
>
> Key: HDDS-1408
> URL: https://issues.apache.org/jira/browse/HDDS-1408
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.5.0
>Reporter: Mukul Kumar Singh
>Priority: Major
>  Labels: TriagePending
>
> TestMiniOzoneCluster.testDNstartAfterSCM fails with 
> OverlappingFileLockException. Detailed logs can be found here.
> https://ci.anzix.net/job/ozone-nightly/59/testReport/org.apache.hadoop.ozone/TestMiniOzoneCluster/testDNstartAfterSCM/
> {code}
> java.nio.channels.OverlappingFileLockException
>   at sun.nio.ch.SharedFileLockTable.checkList(FileLockTable.java:255)
>   at sun.nio.ch.SharedFileLockTable.add(FileLockTable.java:152)
>   at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:1113)
>   at java.nio.channels.FileChannel.tryLock(FileChannel.java:1155)
>   at 
> org.apache.ratis.server.storage.RaftStorageDirectory.tryLock(RaftStorageDirectory.java:322)
>   at 
> org.apache.ratis.server.storage.RaftStorageDirectory.lock(RaftStorageDirectory.java:291)
>   at 
> org.apache.ratis.server.storage.RaftStorageDirectory.analyzeStorage(RaftStorageDirectory.java:264)
>   at 
> org.apache.ratis.server.storage.RaftStorage.analyzeAndRecoverStorage(RaftStorage.java:91)
>   at 
> org.apache.ratis.server.storage.RaftStorage.<init>(RaftStorage.java:59)
>   at org.apache.ratis.server.impl.ServerState.<init>(ServerState.java:106)
>   at 
> org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:101)
>   at 
> org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$2(RaftServerProxy.java:208)
>   at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
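
The exception itself is easy to reproduce: FileChannel#tryLock throws
OverlappingFileLockException when a lock on the same file is already held
through the same JVM, which matches a restarted in-process datanode
re-opening a Ratis storage directory whose lock the previous instance still
holds. A standalone sketch, with no Ozone or Ratis code involved:

{code:java}
import static java.nio.file.StandardOpenOption.CREATE;
import static java.nio.file.StandardOpenOption.WRITE;

import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.Paths;

public class OverlappingLockRepro {
  public static void main(String[] args) throws Exception {
    Path lockFile = Paths.get("in_use.lock");  // illustrative file name
    try (FileChannel first = FileChannel.open(lockFile, CREATE, WRITE);
         FileChannel second = FileChannel.open(lockFile, CREATE, WRITE)) {
      FileLock held = first.tryLock();   // succeeds
      try {
        second.tryLock();                // same JVM, same file
      } catch (OverlappingFileLockException e) {
        System.out.println("reproduced: " + e);
      }
      held.release();
    }
  }
}
{code}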



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-1251) all chunks are not deleted by block deletion even when all keys are deleted and all containers are closed

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-1251.
---
Target Version/s: 0.7.0  (was: 0.6.0)
  Resolution: Not A Problem

It will be addressed with a garbage detection and collection mechanism.

> all chunks are not deleted by block deletion even when all keys are deleted 
> and all containers are closed
> -
>
> Key: HDDS-1251
> URL: https://issues.apache.org/jira/browse/HDDS-1251
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Nilotpal Nandi
>Priority: Major
>  Labels: Triaged
>
> steps taken :
> ---
>  # created a 40-node cluster and wrote data on all datanodes.
>  # deleted all keys from the cluster; all containers were closed.
> Block deletion was triggered and deleted most of the chunks from all 
> datanodes, but it could not delete all chunks even after several days.
>  
> expectations : 
> all chunks should be deleted if there is no key present in the cluster and 
> all containers are closed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1143) Ensure stateMachineData to be evicted only after writeStateMachineData completes in ContainerStateMachine cache

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1143:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Ensure stateMachineData to be evicted only after writeStateMachineData 
> completes in ContainerStateMachine cache
> ---
>
> Key: HDDS-1143
> URL: https://issues.apache.org/jira/browse/HDDS-1143
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: TriagePending
>
> Currently, when we write stateMachineData, we first write to the cache, 
> followed by a write to disk. The entry in the cache can get evicted while 
> the actual write is still in progress if the write is very slow. The 
> purpose of this Jira is to ensure cache eviction happens only after 
> writeChunk completes.
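
One way to express the intended behavior is to chain eviction to the write
future instead of relying on expiry alone. A minimal sketch with illustrative
names, not the actual ContainerStateMachine code:

{code:java}
import java.nio.ByteBuffer;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public final class PinnedWriteCacheSketch {
  private final ConcurrentMap<Long, ByteBuffer> cache =
      new ConcurrentHashMap<>();

  /** Cache the chunk, start the disk write, evict only on completion. */
  public CompletableFuture<Void> write(long logIndex, ByteBuffer data,
      CompletableFuture<Void> diskWrite) {
    cache.put(logIndex, data);
    // Eviction is chained to the write itself, so a slow disk can no longer
    // race with time- or size-based expiry and lose the entry too early.
    return diskWrite.whenComplete((v, err) -> cache.remove(logIndex));
  }

  public ByteBuffer read(long logIndex) {
    return cache.get(logIndex);
  }
}
{code}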



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-904) RATIS group not found thrown on datanodes while leader election

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-904:
-
Target Version/s: 0.7.0  (was: 0.6.0)

> RATIS group not found thrown on datanodes while leader election
> ---
>
> Key: HDDS-904
> URL: https://issues.apache.org/jira/browse/HDDS-904
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode, SCM
>Reporter: Nilotpal Nandi
>Priority: Major
>  Labels: TriagePending
> Attachments: datanode_1.log, datanode_2.log, datanode_3.log, scm.log
>
>
> The following exception was seen in datanode.log of one of the docker nodes:
> -
> {noformat}
> 2018-12-06 09:32:11 INFO LeaderElection:127 - 
> 0e3aa95d-ab51-4b20-9bff-3f7bd7df0500: begin an election in Term 1
> 2018-12-06 09:32:12 INFO LeaderElection:46 - 
> 0e3aa95d-ab51-4b20-9bff-3f7bd7df0500: Election TIMEOUT; received 0 
> response(s) [] and 0 exception(s); 0e3aa95d-ab51-4b20-9bff-3f7bd7df0500:t1, 
> leader=null, voted=0e3aa95d-ab51-4b20-9bff-3f7bd7df0500, 
> raftlog=0e3aa95d-ab51-4b20-9bff-3f7bd7df0500-SegmentedRaftLog:OPENED, 
> conf=-1: [76153aab-4681-40b6-bc32-cc9ed5ef1daf:192.168.0.7:9858, 
> 79ca7251-7514-4c53-968c-ade59d6df07b:192.168.0.6:9858, 
> 0e3aa95d-ab51-4b20-9bff-3f7bd7df0500:192.168.0.4:9858], old=null
> 2018-12-06 09:32:13 INFO LeaderElection:127 - 
> 0e3aa95d-ab51-4b20-9bff-3f7bd7df0500: begin an election in Term 2
> 2018-12-06 09:32:13 INFO LeaderElection:230 - 
> 0e3aa95d-ab51-4b20-9bff-3f7bd7df0500 got exception when requesting votes: {}
> java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: 
> 76153aab-4681-40b6-bc32-cc9ed5ef1daf: group-41B8C34A6DE4 not found.
>  at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>  at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>  at 
> org.apache.ratis.server.impl.LeaderElection.waitForResults(LeaderElection.java:214)
>  at 
> org.apache.ratis.server.impl.LeaderElection.askForVotes(LeaderElection.java:146)
>  at org.apache.ratis.server.impl.LeaderElection.run(LeaderElection.java:102)
> Caused by: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
> INTERNAL: 76153aab-4681-40b6-bc32-cc9ed5ef1daf: group-41B8C34A6DE4 not found.
>  at 
> org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:222)
>  at 
> org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:203)
>  at 
> org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:132)
>  at 
> org.apache.ratis.proto.grpc.RaftServerProtocolServiceGrpc$RaftServerProtocolServiceBlockingStub.requestVote(RaftServerProtocolServiceGrpc.java:265)
>  at 
> org.apache.ratis.grpc.server.GrpcServerProtocolClient.requestVote(GrpcServerProtocolClient.java:63)
>  at org.apache.ratis.grpc.server.GrpcService.requestVote(GrpcService.java:150)
>  at 
> org.apache.ratis.server.impl.LeaderElection.lambda$submitRequests$0(LeaderElection.java:188)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> 2018-12-06 09:32:14 INFO LeaderElection:46 - 
> 0e3aa95d-ab51-4b20-9bff-3f7bd7df0500: Election TIMEOUT; received 0 
> response(s) [] and 1 exception(s); 0e3aa95d-ab51-4b20-9bff-3f7bd7df0500:t2, 
> leader=null, voted=0e3aa95d-ab51-4b20-9bff-3f7bd7df0500, 
> raftlog=0e3aa95d-ab51-4b20-9bff-3f7bd7df0500-SegmentedRaftLog:OPENED, 
> conf=-1: [76153aab-4681-40b6-bc32-cc9ed5ef1daf:192.168.0.7:9858, 
> 79ca7251-7514-4c53-968c-ade59d6df07b:192.168.0.6:9858, 
> 0e3aa95d-ab51-4b20-9bff-3f7bd7df0500:192.168.0.4:9858], old=null{noformat}
>  
> cc - [~ljain]
> all logs attached.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1102) Confusing error log when datanode tries to connect to a destroyed pipeline

2020-06-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-1102:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Confusing error log when datanode tries to connect to a destroyed pipeline
> --
>
> Key: HDDS-1102
> URL: https://issues.apache.org/jira/browse/HDDS-1102
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Nilotpal Nandi
>Assignee: Shashikant Banerjee
>Priority: Critical
>  Labels: TriagePending, newbie, pushed-to-craterlake, 
> test-badlands
> Attachments: allnode.log, datanode.log
>
>
> steps taken:
> 
>  # created a 5-datanode cluster.
>  # shut down 2 datanodes.
>  # started the datanodes again.
> One of the datanodes was shut down.
> exception seen :
>  
> {noformat}
> 2019-02-14 07:37:26 INFO LeaderElection:230 - 
> 6a0522ba-019e-4b77-ac1f-a9322cd525b8 got exception when requesting votes: {}
> java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: 
> a3d1dd2d-554e-4e87-a2cf-076a229af352: group-FD6FA533F1FB not found.
>  at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>  at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>  at 
> org.apache.ratis.server.impl.LeaderElection.waitForResults(LeaderElection.java:214)
>  at 
> org.apache.ratis.server.impl.LeaderElection.askForVotes(LeaderElection.java:146)
>  at org.apache.ratis.server.impl.LeaderElection.run(LeaderElection.java:102)
> Caused by: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: 
> INTERNAL: a3d1dd2d-554e-4e87-a2cf-076a229af352: group-FD6FA533F1FB not found.
>  at 
> org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:233)
>  at 
> org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:214)
>  at 
> org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:139)
>  at 
> org.apache.ratis.proto.grpc.RaftServerProtocolServiceGrpc$RaftServerProtocolServiceBlockingStub.requestVote(RaftServerProtocolServiceGrpc.java:265)
>  at 
> org.apache.ratis.grpc.server.GrpcServerProtocolClient.requestVote(GrpcServerProtocolClient.java:83)
>  at org.apache.ratis.grpc.server.GrpcService.requestVote(GrpcService.java:187)
>  at 
> org.apache.ratis.server.impl.LeaderElection.lambda$submitRequests$0(LeaderElection.java:188)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> 2019-02-14 07:37:26 INFO LeaderElection:46 - 
> 6a0522ba-019e-4b77-ac1f-a9322cd525b8: Election PASSED; received 1 response(s) 
> [6a0522ba-019e-4b77-ac1f-a9322cd525b8<-61ad3bf3-e9b1-48e5-90e3-3b78c8b5bba5#0:OK-t7]
>  and 1 exception(s); 6a0522ba-019e-4b77-ac1f-a9322cd525b8:t7, leader=null, 
> voted=6a0522ba-019e-4b77-ac1f-a9322cd525b8, 
> raftlog=6a0522ba-019e-4b77-ac1f-a9322cd525b8-SegmentedRaftLog:OPENED, conf=3: 
> [61ad3bf3-e9b1-48e5-90e3-3b78c8b5bba5:172.20.0.8:9858, 
> 6a0522ba-019e-4b77-ac1f-a9322cd525b8:172.20.0.6:9858, 
> 0f377918-aafa-4d8a-972a-6ead54048fba:172.20.0.3:9858], old=null
> 2019-02-14 07:37:26 INFO LeaderElection:52 - 0: 
> java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: 
> a3d1dd2d-554e-4e87-a2cf-076a229af352: group-FD6FA533F1FB not found.
> 2019-02-14 07:37:26 INFO RoleInfo:130 - 6a0522ba-019e-4b77-ac1f-a9322cd525b8: 
> shutdown LeaderElection
> 2019-02-14 07:37:26 INFO RaftServerImpl:161 - 
> 6a0522ba-019e-4b77-ac1f-a9322cd525b8 changes role from CANDIDATE to LEADER at 
> term 7 for changeToLeader
> 2019-02-14 07:37:26 INFO RaftServerImpl:258 - 
> 6a0522ba-019e-4b77-ac1f-a9322cd525b8: change Leader from null to 
> 6a0522ba-019e-4b77-ac1f-a9322cd525b8 at term 7 for becomeLeader, leader 
> elected after 1066ms
> 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - 
> raft.server.staging.catchup.gap = 1000 (default)
> 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - raft.server.rpc.sleep.time 
> = 25ms (default)
> 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - raft.server.watch.timeout 
> = 10s (default)
> 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - 
> raft.server.watch.timeout.denomination = 1s (default)
> 2019-02-14 07:37:26 INFO RaftServerConfigKeys:43 - 
> raft.server.log.appender.snapshot.chunk.size.max = 16MB (=16777216) (default)
> 

[GitHub] [hadoop-ozone] vivekratnavel commented on pull request #1047: HDDS-3726. Upload code coverage data to Codecov and enable checks in …

2020-06-09 Thread GitBox


vivekratnavel commented on pull request #1047:
URL: https://github.com/apache/hadoop-ozone/pull/1047#issuecomment-641723229


   @elek @adoroszlai We will start seeing diffs in code coverage report once 
this patch gets merged to master.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-3481) SCM ask too many datanodes to replicate the same container

2020-06-09 Thread Xiaoyu Yao (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17130067#comment-17130067
 ] 

Xiaoyu Yao commented on HDDS-3481:
--

bq. The reason is source datanode replicates too many containers to other 
datanode, so the source datanode become very slow.

Should we consider balancing the replication source among datanodes or 
throttling the replication per datanode?
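
A per-source throttle could be as simple as a bounded permit count per
datanode. A minimal sketch of the idea, with hypothetical names:

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Semaphore;

public final class SourceThrottleSketch {
  private final int maxConcurrentPerSource;
  private final ConcurrentMap<String, Semaphore> perSource =
      new ConcurrentHashMap<>();

  public SourceThrottleSketch(int maxConcurrentPerSource) {
    this.maxConcurrentPerSource = maxConcurrentPerSource;
  }

  /** Try to reserve a replication slot on the given source datanode. */
  public boolean tryAcquire(String datanodeUuid) {
    return perSource
        .computeIfAbsent(datanodeUuid,
            id -> new Semaphore(maxConcurrentPerSource))
        .tryAcquire();
  }

  /** Release the slot when the replication finishes or times out. */
  public void release(String datanodeUuid) {
    Semaphore s = perSource.get(datanodeUuid);
    if (s != null) {
      s.release();
    }
  }
}
{code}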

> SCM ask too many datanodes to replicate the same container
> --
>
> Key: HDDS-3481
> URL: https://issues.apache.org/jira/browse/HDDS-3481
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Blocker
>  Labels: TriagePending, pull-request-available
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png
>
>
> *What's the problem ?*
> As the image shows, SCM asked 31 datanodes to replicate container 2037 every 
> 10 minutes from 2020-04-17 23:38:51. And at 2020-04-18 08:58:52 SCM found 
> the replica count of container 2037 was 12, so it asked 11 datanodes to 
> delete container 2037. 
>  !screenshot-1.png! 
>  !screenshot-2.png! 
> *What's the reason ?*
> SCM checks whether (container replica count + 
> inflightReplication.get(containerId).size() - 
> inflightDeletion.get(containerId).size()) is less than 3. If it is less 
> than 3, SCM asks some datanode to replicate the container and adds the 
> action to inflightReplication.get(containerId). The replicate action 
> timeout is 10 minutes; if the action times out, SCM deletes it from 
> inflightReplication.get(containerId), as the image shows. Then (container 
> replica count + inflightReplication.get(containerId).size() - 
> inflightDeletion.get(containerId).size()) is less than 3 again, and SCM 
> asks another datanode to replicate the container.
> Because replicating a container takes a long time, it sometimes cannot 
> finish in 10 minutes, so 31 datanodes had to replicate the container every 
> 10 minutes. 19 of the 31 datanodes replicated the container from the same 
> source datanode, which also puts big pressure on the source datanode and 
> makes replication even slower. It actually took 4 hours to finish the first 
> replication. 
>  !screenshot-4.png! 
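
The check described above, restated as a compact sketch; all names are
illustrative and this is not the actual ReplicationManager source:

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class ReplicationCheckSketch {
  private final Map<Long, List<Object>> inflightReplication = new HashMap<>();
  private final Map<Long, List<Object>> inflightDeletion = new HashMap<>();
  private static final int REPLICATION_FACTOR = 3;

  void checkContainer(long containerId, int reportedReplicas) {
    int adds = inflightReplication
        .getOrDefault(containerId, new ArrayList<>()).size();
    int deletes = inflightDeletion
        .getOrDefault(containerId, new ArrayList<>()).size();

    // If a copy takes longer than the 10-minute action timeout, the timed-out
    // action is dropped from inflightReplication, the sum below falls under 3
    // again, and yet another datanode gets scheduled: the loop in this issue.
    if (reportedReplicas + adds - deletes < REPLICATION_FACTOR) {
      scheduleReplication(containerId);
    }
  }

  private void scheduleReplication(long containerId) {
    // would pick a target datanode and record the new inflight action
  }
}
{code}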



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] bharatviswa504 commented on pull request #1002: HDDS-3642. Stop/Pause Background services while replacing OM DB with checkpoint from Leader

2020-06-09 Thread GitBox


bharatviswa504 commented on pull request #1002:
URL: https://github.com/apache/hadoop-ozone/pull/1002#issuecomment-641719335


   Looks like it is failing in compilation and also has checkstyle issues.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #1002: HDDS-3642. Stop/Pause Background services while replacing OM DB with checkpoint from Leader

2020-06-09 Thread GitBox


bharatviswa504 commented on a change in pull request #1002:
URL: https://github.com/apache/hadoop-ozone/pull/1002#discussion_r437858630



##
File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
##
@@ -3113,27 +3120,32 @@ private DBCheckpoint getDBCheckpointFromLeader(String 
leaderId) {
 return null;
   }
 
+  void stopServices() throws Exception {
+keyManager.stop();
+stopSecretManager();
+metadataManager.stop();
+
+// s3SecretManager should also be stopped

Review comment:
   Thanks for confirming.
   Then we can remove the comments.
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-741) all containers are in 'CLOSING' state after service restart

2020-06-09 Thread Nanda kumar (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-741:
-
Labels: TriagePending  (was: TriagePending test-badlands)

> all containers are in 'CLOSING' state after service restart
> ---
>
> Key: HDDS-741
> URL: https://issues.apache.org/jira/browse/HDDS-741
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.3.0
>Reporter: Nilotpal Nandi
>Assignee: Nanda kumar
>Priority: Blocker
>  Labels: TriagePending
> Attachments: all-node-ozone-logs-1540556458.tar.gz
>
>
> all containers are in closing state after service restart. None of the writes 
> are working after restart.
> The cluster contains 11 live datanodes.
> {noformat}
> [
>  {
>  "nodeType": "OM",
>  "hostname": "ctr-e138-1518143905142-53-01-08.hwx.site",
>  "ports": {
>  "RPC": 9889,
>  "HTTP": 9874
>  }
>  },
>  {
>  "nodeType": "SCM",
>  "hostname": "ctr-e138-1518143905142-53-01-03.hwx.site",
>  "ports": {
>  "RPC": 9860
>  }
>  },
>  {
>  "nodeType": "DATANODE",
>  "hostname": "ctr-e138-1518143905142-541661-01-03.hwx.site",
>  "ports": {
>  "HTTP": 9880
>  }
>  },
>  {
>  "nodeType": "DATANODE",
>  "hostname": "ctr-e138-1518143905142-541661-01-07.hwx.site",
>  "ports": {
>  "HTTP": 9880
>  }
>  },
>  {
>  "nodeType": "DATANODE",
>  "hostname": "ctr-e138-1518143905142-53-01-03.hwx.site",
>  "ports": {
>  "HTTP": 9880
>  }
>  },
>  {
>  "nodeType": "DATANODE",
>  "hostname": "ctr-e138-1518143905142-541661-01-04.hwx.site",
>  "ports": {
>  "HTTP": 9880
>  }
>  },
>  {
>  "nodeType": "DATANODE",
>  "hostname": "ctr-e138-1518143905142-53-01-04.hwx.site",
>  "ports": {
>  "HTTP": 9880
>  }
>  },
>  {
>  "nodeType": "DATANODE",
>  "hostname": "ctr-e138-1518143905142-53-01-08.hwx.site",
>  "ports": {
>  "HTTP": 9880
>  }
>  },
>  {
>  "nodeType": "DATANODE",
>  "hostname": "ctr-e138-1518143905142-541661-01-02.hwx.site",
>  "ports": {
>  "HTTP": 9880
>  }
>  },
>  {
>  "nodeType": "DATANODE",
>  "hostname": "ctr-e138-1518143905142-53-01-05.hwx.site",
>  "ports": {
>  "HTTP": 9880
>  }
>  },
>  {
>  "nodeType": "DATANODE",
>  "hostname": "ctr-e138-1518143905142-541661-01-06.hwx.site",
>  "ports": {
>  "HTTP": 9880
>  }
>  },
>  {
>  "nodeType": "DATANODE",
>  "hostname": "ctr-e138-1518143905142-53-01-07.hwx.site",
>  "ports": {
>  "HTTP": 9880
>  }
>  },
>  {
>  "nodeType": "DATANODE",
>  "hostname": "ctr-e138-1518143905142-53-01-06.hwx.site",
>  "ports": {
>  "HTTP": 9880
>  }
>  }
> ]{noformat}
> error thrown while write :
>  
> {noformat}
> [root@ctr-e138-1518143905142-541661-01-07 test_files]# ozone fs -put 
> /etc/passwd /testdir5/
> 2018-10-26 12:09:43,822 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> 2018-10-26 12:09:47,882 ERROR io.ChunkGroupOutputStream: Try to allocate more 
> blocks for write failed, already allocated 0 blocks for this write.
> put: Allocate block failed, error:INTERNAL_ERROR{noformat}
>  
>  
> pipelines in the cluster :
>  
> {noformat}
> [root@ctr-e138-1518143905142-541661-01-07 test_files]# ozone scmcli 
> listPipelines
> Pipeline[ Id: 29b68cc2-2d18-4db0-a11a-587ae4abc715, Nodes: 
> e3d89961-fe38-4ed0-8a32-cd1849c58e0c{ip: 172.27.20.96, host: 
> ctr-e138-1518143905142-53-01-08.hwx.site}b33a30d9-f1e2-448e-aabb-61a970445cea{ip:
>  172.27.85.64, host: ctr-e138-1518143905142-541661-01-07.hwx.site}, 
> Type:RATIS, Factor:THREE, State:CLOSING]
> Pipeline[ Id: 05061f87-4c68-443b-ae27-984da2d0a2cd, Nodes: 
> dc002a73-fc63-4e76-be3e-3c6d16ede5f6{ip: 172.27.38.9, host: 
> ctr-e138-1518143905142-53-01-04.hwx.site}4e6bd2a2-6802-4e67-9710-612a2cdb9dc1{ip:
>  172.27.24.90, host: 
> ctr-e138-1518143905142-53-01-05.hwx.site}be3f0db4-3a19-44a5-bd6e-0da47d2ed92e{ip:
>  172.27.20.91, host: ctr-e138-1518143905142-53-01-03.hwx.site}, 
> Type:RATIS, Factor:THREE, State:CLOSING]
> Pipeline[ Id: 80893f87-5e73-49a2-8f38-2adb2b13140a, Nodes: 
> 63833540-bf93-410c-b081-243a56f93c88{ip: 172.27.10.199, host: 
> ctr-e138-1518143905142-53-01-07.hwx.site}6e8b7129-8615-45fe-81e0-848a2e0ba520{ip:
>  172.27.15.139, host: 
> ctr-e138-1518143905142-53-01-06.hwx.site}aab1f2e5-1cf0-430d-b1bf-04be8630a8ee{ip:
>  172.27.57.0, host: ctr-e138-1518143905142-541661-01-03.hwx.site}, 
> Type:RATIS, Factor:THREE, State:CLOSING]
> Pipeline[ Id: f0a14cb9-d37a-4c7c-b3e6-b7e3830dfd5f, Nodes: 
> 61e271bf-68ad-435e-8a6e-582be90ebb6f{ip: 172.27.19.74, host: 
> ctr-e138-1518143905142-541661-01-06.hwx.site}3622352c-b136-4c74-b952-34e938cbda94{ip:
>  172.27.15.131, host: 
> 

[jira] [Updated] (HDDS-1169) Add MetricSource Support to SCM metrics

2020-06-09 Thread Nanda kumar (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-1169:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Add MetricSource Support to SCM metrics
> ---
>
> Key: HDDS-1169
> URL: https://issues.apache.org/jira/browse/HDDS-1169
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Affects Versions: 0.4.0
>Reporter: Anu Engineer
>Assignee: Nanda kumar
>Priority: Major
>  Labels: Triaged
>
> In HDDS-1070, [~bharatviswa] pointed out that we need to support 
> MetricSources for SCM metrics. 
> cc: [~nandakumar131]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1131) destroy pipeline failed with PipelineNotFoundException

2020-06-09 Thread Nanda kumar (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-1131:
--
Labels: TriagePending  (was: TriagePending pushed-to-craterlake 
test-badlands)

> destroy pipeline failed with PipelineNotFoundException
> --
>
> Key: HDDS-1131
> URL: https://issues.apache.org/jira/browse/HDDS-1131
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Nilotpal Nandi
>Assignee: Nanda kumar
>Priority: Major
>  Labels: TriagePending
>
> steps taken :
> 
>  # created a 12-datanode cluster and ran workload on all the nodes
> exceptions seen in scm log
> 
> {noformat}
> 2019-02-18 07:17:51,112 INFO 
> org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils: destroying 
> pipeline:PipelineID=01d3ef2a-912c-4fc0-80b6-012343d76adb with 
> group-012343D76ADB:[a40a7b01-a30b-469c-b373-9fcb20a126ed:172.27.54.212:9858, 
> 8c77b16b-8054-49e3-b669-1ff759cfd271:172.27.23.196:9858, 
> 943007c8-4fdd-4926-89e2-2c8c52c05073:172.27.76.72:9858]
> 2019-02-18 07:17:51,112 INFO 
> org.apache.hadoop.hdds.scm.container.CloseContainerEventHandler: Close 
> container Event triggered for container : #40
> 2019-02-18 07:17:51,113 INFO 
> org.apache.hadoop.hdds.scm.container.CloseContainerEventHandler: Close 
> container Event triggered for container : #41
> 2019-02-18 07:17:51,114 INFO 
> org.apache.hadoop.hdds.scm.container.CloseContainerEventHandler: Close 
> container Event triggered for container : #42
> 2019-02-18 07:22:51,127 WARN 
> org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils: Pipeline destroy 
> failed for pipeline=PipelineID=01d3ef2a-912c-4fc0-80b6-012343d76adb 
> dn=a40a7b01-a30b-469c-b373-9fcb20a126ed{ip: 172.27.54.212, host: 
> ctr-e139-1542663976389-62237-01-07.hwx.site}
> 2019-02-18 07:22:51,139 WARN 
> org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils: Pipeline destroy 
> failed for pipeline=PipelineID=01d3ef2a-912c-4fc0-80b6-012343d76adb 
> dn=8c77b16b-8054-49e3-b669-1ff759cfd271{ip: 172.27.23.196, host: 
> ctr-e139-1542663976389-62237-01-15.hwx.site}
> 2019-02-18 07:22:51,149 WARN 
> org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils: Pipeline destroy 
> failed for pipeline=PipelineID=01d3ef2a-912c-4fc0-80b6-012343d76adb 
> dn=943007c8-4fdd-4926-89e2-2c8c52c05073{ip: 172.27.76.72, host: 
> ctr-e139-1542663976389-62237-01-06.hwx.site}
> 2019-02-18 07:22:51,150 ERROR 
> org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils: Destroy pipeline 
> failed for pipeline:PipelineID=01d3ef2a-912c-4fc0-80b6-012343d76adb with 
> group-012343D76ADB:[a40a7b01-a30b-469c-b373-9fcb20a126ed:172.27.54.212:9858, 
> 8c77b16b-8054-49e3-b669-1ff759cfd271:172.27.23.196:9858, 
> 943007c8-4fdd-4926-89e2-2c8c52c05073:172.27.76.72:9858]
> org.apache.hadoop.hdds.scm.pipeline.PipelineNotFoundException: 
> PipelineID=01d3ef2a-912c-4fc0-80b6-012343d76adb not found
>  at 
> org.apache.hadoop.hdds.scm.pipeline.PipelineStateMap.getPipeline(PipelineStateMap.java:112)
>  at 
> org.apache.hadoop.hdds.scm.pipeline.PipelineStateMap.removePipeline(PipelineStateMap.java:247)
>  at 
> org.apache.hadoop.hdds.scm.pipeline.PipelineStateManager.removePipeline(PipelineStateManager.java:90)
>  at 
> org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.removePipeline(SCMPipelineManager.java:261)
>  at 
> org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils.destroyPipeline(RatisPipelineUtils.java:103)
>  at 
> org.apache.hadoop.hdds.scm.pipeline.RatisPipelineUtils.lambda$finalizeAndDestroyPipeline$1(RatisPipelineUtils.java:133)
>  at 
> org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$0(TimeoutScheduler.java:85)
>  at 
> org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$1(TimeoutScheduler.java:104)
>  at org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:50)
>  at org.apache.ratis.util.LogUtils$1.run(LogUtils.java:91)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748){noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: 

[jira] [Updated] (HDDS-1169) Add MetricSource Support to SCM metrics

2020-06-09 Thread Nanda kumar (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-1169:
--
Labels:   (was: Triaged)

> Add MetricSource Support to SCM metrics
> ---
>
> Key: HDDS-1169
> URL: https://issues.apache.org/jira/browse/HDDS-1169
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Affects Versions: 0.4.0
>Reporter: Anu Engineer
>Assignee: Nanda kumar
>Priority: Major
>
> In HDDS-1070, [~bharatviswa] pointed out that we need to support 
> MetricSources for SCM metrics. 
> cc: [~nandakumar131]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-1556) SCM should update leader information of a pipeline from pipeline report

2020-06-09 Thread Nanda kumar (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar reassigned HDDS-1556:
-

Assignee: Nanda kumar

> SCM should update leader information of a pipeline from pipeline report
> ---
>
> Key: HDDS-1556
> URL: https://issues.apache.org/jira/browse/HDDS-1556
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Nanda kumar
>Priority: Major
>  Labels: TriagePending
>
> SCM currently hands out nodes in a static order which carries no information 
> about the leader of the ring. SCM should learn about the leader in either of 
> the following two ways:
> a) Ratis should add a callback to the state machine, which in turn notifies 
> SCM of the state change.
> b) SCM periodically updates this information from the heartbeat.
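
Option (a) would look roughly like the callback shape below. The method and
type names here are hypothetical, sketched for illustration rather than
taken from the actual Ratis StateMachine API:

{code:java}
// Hypothetical callback shape for option (a); not the real Ratis API.
public class LeaderAwareStateMachineSketch {

  /** Hypothetical sink on the datanode side feeding SCM. */
  public interface LeaderChangeListener {
    void leaderChanged(String pipelineId, String newLeaderUuid);
  }

  private final String pipelineId;
  private final LeaderChangeListener scmNotifier;

  public LeaderAwareStateMachineSketch(String pipelineId,
      LeaderChangeListener scmNotifier) {
    this.pipelineId = pipelineId;
    this.scmNotifier = scmNotifier;
  }

  /** Invoked by the consensus layer whenever this group elects a leader. */
  public void notifyLeaderChanged(String newLeaderUuid) {
    // Fold the leader into the next pipeline report / heartbeat so that
    // option (b), periodic updates, can pick it up as well.
    scmNotifier.leaderChanged(pipelineId, newLeaderUuid);
  }
}
{code}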



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2641) Allow SCM webUI to show decommision and maintenance nodes

2020-06-09 Thread Nanda kumar (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-2641:
--
Labels:   (was: TriagePending)

> Allow SCM webUI to show decommision and maintenance nodes
> -
>
> Key: HDDS-2641
> URL: https://issues.apache.org/jira/browse/HDDS-2641
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Affects Versions: 0.5.0
>Reporter: Stephen O'Donnell
>Assignee: Vivek Ratnavel Subramanian
>Priority: Major
>
> The SCM WebUI should show the current set of decommission and maintenance 
> nodes, possibly including the number of containers each node is waiting to 
> have replicated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2641) Allow SCM webUI to show decommision and maintenance nodes

2020-06-09 Thread Nanda kumar (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-2641:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Allow SCM webUI to show decommision and maintenance nodes
> -
>
> Key: HDDS-2641
> URL: https://issues.apache.org/jira/browse/HDDS-2641
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Affects Versions: 0.5.0
>Reporter: Stephen O'Donnell
>Assignee: Vivek Ratnavel Subramanian
>Priority: Major
>  Labels: TriagePending
>
> The SCM WebUI should show the current set of decommission and maintenance 
> nodes, possibly including the number of containers each node is waiting to 
> have replicated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2673) Merge MockNodeManager and SimpleMockNodeManager

2020-06-09 Thread Nanda kumar (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-2673:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Merge MockNodeManager and SimpleMockNodeManager
> ---
>
> Key: HDDS-2673
> URL: https://issues.apache.org/jira/browse/HDDS-2673
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Affects Versions: 0.5.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>
> MockNodeManager does not fully support the Decommission and Maintenance 
> states currently. To allow the decommission work to progress, 
> SimpleMockNodeManager was created, implementing just enough functionality for 
> decommission / maintenance related tests.
> We should consider making any necessary refactors to MockNodeManager and 
> then merging / removing SimpleMockNodeManager.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2673) Merge MockNodeManager and SimpleMockNodeManager

2020-06-09 Thread Nanda kumar (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-2673:
--
Labels:   (was: TriagePending)

> Merge MockNodeManager and SimpleMockNodeManager
> ---
>
> Key: HDDS-2673
> URL: https://issues.apache.org/jira/browse/HDDS-2673
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Affects Versions: 0.5.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>
> MockNodeManager does not fully support the Decommission and Maintenance 
> states currently. To allow the decommission work to progress, 
> SimpleMockNodeManager was created, implementing just enough functionality for 
> decommission / maintenance related tests.
> We should consider making any necessary refactors to MockNodeManager and 
> then merging / removing SimpleMockNodeManager.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2642) Expose decommission / maintenance metrics via JMX

2020-06-09 Thread Nanda kumar (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-2642:
--
Labels:   (was: TriagePending)

> Expose decommission / maintenance metrics via JMX
> -
>
> Key: HDDS-2642
> URL: https://issues.apache.org/jira/browse/HDDS-2642
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Affects Versions: 0.5.0
>Reporter: Stephen O'Donnell
>Priority: Major
>
> As nodes transition through the decommission and maintenance workflow, we 
> should expose the hosts going through admin via JMX, along with possibly:
> 1. The stage of the process (close pipelines, replicate containers etc)
> 2. The number of sufficiently replicated, under replicated and unhealthy 
> containers
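
With the Hadoop metrics2 library that Ozone already uses, the counters above
could be exposed over JMX roughly as follows; class and metric names are
illustrative, not a final design:

{code:java}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableGaugeInt;

@Metrics(about = "Datanode decommission/maintenance metrics", context = "SCM")
public class NodeDecommissionMetricsSketch {

  @Metric("Containers still waiting to be sufficiently replicated")
  private MutableGaugeInt underReplicatedContainers;

  @Metric("Containers already sufficiently replicated")
  private MutableGaugeInt sufficientlyReplicatedContainers;

  @Metric("Containers currently unhealthy")
  private MutableGaugeInt unhealthyContainers;

  // metrics2 instantiates the @Metric fields during registration and
  // publishes them through the standard JMX MBean for this source.
  public static NodeDecommissionMetricsSketch create() {
    return DefaultMetricsSystem.instance()
        .register("NodeDecommissionMetrics", "Decommission / maintenance",
            new NodeDecommissionMetricsSketch());
  }

  public void setUnderReplicated(int value) {
    underReplicatedContainers.set(value);
  }
}
{code}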



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2642) Expose decommission / maintenance metrics via JMX

2020-06-09 Thread Nanda kumar (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-2642:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Expose decommission / maintenance metrics via JMX
> -
>
> Key: HDDS-2642
> URL: https://issues.apache.org/jira/browse/HDDS-2642
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Affects Versions: 0.5.0
>Reporter: Stephen O'Donnell
>Priority: Major
>  Labels: TriagePending
>
> As nodes transition through the decommission and maintenance workflow, we 
> should expose the hosts going through admin via JMX, along with possibly:
> 1. The stage of the process (close pipelines, replicate containers etc)
> 2. The number of sufficiently replicated, under replicated and unhealthy 
> containers



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3080) Configurable container placement policy may not be compatible with pipeline placement policy

2020-06-09 Thread Nanda kumar (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-3080:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Configurable container placement policy may not be compatible with pipeline 
> placement policy
> 
>
> Key: HDDS-3080
> URL: https://issues.apache.org/jira/browse/HDDS-3080
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Affects Versions: 0.6.0
>Reporter: Stephen O'Donnell
>Priority: Major
>
> New pipelines are created on hosts in a rack aware way using 
> PipelinePlacementPolicy, which supports only rack aware and random.
> However the Container Placement Policy used by ReplicationManager can be 
> configured between the following using the config option 
> ozone.scm.container.placement.impl:
> SCMContainerPlacementRackAware
> SCMContainerPlacementCapacity
> SCMContainerPlacementRandom
> If there is a network topology configured via net.topology.table.file.name, 
> then it would be possible to have an incompatible setting for 
> ozone.scm.container.placement.impl.
> We need to consider how to ensure the replication manager and Pipeline 
> Provider always use compatible policies. One option may be to merge the rack 
> aware policies as suggested in HDDS-3079, and also allow Capacity and Random 
> to be used in the pipeline provider.
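
For reference, the policy is selected through an ordinary configuration
property. A minimal sketch in Java; the fully qualified class name below is
an assumption extrapolated from the short names above, so verify it against
your build:

{code:java}
import org.apache.hadoop.hdds.conf.OzoneConfiguration;

public class PlacementPolicyConfigSketch {
  public static void main(String[] args) {
    OzoneConfiguration conf = new OzoneConfiguration();
    // Equivalent to setting the property in ozone-site.xml.
    conf.set("ozone.scm.container.placement.impl",
        "org.apache.hadoop.hdds.scm.container.placement.algorithms"
            + ".SCMContainerPlacementRackAware");
    System.out.println(conf.get("ozone.scm.container.placement.impl"));
  }
}
{code}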



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3107) Pipelines may not be rack aware on cluster startup

2020-06-09 Thread Nanda kumar (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-3107:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Pipelines may not be rack aware on cluster startup
> --
>
> Key: HDDS-3107
> URL: https://issues.apache.org/jira/browse/HDDS-3107
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Affects Versions: 0.6.0
>Reporter: Stephen O'Donnell
>Priority: Major
> Attachments: docker-ozone-topology-ozone-topology-readdata-scm.log
>
>
> Given a 6 node cluster with 2 racks, so that there are 3 nodes per rack, it 
> is possible for a pipeline to be created in a non-rack-aware way on startup.
> Using a robot test like the one in HDDS-3084, I can intermittently see that 
> if all nodes from one rack get registered first, pipeline creation is 
> triggered on them, resulting in a pipeline which is entirely on one rack. 
> Then the next 3 nodes register, and as there are no nodes available on the 
> other rack, they too join a "one rack" pipeline.
> This log snippet shows this happening. I will attach the full docker-compose 
> log:
> {code}
> egrep "Sending CreatePipelineCommand|Registered Data node|Created pipe" 
> docker-ozone-topology-ozone-topology-readdata-scm.log
> scm_1 | 2020-02-28 12:27:57,826 [IPC Server handler 6 on 9861] INFO 
> node.SCMNodeManager: Registered Data node : 
> 74084fe6-60a9-45d6-b02c-a9fa7ed24e3a{ip: 10.5.0.6, host: 
> ozone-topology_datanode_3_1.ozone-topology_net, networkLocation: /rack1, 
> certSerialId: null}
> scm_1 | 2020-02-28 12:27:57,840 [IPC Server handler 9 on 9861] INFO 
> node.SCMNodeManager: Registered Data node : 
> 32be7fa9-1ff6-4bb3-8bed-8648d276ae07{ip: 10.5.0.5, host: 
> ozone-topology_datanode_2_1.ozone-topology_net, networkLocation: /rack1, 
> certSerialId: null}
> scm_1 | 2020-02-28 12:27:57,903 [RatisPipelineUtilsThread] INFO 
> pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for 
> pipeline:PipelineID=16806a56-8e35-46b2-aefd-cb5232d6f5f7 to 
> datanode:32be7fa9-1ff6-4bb3-8bed-8648d276ae07
> scm_1 | 2020-02-28 12:27:57,924 [RatisPipelineUtilsThread] INFO 
> pipeline.PipelineStateManager: Created pipeline Pipeline[ Id: 
> 16806a56-8e35-46b2-aefd-cb5232d6f5f7, Nodes: 
> 32be7fa9-1ff6-4bb3-8bed-8648d276ae07{ip: 10.5.0.5, host: 
> ozone-topology_datanode_2_1.ozone-topology_net, networkLocation: /rack1, 
> certSerialId: null}, Type:RATIS, Factor:ONE, State:ALLOCATED, leaderId:null, 
> CreationTimestamp2020-02-28T12:27:57.891553Z]
> scm_1 | 2020-02-28 12:27:57,932 [RatisPipelineUtilsThread] INFO 
> pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for 
> pipeline:PipelineID=5a3edf1e-84f6-48ef-a333-6f3e924898a6 to 
> datanode:74084fe6-60a9-45d6-b02c-a9fa7ed24e3a
> scm_1 | 2020-02-28 12:27:57,933 [RatisPipelineUtilsThread] INFO 
> pipeline.PipelineStateManager: Created pipeline Pipeline[ Id: 
> 5a3edf1e-84f6-48ef-a333-6f3e924898a6, Nodes: 
> 74084fe6-60a9-45d6-b02c-a9fa7ed24e3a{ip: 10.5.0.6, host: 
> ozone-topology_datanode_3_1.ozone-topology_net, networkLocation: /rack1, 
> certSerialId: null}, Type:RATIS, Factor:ONE, State:ALLOCATED, leaderId:null, 
> CreationTimestamp2020-02-28T12:27:57.932422Z]
> scm_1 | 2020-02-28 12:27:58,213 [IPC Server handler 8 on 9861] INFO 
> node.SCMNodeManager: Registered Data node : 
> 4ce489a3-e3da-4f2a-9ddc-b01b634a68b6{ip: 10.5.0.4, host: 
> ozone-topology_datanode_1_1.ozone-topology_net, networkLocation: /rack1, 
> certSerialId: null}
> scm_1 | 2020-02-28 12:27:58,216 [RatisPipelineUtilsThread] INFO 
> pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for 
> pipeline:PipelineID=ba2034fc-cb11-482a-9843-435294862240 to 
> datanode:4ce489a3-e3da-4f2a-9ddc-b01b634a68b6
> scm_1 | 2020-02-28 12:27:58,216 [RatisPipelineUtilsThread] INFO 
> pipeline.PipelineStateManager: Created pipeline Pipeline[ Id: 
> ba2034fc-cb11-482a-9843-435294862240, Nodes: 
> 4ce489a3-e3da-4f2a-9ddc-b01b634a68b6{ip: 10.5.0.4, host: 
> ozone-topology_datanode_1_1.ozone-topology_net, networkLocation: /rack1, 
> certSerialId: null}, Type:RATIS, Factor:ONE, State:ALLOCATED, leaderId:null, 
> CreationTimestamp2020-02-28T12:27:58.216275Z]
> scm_1 | 2020-02-28 12:27:58,218 [RatisPipelineUtilsThread] INFO 
> pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for 
> pipeline:PipelineID=4f16913d-ec06-44b4-a577-6664a517e401 to 
> datanode:4ce489a3-e3da-4f2a-9ddc-b01b634a68b6
> scm_1 | 2020-02-28 12:27:58,219 [RatisPipelineUtilsThread] INFO 
> pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for 
> pipeline:PipelineID=4f16913d-ec06-44b4-a577-6664a517e401 to 
> datanode:74084fe6-60a9-45d6-b02c-a9fa7ed24e3a
> scm_1 | 2020-02-28 12:27:58,220 

[jira] [Updated] (HDDS-3345) Investigate failure of TestDecommissionAndMaintenance integration test

2020-06-09 Thread Nanda kumar (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-3345:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Investigate failure of TestDecommissionAndMaintenance integration test
> --
>
> Key: HDDS-3345
> URL: https://issues.apache.org/jira/browse/HDDS-3345
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Affects Versions: 0.6.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>
> The test in the above class is failing and needs to be corrected. It is 
> currently ignored.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1880) Decommissioning and maintenance mode in Ozone

2020-06-09 Thread Nanda kumar (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-1880:
--
Labels:   (was: TriagePending)

> Decommissioning and maintenance mode in Ozone 
> ---
>
> Key: HDDS-1880
> URL: https://issues.apache.org/jira/browse/HDDS-1880
> Project: Hadoop Distributed Data Store
>  Issue Type: New Feature
>  Components: SCM
>Reporter: Marton Elek
>Assignee: Stephen O'Donnell
>Priority: Major
>
> This is the umbrella jira for decommissioning support in Ozone. Design doc 
> will be attached soon.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3485) Use typesafe ContainerID instead of long value

2020-06-09 Thread Nanda kumar (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-3485:
--
Labels:   (was: TriagePending)

> Use typesafe ContainerID instead of long value
> --
>
> Key: HDDS-3485
> URL: https://issues.apache.org/jira/browse/HDDS-3485
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.5.0
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
>
> We should use the typesafe {{ContainerID}} reference in all places inside 
> SCM and avoid using the raw long value.
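
The usual shape of such a wrapper is a small final class around the long, so
a map keyed by container ID can no longer be queried with a bare long by
accident. A minimal sketch; the real {{ContainerID}} in HDDS may differ in
naming and helpers:

{code:java}
public final class ContainerIdSketch implements Comparable<ContainerIdSketch> {
  private final long id;

  private ContainerIdSketch(long id) {
    this.id = id;
  }

  public static ContainerIdSketch valueOf(long id) {
    return new ContainerIdSketch(id);
  }

  public long getId() {
    return id;
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof ContainerIdSketch && ((ContainerIdSketch) o).id == id;
  }

  @Override
  public int hashCode() {
    return Long.hashCode(id);
  }

  @Override
  public int compareTo(ContainerIdSketch other) {
    return Long.compare(id, other.id);
  }

  @Override
  public String toString() {
    return "#" + id;
  }
}
{code}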



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3485) Use typesafe ContainerID instead of long value

2020-06-09 Thread Nanda kumar (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-3485:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> Use typesafe ContainerID instead of long value
> --
>
> Key: HDDS-3485
> URL: https://issues.apache.org/jira/browse/HDDS-3485
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.5.0
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
>  Labels: TriagePending
>
> We should use the typesafe {{ContainerID}} reference in all places inside 
> SCM and avoid using the raw long value.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] GlenGeng commented on a change in pull request #1041: HDDS-3725. Ozone sh volume client support quota option

2020-06-09 Thread GitBox


GlenGeng commented on a change in pull request #1041:
URL: https://github.com/apache/hadoop-ozone/pull/1041#discussion_r437846415



##
File path: 
hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/VolumeArgs.java
##
@@ -107,7 +118,8 @@ public String getQuota() {
   public static class Builder {
 private String adminName;
 private String ownerName;
-private String volumeQuota;
+private String volumeSsQuota;
+private long volumeNsQuota;

Review comment:
   `volumeSsQuota` and `volumeNsQuota`
   ditto, consider better naming; it will be easier for others to use and 
maintain.

##
File path: 
hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/client/OzoneQuota.java
##
@@ -25,26 +25,73 @@
  * represents an OzoneQuota Object that can be applied to
  * a storage volume.
  */
-public class OzoneQuota {
+public final class OzoneQuota {
 
   public static final String OZONE_QUOTA_BYTES = "BYTES";
+  public static final String OZONE_QUOTA_KB = "KB";
   public static final String OZONE_QUOTA_MB = "MB";
   public static final String OZONE_QUOTA_GB = "GB";
   public static final String OZONE_QUOTA_TB = "TB";
 
-  private Units unit;
-  private long size;
-
   /** Quota Units.*/
   public enum Units {UNDEFINED, BYTES, KB, MB, GB, TB}
 
+  private long namespaceQuota;

Review comment:
   add comments to explain the meaning of namespaceQuota and 
storagespaceQuota.
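
   For example, the requested comments could read roughly like this (wording 
illustrative):

{code:java}
/** Quota on the number of namespace objects (keys, buckets) in the volume. */
private long namespaceQuota;

/** Quota on the total storage space, in bytes, the volume may consume. */
private long storagespaceQuota;
{code}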

##
File path: hadoop-hdds/docs/content/shell/VolumeCommands.md
##
@@ -42,7 +42,7 @@ assign it to a user.
 |  Uri   | The name of the volume. 
   |
 
 {{< highlight bash >}}
-ozone sh volume create --quota=1TB --user=bilbo /hive
+ozone sh volume create -ssq=1TB --user=bilbo /hive

Review comment:
   `-ssq` is hard to understand, and thus hard to use. Can we give it a 
self-explanatory name?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-2564) Handle InterruptedException in ContainerStateMachine

2020-06-09 Thread Dinesh Chitlangia (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Chitlangia resolved HDDS-2564.
-
Fix Version/s: 0.6.0
   Resolution: Fixed

> Handle InterruptedException in ContainerStateMachine
> 
>
> Key: HDDS-2564
> URL: https://issues.apache.org/jira/browse/HDDS-2564
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Dinesh Chitlangia
>Assignee: Dinesh Chitlangia
>Priority: Major
>  Labels: Triaged, newbie, pull-request-available, sonar
> Fix For: 0.6.0
>
>
> https://sonarcloud.io/project/issues?id=hadoop-ozone=AW5md-65KcVY8lQ4ZsRV=AW5md-65KcVY8lQ4ZsRV
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3771) Block when using ’ozone fs -cat o3fs://xxxxx.xxxx/xxx‘

2020-06-09 Thread mingchao zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mingchao zhao updated HDDS-3771:

Description: 
The command blocks when I use ’ozone fs -cat o3fs://x./xxx‘, and no logs are 
seen in the background. It works normally when I use ’ozone sh key cat 
/x//xxx ‘.

!image-2020-06-10-11-48-12-299.png|width=919,height=88!

 

  was:
Block when I use ’ozone fs -cat o3fs://x./xxx‘. And no logs are seen in 
the background. This is normal when I use ’ozone sh key cat /x//xxx ‘.

 


> Block when using ’ozone fs -cat o3fs://x./xxx‘
> --
>
> Key: HDDS-3771
> URL: https://issues.apache.org/jira/browse/HDDS-3771
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Filesystem
>Affects Versions: 0.6.0
>Reporter: mingchao zhao
>Priority: Major
> Attachments: image-2020-06-10-11-48-12-299.png
>
>
> The command blocks when I use ’ozone fs -cat o3fs://x./xxx‘, and no logs 
> are seen in the background. It works normally when I use ’ozone sh key cat 
> /x//xxx ‘.
> !image-2020-06-10-11-48-12-299.png|width=919,height=88!
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-3771) Block when using ’ozone fs -cat o3fs://xxxxx.xxxx/xxx‘

2020-06-09 Thread mingchao zhao (Jira)
mingchao zhao created HDDS-3771:
---

 Summary: Block when using ’ozone fs -cat o3fs://x./xxx‘
 Key: HDDS-3771
 URL: https://issues.apache.org/jira/browse/HDDS-3771
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Filesystem
Affects Versions: 0.6.0
Reporter: mingchao zhao
 Attachments: image-2020-06-10-11-48-12-299.png

The command blocks when I use ’ozone fs -cat o3fs://x./xxx‘, and no logs are 
seen in the background. It works normally when I use ’ozone sh key cat 
/x//xxx ‘.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3771) Block when using ’ozone fs -cat o3fs://xxxxx.xxxx/xxx‘

2020-06-09 Thread mingchao zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mingchao zhao updated HDDS-3771:

Attachment: image-2020-06-10-11-48-12-299.png

> Block when using ’ozone fs -cat o3fs://x./xxx‘
> --
>
> Key: HDDS-3771
> URL: https://issues.apache.org/jira/browse/HDDS-3771
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Filesystem
>Affects Versions: 0.6.0
>Reporter: mingchao zhao
>Priority: Major
> Attachments: image-2020-06-10-11-48-12-299.png
>
>
> The command blocks when I use ’ozone fs -cat o3fs://x./xxx‘, and no logs 
> are seen in the background. It works normally when I use ’ozone sh key cat 
> /x//xxx ‘.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] runzhiwang commented on pull request #1048: HDDS-3481. SCM ask too many datanodes to replicate the same container

2020-06-09 Thread GitBox


runzhiwang commented on pull request #1048:
URL: https://github.com/apache/hadoop-ozone/pull/1048#issuecomment-641701943


   @nandakumar131 @arp7 Could you help review this patch? Thank you very much.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] xiaoyuyao commented on pull request #1043: HDDS-3760. Avoid UUID#toString call in Pipeline#getProtobufMessage

2020-06-09 Thread GitBox


xiaoyuyao commented on pull request #1043:
URL: https://github.com/apache/hadoop-ozone/pull/1043#issuecomment-641690016


   LGTM, +1. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3481) SCM ask too many datanodes to replicate the same container

2020-06-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3481:
-
Labels: TriagePending pull-request-available  (was: TriagePending)

> SCM ask too many datanodes to replicate the same container
> --
>
> Key: HDDS-3481
> URL: https://issues.apache.org/jira/browse/HDDS-3481
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Blocker
>  Labels: TriagePending, pull-request-available
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png
>
>
> *What's the problem?*
> As the image shows, SCM asked 31 datanodes to replicate container 2037 every 
> 10 minutes starting from 2020-04-17 23:38:51. At 2020-04-18 08:58:52 SCM 
> found that the replica count of container 2037 was 12, so it asked 11 
> datanodes to delete container 2037. 
>  !screenshot-1.png! 
>  !screenshot-2.png! 
> *What's the reason?*
> SCM checks whether (container replica count + 
> inflightReplication.get(containerId).size() - 
> inflightDeletion.get(containerId).size()) is less than 3. If it is, SCM asks 
> some datanode to replicate the container and adds the action to 
> inflightReplication.get(containerId). The replicate action times out after 
> 10 minutes; when an action times out, SCM removes it from 
> inflightReplication.get(containerId), as the image shows. The expression 
> (container replica count + inflightReplication.get(containerId).size() - 
> inflightDeletion.get(containerId).size()) then drops below 3 again, and SCM 
> asks yet another datanode to replicate the container.
> Because replicating a container takes a long time, it sometimes cannot 
> finish within 10 minutes, so 31 datanodes ended up replicating the container 
> every 10 minutes. 19 of the 31 datanodes replicated the container from the 
> same source datanode, which put heavy pressure on the source datanode and 
> made replication even slower. In practice, the first replication took 4 
> hours to finish. 
>  !screenshot-4.png! 
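
For illustration, a minimal sketch of the check described above. The map names 
follow the description; the surrounding class, types, and helper method are 
hypothetical, not the actual ReplicationManager code:

{code:java}
// Sketch of the re-replication check. If a tracked replicate action times out
// (10 minutes) it is removed from inflightReplication, the expression below
// drops under 3 again, and yet another datanode is asked -- the reported loop.
import java.util.List;
import java.util.Map;

class ReplicationCheckSketch {
  private static final int REPLICATION_FACTOR = 3;

  // Per-container lists of pending replicate/delete actions (assumed types).
  private Map<Long, List<Object>> inflightReplication;
  private Map<Long, List<Object>> inflightDeletion;

  void checkReplication(long containerId, int replicaCount) {
    int expected = replicaCount
        + inflightReplication.get(containerId).size()
        - inflightDeletion.get(containerId).size();
    if (expected < REPLICATION_FACTOR) {
      sendReplicateCommand(containerId);
    }
  }

  private void sendReplicateCommand(long containerId) {
    // Omitted: pick a target datanode and queue the replicate command.
  }
}
{code}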



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] runzhiwang opened a new pull request #1048: HDDS-3481. SCM ask too many datanodes to replicate the same container

2020-06-09 Thread GitBox


runzhiwang opened a new pull request #1048:
URL: https://github.com/apache/hadoop-ozone/pull/1048


   ## What changes were proposed in this pull request?
   
   1. Increase the replication timeout.
   2. If a datanode is not healthy, remove the replicate action from 
inflightActions.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-3481
   
   ## How was this patch tested?
   
   Existing tests.
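   
   For illustration, a minimal sketch of change (2) under stated assumptions 
(the `InflightAction` and `NodeManager` shapes here are simplified stand-ins, 
not the actual patch):
   
   ```java
   // Hedged sketch: prune inflight replicate actions whose datanode is no
   // longer healthy instead of waiting for the 10-minute timeout.
   // InflightAction and NodeManager are assumed, simplified types.
   import java.util.Iterator;
   import java.util.List;
   
   class InflightActionPruner {
     interface NodeManager {
       boolean isHealthy(String datanodeUuid);
     }
   
     static class InflightAction {
       String datanodeUuid;
       long startTimeMs;
     }
   
     void pruneUnhealthy(List<InflightAction> actions, NodeManager nodes) {
       Iterator<InflightAction> it = actions.iterator();
       while (it.hasNext()) {
         if (!nodes.isHealthy(it.next().datanodeUuid)) {
           // A dead datanode will never finish the copy; dropping the action
           // lets SCM schedule a healthy replacement immediately.
           it.remove();
         }
       }
     }
   }
   ```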



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3481) SCM ask too many datanodes to replicate the same container

2020-06-09 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-3481:
-
Summary: SCM ask too many datanodes to replicate the same container  (was: 
SCM ask 31 datanodes to replicate the same container)

> SCM ask too many datanodes to replicate the same container
> --
>
> Key: HDDS-3481
> URL: https://issues.apache.org/jira/browse/HDDS-3481
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Blocker
>  Labels: TriagePending
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png
>
>
> *What's the problem?*
> As the image shows, SCM asked 31 datanodes to replicate container 2037 every 
> 10 minutes starting from 2020-04-17 23:38:51. At 2020-04-18 08:58:52 SCM 
> found that the replica count of container 2037 was 12, so it asked 11 
> datanodes to delete container 2037. 
>  !screenshot-1.png! 
>  !screenshot-2.png! 
> *What's the reason?*
> SCM checks whether (container replica count + 
> inflightReplication.get(containerId).size() - 
> inflightDeletion.get(containerId).size()) is less than 3. If it is, SCM asks 
> some datanode to replicate the container and adds the action to 
> inflightReplication.get(containerId). The replicate action times out after 
> 10 minutes; when an action times out, SCM removes it from 
> inflightReplication.get(containerId), as the image shows. The expression 
> (container replica count + inflightReplication.get(containerId).size() - 
> inflightDeletion.get(containerId).size()) then drops below 3 again, and SCM 
> asks yet another datanode to replicate the container.
> Because replicating a container takes a long time, it sometimes cannot 
> finish within 10 minutes, so 31 datanodes ended up replicating the container 
> every 10 minutes. 19 of the 31 datanodes replicated the container from the 
> same source datanode, which put heavy pressure on the source datanode and 
> made replication even slower. In practice, the first replication took 4 
> hours to finish. 
>  !screenshot-4.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] runzhiwang commented on pull request #1028: HDDS-3735. Improve SCM performance with 3.7% by remove unnecessary lock and unlock

2020-06-09 Thread GitBox


runzhiwang commented on pull request #1028:
URL: https://github.com/apache/hadoop-ozone/pull/1028#issuecomment-641678745


   @elek Could you help review this patch? Thank you very much.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] hanishakoneru commented on pull request #1004: HDDS-3639. Maintain FileHandle Information in OMMetadataManager.

2020-06-09 Thread GitBox


hanishakoneru commented on pull request #1004:
URL: https://github.com/apache/hadoop-ozone/pull/1004#issuecomment-641677419


   Thanks for working on this @prashantpogde. 
   +1 for merging into branch HDDS-3001, pending CI.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] hanishakoneru commented on a change in pull request #1004: HDDS-3639. Maintain FileHandle Information in OMMetadataManager.

2020-06-09 Thread GitBox


hanishakoneru commented on a change in pull request #1004:
URL: https://github.com/apache/hadoop-ozone/pull/1004#discussion_r437820153



##
File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/response/key/OMKeyRenameResponse.java
##
@@ -93,6 +93,15 @@ public void addToDBBatch(OMMetadataManager omMetadataManager,
   omMetadataManager.getKeyTable().putWithBatch(batchOperation,

Review comment:
   Thanks @prashantpogde for the offline discussion. 
   It will be a lot of extra work to update the keyID in the deleteFromKeyOnly 
case as well, since we don't send the toKey name in that case. It is safe to 
go ahead with this approach, as Bharat's replay optimization (HDDS-3354) will 
take care of it.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3770) Improve discardPipeline

2020-06-09 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-3770:
-
Description:  !screenshot-1.png! 

> Improve discardPipeline
> ---
>
> Key: HDDS-3770
> URL: https://issues.apache.org/jira/browse/HDDS-3770
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: screenshot-1.png
>
>
>  !screenshot-1.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3770) Improve discardPipeline

2020-06-09 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-3770:
-
Attachment: screenshot-1.png

> Improve discardPipeline
> ---
>
> Key: HDDS-3770
> URL: https://issues.apache.org/jira/browse/HDDS-3770
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: screenshot-1.png
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-3770) Improve discardPipeline

2020-06-09 Thread runzhiwang (Jira)
runzhiwang created HDDS-3770:


 Summary: Improve discardPipeline
 Key: HDDS-3770
 URL: https://issues.apache.org/jira/browse/HDDS-3770
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: runzhiwang
Assignee: runzhiwang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] runzhiwang commented on pull request #1031: HDDS-3745. Improve OM and SCM performance with 64% by avoid call getServiceInfo in s3g

2020-06-09 Thread GitBox


runzhiwang commented on pull request #1031:
URL: https://github.com/apache/hadoop-ozone/pull/1031#issuecomment-641665483


   @elek Could you help review this patch? Thank you very much.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2873) hdds.x509.CRL.name missing from ozone-default.xml

2020-06-09 Thread Xiaoyu Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDDS-2873:
-
Target Version/s: 0.7.0  (was: 0.6.0)

> hdds.x509.CRL.name missing from ozone-default.xml
> -
>
> Key: HDDS-2873
> URL: https://issues.apache.org/jira/browse/HDDS-2873
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Security
>Affects Versions: 0.5.0
>Reporter: Attila Doroszlai
>Priority: Major
>  Labels: Triaged
>
> {{hdds.x509.CRL.name}} is reported by {{TestOzoneConfigurationFields}} to be 
> missing from {{ozone-default.xml}}. If it is to be documented, then please 
> add the property to {{ozone-default.xml}}. If it's a developer-only setting, 
> please add as exception in 
> {{TestOzoneConfigurationFields#addPropertiesNotInXml}}.
> (Sorry for reporting this post-commit. {{TestOzoneConfigurationFields}} will 
> be run by CI once we have integration tests enabled again.)
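
If it is indeed a developer-only setting, the exception could look roughly 
like the sketch below; the {{configurationPropsToSkipCompare}} collection name 
is an assumption about the test base class, not confirmed by this report:

{code:java}
// Sketch only: register hdds.x509.CRL.name as a known exception inside
// TestOzoneConfigurationFields so the test stops flagging it. The set name
// configurationPropsToSkipCompare is assumed.
private void addPropertiesNotInXml() {
  configurationPropsToSkipCompare.add("hdds.x509.CRL.name");
}
{code}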



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2873) hdds.x509.CRL.name missing from ozone-default.xml

2020-06-09 Thread Xiaoyu Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDDS-2873:
-
Parent: HDDS-2731
Issue Type: Sub-task  (was: Bug)

> hdds.x509.CRL.name missing from ozone-default.xml
> -
>
> Key: HDDS-2873
> URL: https://issues.apache.org/jira/browse/HDDS-2873
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Security
>Affects Versions: 0.5.0
>Reporter: Attila Doroszlai
>Priority: Major
>  Labels: Triaged
>
> {{hdds.x509.CRL.name}} is reported by {{TestOzoneConfigurationFields}} to be 
> missing from {{ozone-default.xml}}. If it is to be documented, then please 
> add the property to {{ozone-default.xml}}. If it's a developer-only setting, 
> please add as exception in 
> {{TestOzoneConfigurationFields#addPropertiesNotInXml}}.
> (Sorry for reporting this post-commit. {{TestOzoneConfigurationFields}} will 
> be run by CI once we have integration tests enabled again.)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2867) Generic Extensible Token support for Ozone

2020-06-09 Thread Xiaoyu Yao (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129908#comment-17129908
 ] 

Xiaoyu Yao commented on HDDS-2867:
--

Move to 0.7.0

> Generic Extensible Token support for Ozone
> --
>
> Key: HDDS-2867
> URL: https://issues.apache.org/jira/browse/HDDS-2867
> Project: Hadoop Distributed Data Store
>  Issue Type: New Feature
>  Components: Security
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
>  Labels: Triaged
> Attachments: Generic Extensible Tokens for Ozone.pdf
>
>
> This is the umbrella Jira to add generic token support across ozone 
> components. I will attach a design spec for review and comments. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2867) Generic Extensible Token support for Ozone

2020-06-09 Thread Xiaoyu Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDDS-2867:
-
Labels: Triaged  (was: TriagePending)

> Generic Extensible Token support for Ozone
> --
>
> Key: HDDS-2867
> URL: https://issues.apache.org/jira/browse/HDDS-2867
> Project: Hadoop Distributed Data Store
>  Issue Type: New Feature
>  Components: Security
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
>  Labels: Triaged
> Attachments: Generic Extensible Tokens for Ozone.pdf
>
>
> This is the umbrella Jira to add generic token support across ozone 
> components. I will attach a design spec for review and comments. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2867) Generic Extensible Token support for Ozone

2020-06-09 Thread Xiaoyu Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDDS-2867:
-
Target Version/s: 0.7.0  (was: 0.6.0)

> Generic Extensible Token support for Ozone
> --
>
> Key: HDDS-2867
> URL: https://issues.apache.org/jira/browse/HDDS-2867
> Project: Hadoop Distributed Data Store
>  Issue Type: New Feature
>  Components: Security
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
>  Labels: TriagePending
> Attachments: Generic Extensible Tokens for Ozone.pdf
>
>
> This is the umbrella Jira to add generic token support across ozone 
> components. I will attach a design spec for review and comments. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2496) Delegate Ozone volume create/list ACL check to authorizer plugin

2020-06-09 Thread Xiaoyu Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDDS-2496:
-
Labels: Triaged  (was: TriagePending)

> Delegate Ozone volume create/list ACL check to authorizer plugin
> 
>
> Key: HDDS-2496
> URL: https://issues.apache.org/jira/browse/HDDS-2496
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Manager, Security
>Affects Versions: 0.4.1
>Reporter: Vivek Ratnavel Subramanian
>Assignee: Vivek Ratnavel Subramanian
>Priority: Major
>  Labels: Triaged
> Fix For: 0.5.0
>
>
> Today, Ozone volume create/list ACL checks are not sent to authorization 
> plugins. This causes problems when an authorization plugin is enabled: the 
> admin still needs to modify ozone-site.xml and change ozone.administrators 
> to configure which admins can create volumes.
>  
> This ticket is opened to have a consistent ACL check for all Ozone resource 
> requests, including admin requests like volume create. This way, the admin 
> defined by the authorization plugin can be honored during volume 
> provisioning without restarting Ozone services. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-2496) Delegate Ozone volume create/list ACL check to authorizer plugin

2020-06-09 Thread Xiaoyu Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao resolved HDDS-2496.
--
   Fix Version/s: 0.5.0
Target Version/s: 0.5.0  (was: 0.6.0)
  Resolution: Fixed

This has been fixed in HDDS-3391

> Delegate Ozone volume create/list ACL check to authorizer plugin
> 
>
> Key: HDDS-2496
> URL: https://issues.apache.org/jira/browse/HDDS-2496
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Manager, Security
>Affects Versions: 0.4.1
>Reporter: Vivek Ratnavel Subramanian
>Assignee: Vivek Ratnavel Subramanian
>Priority: Major
>  Labels: TriagePending
> Fix For: 0.5.0
>
>
> Today, Ozone volume create/list ACL checks are not sent to authorization 
> plugins. This causes problems when an authorization plugin is enabled: the 
> admin still needs to modify ozone-site.xml and change ozone.administrators 
> to configure which admins can create volumes.
>  
> This ticket is opened to have a consistent ACL check for all Ozone resource 
> requests, including admin requests like volume create. This way, the admin 
> defined by the authorization plugin can be honored during volume 
> provisioning without restarting Ozone services. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-3769) hadoop-hdds interface-client fail to build with JDK11

2020-06-09 Thread Xiaoyu Yao (Jira)
Xiaoyu Yao created HDDS-3769:


 Summary: hadoop-hdds interface-client fail to build with JDK11
 Key: HDDS-3769
 URL: https://issues.apache.org/jira/browse/HDDS-3769
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Xiaoyu Yao


[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on 
project hadoop-hdds-interface-client: Compilation failure: Compilation failure: 
[ERROR] 
/Users/xyao/github/hadoop-ozone/hadoop-hdds/interface-client/target/generated-sources/java/org/apache/hadoop/hdds/protocol/datanode/proto/IntraDatanodeProtocolServiceGrpc.java:[20,18]
 package javax.annotation does not exist
[ERROR] 
/Users/xyao/github/hadoop-ozone/hadoop-hdds/interface-client/target/generated-sources/java/org/apache/hadoop/hdds/protocol/datanode/proto/XceiverClientProtocolServiceGrpc.java:[20,18]
 package javax.annotation does not exist
[ERROR] -> [Help 1]




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] runzhiwang commented on pull request #885: HDDS-3514. Fix memory leak of RaftServerImpl

2020-06-09 Thread GitBox


runzhiwang commented on pull request #885:
URL: https://github.com/apache/hadoop-ozone/pull/885#issuecomment-641648535


   This duplicates https://issues.apache.org/jira/browse/HDDS-3564. No need to 
merge.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] runzhiwang closed pull request #885: HDDS-3514. Fix memory leak of RaftServerImpl

2020-06-09 Thread GitBox


runzhiwang closed pull request #885:
URL: https://github.com/apache/hadoop-ozone/pull/885


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] codecov-commenter commented on pull request #1047: HDDS-3726. Upload code coverage data to Codecov and enable checks in …

2020-06-09 Thread GitBox


codecov-commenter commented on pull request #1047:
URL: https://github.com/apache/hadoop-ozone/pull/1047#issuecomment-641640111


   # 
[Codecov](https://codecov.io/gh/apache/hadoop-ozone/pull/1047?src=pr=h1) 
Report
   > :exclamation: No coverage uploaded for pull request base 
(`master@38e7ca0`). [Click here to learn what that 
means](https://docs.codecov.io/docs/error-reference#section-missing-base-commit).
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hadoop-ozone/pull/1047/graphs/tree.svg?width=650=150=pr=5YeeptJMby)](https://codecov.io/gh/apache/hadoop-ozone/pull/1047?src=pr=tree)
   
   ```diff
   @@            Coverage Diff            @@
   ##             master    #1047   +/-   ##
   =========================================
     Coverage          ?   69.54%
     Complexity        ?     9111
   =========================================
     Files             ?      958
     Lines             ?    48046
     Branches          ?     4653
   =========================================
     Hits              ?    33414
     Misses            ?    12417
     Partials          ?     2215
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/hadoop-ozone/pull/1047?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/hadoop-ozone/pull/1047?src=pr=footer).
 Last update 
[38e7ca0...87ae7bf](https://codecov.io/gh/apache/hadoop-ozone/pull/1047?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-3481) SCM ask 31 datanodes to replicate the same container

2020-06-09 Thread runzhiwang (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129894#comment-17129894
 ] 

runzhiwang commented on HDDS-3481:
--

bq. What was the cause of the delay? Was the node slow/overloaded?
[~arp] The reason is that the source datanode replicates too many containers 
to other datanodes, so the source datanode becomes very slow.

> SCM ask 31 datanodes to replicate the same container
> 
>
> Key: HDDS-3481
> URL: https://issues.apache.org/jira/browse/HDDS-3481
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Blocker
>  Labels: TriagePending
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png
>
>
> *What's the problem?*
> As the image shows, SCM asked 31 datanodes to replicate container 2037 every 
> 10 minutes starting from 2020-04-17 23:38:51. At 2020-04-18 08:58:52 SCM 
> found that the replica count of container 2037 was 12, so it asked 11 
> datanodes to delete container 2037. 
>  !screenshot-1.png! 
>  !screenshot-2.png! 
> *What's the reason?*
> SCM checks whether (container replica count + 
> inflightReplication.get(containerId).size() - 
> inflightDeletion.get(containerId).size()) is less than 3. If it is, SCM asks 
> some datanode to replicate the container and adds the action to 
> inflightReplication.get(containerId). The replicate action times out after 
> 10 minutes; when an action times out, SCM removes it from 
> inflightReplication.get(containerId), as the image shows. The expression 
> (container replica count + inflightReplication.get(containerId).size() - 
> inflightDeletion.get(containerId).size()) then drops below 3 again, and SCM 
> asks yet another datanode to replicate the container.
> Because replicating a container takes a long time, it sometimes cannot 
> finish within 10 minutes, so 31 datanodes ended up replicating the container 
> every 10 minutes. 19 of the 31 datanodes replicated the container from the 
> same source datanode, which put heavy pressure on the source datanode and 
> made replication even slower. In practice, the first replication took 4 
> hours to finish. 
>  !screenshot-4.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #986: HDDS-3476. Use persisted transaction info during OM startup in OM StateMachine.

2020-06-09 Thread GitBox


bharatviswa504 commented on a change in pull request #986:
URL: https://github.com/apache/hadoop-ozone/pull/986#discussion_r437780671



##
File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
##
@@ -3027,30 +3017,59 @@ public TermIndex installSnapshot(String leaderId) {
 DBCheckpoint omDBcheckpoint = getDBCheckpointFromLeader(leaderId);
 Path newDBlocation = omDBcheckpoint.getCheckpointLocation();
 
-// Check if current ratis log index is smaller than the downloaded
-// snapshot index. If yes, proceed by stopping the ratis server so that
-// the OM state can be re-initialized. If no, then do not proceed with
-// installSnapshot.
 long lastAppliedIndex = omRatisServer.getLastAppliedTermIndex().getIndex();
-long checkpointSnapshotIndex = omDBcheckpoint.getRatisSnapshotIndex();
-long checkpointSnapshotTermIndex =
-omDBcheckpoint.getRatisSnapshotTerm();
-if (checkpointSnapshotIndex <= lastAppliedIndex) {
-  LOG.error("Failed to install checkpoint from OM leader: {}. The last " +
-  "applied index: {} is greater than or equal to the checkpoint's " +
-  "snapshot index: {}. Deleting the downloaded checkpoint {}", 
leaderId,
-  lastAppliedIndex, checkpointSnapshotIndex,
-  newDBlocation);
-  try {
-FileUtils.deleteFully(newDBlocation);
-  } catch (IOException e) {
-LOG.error("Failed to fully delete the downloaded DB checkpoint {} " +
-"from OM leader {}.", newDBlocation,
-leaderId, e);
+
+// Check if current ratis log index is smaller than the downloaded
+// checkpoint transaction index. If yes, proceed by stopping the ratis
+// server so that the OM state can be re-initialized. If no, then do not
+// proceed with installSnapshot.
+
+OMTransactionInfo omTransactionInfo = null;
+try {
+  // Set new DB location as DB path
+  OzoneConfiguration tempConfig = getConfiguration();
+
+  Path dbDir = newDBlocation.getParent();
+  if (dbDir != null) {
+tempConfig.set(OZONE_OM_DB_DIRS, dbDir.toString());
+  } else {
+LOG.error("Incorrect DB location path {} received from checkpoint.",
+newDBlocation);
+return null;
+  }
+
+  OMMetadataManager tempMetadataMgr =
+  new OmMetadataManagerImpl(configuration);

Review comment:
   Done. Removed extra config.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] flirmnave commented on pull request #898: HDDS-2879. Update return description of OzoneManager#openKey(OmKeyArgs args)

2020-06-09 Thread GitBox


flirmnave commented on pull request #898:
URL: https://github.com/apache/hadoop-ozone/pull/898#issuecomment-641632449


   Thanks @cxorm for reviewing and @hanishakoneru for merging.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] smengcl commented on a change in pull request #1021: HDDS-2665. Implement new Ozone Filesystem scheme ofs://

2020-06-09 Thread GitBox


smengcl commented on a change in pull request #1021:
URL: https://github.com/apache/hadoop-ozone/pull/1021#discussion_r437760431



##
File path: 
hadoop-ozone/ozonefs/src/main/java/org/apache/hadoop/fs/ozone/BasicRootedOzoneFileSystem.java
##
@@ -0,0 +1,904 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.fs.ozone;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.BlockLocation;
+import org.apache.hadoop.fs.CreateFlag;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileAlreadyExistsException;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.LocatedFileStatus;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.PathIsNotEmptyDirectoryException;
+import org.apache.hadoop.fs.permission.FsPermission;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.conf.OzoneConfiguration;
+import org.apache.hadoop.hdds.utils.LegacyHadoopConfigurationSource;
+import org.apache.hadoop.ozone.client.OzoneBucket;
+import org.apache.hadoop.ozone.client.OzoneVolume;
+import org.apache.hadoop.ozone.om.exceptions.OMException;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.apache.hadoop.security.token.Token;
+import org.apache.hadoop.util.Progressable;
+import org.apache.http.client.utils.URIBuilder;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.FileNotFoundException;
+import java.io.IOException;
+import java.net.URI;
+import java.net.URISyntaxException;
+import java.util.EnumSet;
+import java.util.Iterator;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Objects;
+import java.util.stream.Collectors;
+
+import static org.apache.hadoop.fs.ozone.Constants.LISTING_PAGE_SIZE;
+import static org.apache.hadoop.fs.ozone.Constants.OZONE_DEFAULT_USER;
+import static org.apache.hadoop.fs.ozone.Constants.OZONE_USER_DIR;
+import static org.apache.hadoop.ozone.OzoneConsts.OZONE_URI_DELIMITER;
+import static org.apache.hadoop.ozone.OzoneConsts.OZONE_OFS_URI_SCHEME;
+import static 
org.apache.hadoop.ozone.om.exceptions.OMException.ResultCodes.BUCKET_NOT_EMPTY;
+import static 
org.apache.hadoop.ozone.om.exceptions.OMException.ResultCodes.VOLUME_NOT_EMPTY;
+
+/**
+ * The minimal Ozone Filesystem implementation.
+ * 
+ * This is a basic version which doesn't extend
+ * KeyProviderTokenIssuer and doesn't include statistics. It can be used
+ * from older hadoop version. For newer hadoop version use the full featured
+ * BasicRootedOzoneFileSystem.
+ */
+@InterfaceAudience.Private
+@InterfaceStability.Evolving
+public class BasicRootedOzoneFileSystem extends FileSystem {
+  static final Logger LOG =
+  LoggerFactory.getLogger(BasicRootedOzoneFileSystem.class);
+
+  /**
+   * The Ozone client for connecting to Ozone server.
+   */
+
+  private URI uri;
+  private String userName;
+  private Path workingDir;
+  private OzoneClientAdapter adapter;
+  private BasicRootedOzoneClientAdapterImpl adapterImpl;
+
+  private static final String URI_EXCEPTION_TEXT =
+  "URL should be one of the following formats: " +
+  "ofs://om-service-id/path/to/key  OR " +
+  "ofs://om-host.example.com/path/to/key  OR " +
+  "ofs://om-host.example.com:5678/path/to/key";
+
+  @Override
+  public void initialize(URI name, Configuration conf) throws IOException {
+super.initialize(name, conf);
+setConf(conf);
+Objects.requireNonNull(name.getScheme(), "No scheme provided in " + name);
+Preconditions.checkArgument(getScheme().equals(name.getScheme()),
+"Invalid scheme provided in " + name);
+
+String authority = name.getAuthority();
+if (authority == null) {
+  // authority is null when fs.defaultFS is not a qualified ofs URI and
+  // ofs:/// is passed to the client. matcher will NPE if authority is null
+   


[GitHub] [hadoop-ozone] smengcl commented on a change in pull request #1021: HDDS-2665. Implement new Ozone Filesystem scheme ofs://

2020-06-09 Thread GitBox


smengcl commented on a change in pull request #1021:
URL: https://github.com/apache/hadoop-ozone/pull/1021#discussion_r437759639



##
File path: 
hadoop-ozone/ozonefs/src/main/java/org/apache/hadoop/fs/ozone/BasicRootedOzoneFileSystem.java
##
@@ -0,0 +1,904 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.fs.ozone;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.BlockLocation;
+import org.apache.hadoop.fs.CreateFlag;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileAlreadyExistsException;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.LocatedFileStatus;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.PathIsNotEmptyDirectoryException;
+import org.apache.hadoop.fs.permission.FsPermission;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.conf.OzoneConfiguration;
+import org.apache.hadoop.hdds.utils.LegacyHadoopConfigurationSource;
+import org.apache.hadoop.ozone.client.OzoneBucket;
+import org.apache.hadoop.ozone.client.OzoneVolume;
+import org.apache.hadoop.ozone.om.exceptions.OMException;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.apache.hadoop.security.token.Token;
+import org.apache.hadoop.util.Progressable;
+import org.apache.http.client.utils.URIBuilder;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.FileNotFoundException;
+import java.io.IOException;
+import java.net.URI;
+import java.net.URISyntaxException;
+import java.util.EnumSet;
+import java.util.Iterator;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Objects;
+import java.util.stream.Collectors;
+
+import static org.apache.hadoop.fs.ozone.Constants.LISTING_PAGE_SIZE;
+import static org.apache.hadoop.fs.ozone.Constants.OZONE_DEFAULT_USER;
+import static org.apache.hadoop.fs.ozone.Constants.OZONE_USER_DIR;
+import static org.apache.hadoop.ozone.OzoneConsts.OZONE_URI_DELIMITER;
+import static org.apache.hadoop.ozone.OzoneConsts.OZONE_OFS_URI_SCHEME;
+import static 
org.apache.hadoop.ozone.om.exceptions.OMException.ResultCodes.BUCKET_NOT_EMPTY;
+import static 
org.apache.hadoop.ozone.om.exceptions.OMException.ResultCodes.VOLUME_NOT_EMPTY;
+
+/**
+ * The minimal Ozone Filesystem implementation.
+ * 
+ * This is a basic version which doesn't extend
+ * KeyProviderTokenIssuer and doesn't include statistics. It can be used
+ * from older hadoop version. For newer hadoop version use the full featured
+ * BasicRootedOzoneFileSystem.
+ */
+@InterfaceAudience.Private
+@InterfaceStability.Evolving
+public class BasicRootedOzoneFileSystem extends FileSystem {
+  static final Logger LOG =
+  LoggerFactory.getLogger(BasicRootedOzoneFileSystem.class);
+
+  /**
+   * The Ozone client for connecting to Ozone server.
+   */
+
+  private URI uri;
+  private String userName;
+  private Path workingDir;
+  private OzoneClientAdapter adapter;
+  private BasicRootedOzoneClientAdapterImpl adapterImpl;

Review comment:
   `this.adapterImpl = (BasicRootedOzoneClientAdapterImpl) this.adapter;`
   
   This was intended to make its usage cleaner.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] smengcl commented on a change in pull request #1021: HDDS-2665. Implement new Ozone Filesystem scheme ofs://

2020-06-09 Thread GitBox


smengcl commented on a change in pull request #1021:
URL: https://github.com/apache/hadoop-ozone/pull/1021#discussion_r437758520



##
File path: 
hadoop-ozone/ozonefs/src/test/java/org/apache/hadoop/fs/ozone/TestRootedOzoneFileSystemWithMocks.java
##
@@ -0,0 +1,115 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.fs.ozone;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.hdds.conf.OzoneConfiguration;
+import org.apache.hadoop.ozone.OmUtils;
+import org.apache.hadoop.ozone.client.ObjectStore;
+import org.apache.hadoop.ozone.client.OzoneClient;
+import org.apache.hadoop.ozone.client.OzoneClientFactory;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.powermock.api.mockito.PowerMockito;
+import org.powermock.core.classloader.annotations.PowerMockIgnore;
+import org.powermock.core.classloader.annotations.PrepareForTest;
+import org.powermock.modules.junit4.PowerMockRunner;
+
+import java.net.URI;
+
+import static org.junit.Assert.assertEquals;
+import static org.mockito.Matchers.eq;
+import static org.mockito.Mockito.mock;
+import static org.mockito.Mockito.when;
+
+/**
+ * Ozone File system tests that are light weight and use mocks.
+ */
+@RunWith(PowerMockRunner.class)
+@PrepareForTest({ OzoneClientFactory.class, UserGroupInformation.class })
+@PowerMockIgnore("javax.management.*")
+public class TestRootedOzoneFileSystemWithMocks {

Review comment:
   I removed `TestRootedOzoneFileSystemWithMocks` in HDDS-3767, since 
`TestOzoneFileSystemWithMocks` was also removed. We can restore this later.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] smengcl commented on a change in pull request #1021: HDDS-2665. Implement new Ozone Filesystem scheme ofs://

2020-06-09 Thread GitBox


smengcl commented on a change in pull request #1021:
URL: https://github.com/apache/hadoop-ozone/pull/1021#discussion_r437758264



##
File path: 
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/fs/ozone/TestRootedOzoneFileSystem.java
##
@@ -0,0 +1,876 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.ozone;
+
+import org.apache.commons.io.IOUtils;
+import org.apache.commons.lang3.RandomStringUtils;
+import org.apache.hadoop.fs.CommonConfigurationKeysPublic;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.PathIsNotEmptyDirectoryException;
+import org.apache.hadoop.fs.contract.ContractTestUtils;
+import org.apache.hadoop.fs.permission.FsPermission;
+import org.apache.hadoop.hdds.conf.OzoneConfiguration;
+import org.apache.hadoop.ozone.MiniOzoneCluster;
+import org.apache.hadoop.ozone.OzoneAcl;
+import org.apache.hadoop.ozone.OzoneConsts;
+import org.apache.hadoop.ozone.TestDataUtil;
+import org.apache.hadoop.ozone.client.ObjectStore;
+import org.apache.hadoop.ozone.client.OzoneBucket;
+import org.apache.hadoop.ozone.client.OzoneKeyDetails;
+import org.apache.hadoop.ozone.client.OzoneVolume;
+import org.apache.hadoop.ozone.client.VolumeArgs;
+import org.apache.hadoop.ozone.client.protocol.ClientProtocol;
+import org.apache.hadoop.ozone.om.exceptions.OMException;
+import org.apache.hadoop.ozone.security.acl.IAccessAuthorizer.ACLIdentityType;
+import org.apache.hadoop.ozone.security.acl.IAccessAuthorizer.ACLType;
+import org.apache.hadoop.ozone.security.acl.OzoneAclConfig;
+import org.apache.hadoop.test.GenericTestUtils;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.Timeout;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.Iterator;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Set;
+import java.util.TreeSet;
+import java.util.stream.Collectors;
+
+import static org.apache.hadoop.fs.ozone.Constants.LISTING_PAGE_SIZE;
+import static org.apache.hadoop.ozone.OzoneAcl.AclScope.ACCESS;
+import static org.apache.hadoop.ozone.OzoneConsts.OZONE_URI_DELIMITER;
+import static org.apache.hadoop.ozone.om.OMConfigKeys.OZONE_OM_ADDRESS_KEY;
+import static 
org.apache.hadoop.ozone.om.exceptions.OMException.ResultCodes.BUCKET_NOT_FOUND;
+import static 
org.apache.hadoop.ozone.om.exceptions.OMException.ResultCodes.VOLUME_NOT_FOUND;
+
+/**
+ * Ozone file system tests that are not covered by contract tests.
+ * TODO: Refactor this and TestOzoneFileSystem later to reduce code 
duplication.
+ */
+public class TestRootedOzoneFileSystem {
+
+  @Rule
+  public Timeout globalTimeout = new Timeout(300_000);
+
+  private OzoneConfiguration conf;
+  private MiniOzoneCluster cluster = null;
+  private FileSystem fs;
+  private RootedOzoneFileSystem ofs;
+  private ObjectStore objectStore;
+  private static BasicRootedOzoneClientAdapterImpl adapter;
+
+  private String volumeName;
+  private String bucketName;
+  // Store path commonly used by tests that test functionality within a bucket
+  private Path testBucketPath;
+  private String rootPath;
+
+  @Before
+  public void init() throws Exception {
+conf = new OzoneConfiguration();
+cluster = MiniOzoneCluster.newBuilder(conf)
+.setNumDatanodes(3)
+.build();
+cluster.waitForClusterToBeReady();
+objectStore = cluster.getClient().getObjectStore();
+
+// create a volume and a bucket to be used by RootedOzoneFileSystem (OFS)
+OzoneBucket bucket = TestDataUtil.createVolumeAndBucket(cluster);
+volumeName = bucket.getVolumeName();
+bucketName = bucket.getName();
+String testBucketStr =
+OZONE_URI_DELIMITER + volumeName + OZONE_URI_DELIMITER + bucketName;
+testBucketPath = new Path(testBucketStr);
+
+rootPath = String.format("%s://%s/",
+OzoneConsts.OZONE_OFS_URI_SCHEME, conf.get(OZONE_OM_ADDRESS_KEY));
+
+// Set the fs.defaultFS and start the filesystem
+

[jira] [Updated] (HDDS-3726) Upload code coverage to Codecov and enable checks in PR workflow of Github Actions

2020-06-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3726:
-
Labels: pull-request-available  (was: )

> Upload code coverage to Codecov and enable checks in PR workflow of Github 
> Actions
> --
>
> Key: HDDS-3726
> URL: https://issues.apache.org/jira/browse/HDDS-3726
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.6.0
>Reporter: Vivek Ratnavel Subramanian
>Assignee: Vivek Ratnavel Subramanian
>Priority: Major
>  Labels: pull-request-available
>
> HDDS-3170 aggregates code coverage across all components. We need to upload 
> the reports to Codecov so we can track coverage and coverage diffs, and tell 
> whether a PR lacks adequate unit tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] vivekratnavel opened a new pull request #1047: HDDS-3726. Upload code coverage data to Codecov and enable checks in …

2020-06-09 Thread GitBox


vivekratnavel opened a new pull request #1047:
URL: https://github.com/apache/hadoop-ozone/pull/1047


   …PR workflow of Github Actions
   
   ## What changes were proposed in this pull request?
   
   - HDDS-3710 aggregates all jacoco code coverage results into one file 
"all.xml". This patch uploads that file to [codecov](https://codecov.io/) to 
track and visualize the Hadoop Ozone project's coverage data and to compare 
diffs during pull request reviews.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-3726
   
   ## How was this patch tested?
   
   This patch was tested by creating pull requests and pushing commits to 
feature branches to test the coverage reported in codecov.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] vivekratnavel merged pull request #1032: HDDS-3682. Recon UI: Add interactive visualization for file size counts

2020-06-09 Thread GitBox


vivekratnavel merged pull request #1032:
URL: https://github.com/apache/hadoop-ozone/pull/1032


   






[jira] [Updated] (HDDS-3682) Recon UI: Add interactive visualization for file size counts

2020-06-09 Thread Vivek Ratnavel Subramanian (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vivek Ratnavel Subramanian updated HDDS-3682:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Recon UI: Add interactive visualization for file size counts
> 
>
> Key: HDDS-3682
> URL: https://issues.apache.org/jira/browse/HDDS-3682
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Recon
>Affects Versions: 0.5.0
>Reporter: Vivek Ratnavel Subramanian
>Assignee: Vivek Ratnavel Subramanian
>Priority: Major
>  Labels: pull-request-available
>
> Include a histogram to interactively view file size counts across each 
> volume/bucket






[jira] [Resolved] (HDDS-3590) Recon UI: Add visualization for file size distribution

2020-06-09 Thread Vivek Ratnavel Subramanian (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vivek Ratnavel Subramanian resolved HDDS-3590.
--
Resolution: Fixed

> Recon UI: Add visualization for file size distribution 
> ---
>
> Key: HDDS-3590
> URL: https://issues.apache.org/jira/browse/HDDS-3590
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>  Components: Ozone Recon
>Affects Versions: 0.5.0
>Reporter: Vivek Ratnavel Subramanian
>Assignee: Vivek Ratnavel Subramanian
>Priority: Major
>  Labels: TriagePending
> Attachments: Recon UI_ Visualizing file size distribution.pdf
>
>
> Recon has an API endpoint to get the file size distribution in Ozone. Add a 
> histogram-based visualization for it in the Recon UI.






[jira] [Commented] (HDDS-3755) Storage-class support for Ozone

2020-06-09 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129832#comment-17129832
 ] 

Arpit Agarwal commented on HDDS-3755:
-

Thanks for kicking this off [~elek]. This is a pretty detailed writeup. I have 
a couple of very high-level thoughts (also commented on the doc):
# I would like to see better-defined use cases, and some discussion of those 
use cases, before we get into the design.
# It may make sense to decouple EC from this feature. I think the two are big 
enough to stand independently, and EC can be plugged into the storage-class 
framework later.

I am still going through the doc in detail.

> Storage-class support for Ozone
> ---
>
> Key: HDDS-3755
> URL: https://issues.apache.org/jira/browse/HDDS-3755
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Marton Elek
>Assignee: Marton Elek
>Priority: Major
>
> Use a storage-class as an abstraction that combines replication 
> configuration, container states, and transitions.
> See this thread for the detailed design doc:
>  
> [https://lists.apache.org/thread.html/r1e2a5d5581abe9dd09834305ca65a6807f37bd229a07b8b31bda32ad%40%3Cozone-dev.hadoop.apache.org%3E]
> which is also uploaded to here: 
> https://hackmd.io/4kxufJBOQNaKn7PKFK_6OQ?edit






[jira] [Updated] (HDDS-1194) Ozone Security Phase -2

2020-06-09 Thread Xiaoyu Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDDS-1194:
-
Target Version/s: 0.7.0  (was: 0.6.0)

> Ozone Security Phase -2 
> 
>
> Key: HDDS-1194
> URL: https://issues.apache.org/jira/browse/HDDS-1194
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Security
>Reporter: Anu Engineer
>Assignee: Anu Engineer
>Priority: Major
>  Labels: TriagePending
>
> This sub-task tracks the phase-2 security work items.






[jira] [Updated] (HDDS-1194) Ozone Security Phase -2

2020-06-09 Thread Xiaoyu Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDDS-1194:
-
Labels: Triaged  (was: TriagePending)

> Ozone Security Phase -2 
> 
>
> Key: HDDS-1194
> URL: https://issues.apache.org/jira/browse/HDDS-1194
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Security
>Reporter: Anu Engineer
>Assignee: Anu Engineer
>Priority: Major
>  Labels: Triaged
>
> This sub-task tracks the phase-2 security work items.






[jira] [Comment Edited] (HDDS-2465) S3 Multipart upload failing

2020-06-09 Thread Bharat Viswanadham (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129809#comment-17129809
 ] 

Bharat Viswanadham edited comment on HDDS-2465 at 6/9/20, 9:18 PM:
---

Need to check again, but I have seen this when using the AWS S3 API directly.

For now, this issue is not seen with the aws CLI, the S3A file system, or 
other tests, so we will move this to the next release.
If it is blocking someone, we can move it back to this release.


was (Author: bharatviswa):
Need to check again, but I have seen this when using the AWS S3 API directly.

For now, this issue is not seen with the aws CLI or other tests, so we will 
move this to the next release.
If it is blocking someone, we can move it back to this release.

> S3 Multipart upload failing
> ---
>
> Key: HDDS-2465
> URL: https://issues.apache.org/jira/browse/HDDS-2465
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: S3
>Reporter: Bharat Viswanadham
>Priority: Critical
>  Labels: TriagePending
> Attachments: MPU.java
>
>
> When I run the attached Java program, I get the error below during 
> completeMultipartUpload.
> {code:java}
> ERROR StatusLogger No Log4j 2 configuration file found. Using default 
> configuration (logging only errors to the console), or user programmatically 
> provided configurations. Set system property 'log4j2.debug' to show Log4j 2 
> internal initialization logging. See 
> https://logging.apache.org/log4j/2.x/manual/configuration.html for 
> instructions on how to configure Log4j 2ERROR StatusLogger No Log4j 2 
> configuration file found. Using default configuration (logging only errors to 
> the console), or user programmatically provided configurations. Set system 
> property 'log4j2.debug' to show Log4j 2 internal initialization logging. See 
> https://logging.apache.org/log4j/2.x/manual/configuration.html for 
> instructions on how to configure Log4j 2Exception in thread "main" 
> com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request (Service: 
> Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: 
> c7b87393-955b-4c93-85f6-b02945e293ca; S3 Extended Request ID: 7tnVbqgc4bgb), 
> S3 Extended Request ID: 7tnVbqgc4bgb at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712)
>  at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367)
>  at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
>  at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
>  at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
>  at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
>  at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
>  at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
>  at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532) at 
> com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512) at 
> com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4921) at 
> com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4867) at 
> com.amazonaws.services.s3.AmazonS3Client.completeMultipartUpload(AmazonS3Client.java:3464)
>  at org.apache.hadoop.ozone.freon.MPU.main(MPU.java:96){code}
> When I debug, the request is never received by the S3 Gateway, and I don't 
> see any trace of it in the audit log.
>  
>  
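For context, a minimal sketch of the multipart-upload flow the attached 
program presumably exercises, using the AWS SDK v1 calls visible in the stack 
trace; the endpoint, credentials, bucket, and key are illustrative and are not 
taken from MPU.java:

{code:java}
import java.io.ByteArrayInputStream;
import java.util.Collections;

import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.client.builder.AwsClientBuilder;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.CompleteMultipartUploadRequest;
import com.amazonaws.services.s3.model.InitiateMultipartUploadRequest;
import com.amazonaws.services.s3.model.InitiateMultipartUploadResult;
import com.amazonaws.services.s3.model.UploadPartRequest;
import com.amazonaws.services.s3.model.UploadPartResult;

// Hypothetical reconstruction, not the actual attachment: only the call
// sequence (initiate -> uploadPart -> completeMultipartUpload) is grounded
// in the stack trace above. Port 9878 is the S3 Gateway default.
public class MpuSketch {
  public static void main(String[] args) {
    AmazonS3 s3 = AmazonS3ClientBuilder.standard()
        .withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration(
            "http://localhost:9878", "us-east-1"))
        .withPathStyleAccessEnabled(true)
        .withCredentials(new AWSStaticCredentialsProvider(
            new BasicAWSCredentials("accessKey", "secretKey")))
        .build();

    // Initiate the multipart upload and remember the upload id.
    InitiateMultipartUploadResult init = s3.initiateMultipartUpload(
        new InitiateMultipartUploadRequest("bucket1", "key1"));

    // Upload a single 5 MB part (the S3 minimum part size).
    byte[] data = new byte[5 * 1024 * 1024];
    UploadPartResult part = s3.uploadPart(new UploadPartRequest()
        .withBucketName("bucket1").withKey("key1")
        .withUploadId(init.getUploadId())
        .withPartNumber(1)
        .withInputStream(new ByteArrayInputStream(data))
        .withPartSize(data.length));

    // The 400 Bad Request reported above is thrown from this call.
    s3.completeMultipartUpload(new CompleteMultipartUploadRequest(
        "bucket1", "key1", init.getUploadId(),
        Collections.singletonList(part.getPartETag())));
  }
}
{code}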









[GitHub] [hadoop-ozone] hanishakoneru commented on a change in pull request #1004: HDDS-3639. Maintain FileHandle Information in OMMetadataManager.

2020-06-09 Thread GitBox


hanishakoneru commented on a change in pull request #1004:
URL: https://github.com/apache/hadoop-ozone/pull/1004#discussion_r437726086



##
File path: 
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/response/key/OMKeyRenameResponse.java
##
@@ -93,6 +93,15 @@ public void addToDBBatch(OMMetadataManager omMetadataManager,
   omMetadataManager.getKeyTable().putWithBatch(batchOperation,

Review comment:
   Let me elaborate on the scenario I am talking about:
 Trxn n1 : Create Key - key1
 Trxn n2 : Commit Key - key1 (adds an n1 -> key1 entry to the KeyID table)
 Trxn n3 : Rename key1 -> key2 (updates n1 -> key2 in the KeyID table)
   
   Now let's say we replay transactions from n1.
 Replay Trxn n1 : Create Key - key1 (key1 does not exist, so this 
transaction is replayed)
 Replay Trxn n2 : Commit Key - key1 (this again updates n1 -> key1 in the 
KeyID table)
 Replay Trxn n3 : Since key2 exists, we figure out this is a replay and 
execute the deleteFromKeyOnly() logic in addToDBBatch.
   
   So finally, after replay, we have n1 in the KeyID table pointing to key1 
instead of key2.
   
   > I also noticed that the sequence of delete and newkey addition was 
reversed before this change. Not sure how this worked.
   
   The order does not matter here, as they are part of a batch operation.
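
   To make the inconsistency concrete, here is a toy model of the scenario 
above; the Map is a stand-in for the KeyID table, and none of these names are 
the actual OM classes:

{code:java}
import java.util.HashMap;
import java.util.Map;

// Toy model: KeyID table as objectID -> current key name.
public class ReplayModel {
  public static void main(String[] args) {
    Map<Long, String> keyIdTable = new HashMap<>();
    long n1 = 1L;

    // Original run:
    keyIdTable.put(n1, "key1");  // Trxn n2: commit key1, adds n1 -> key1
    keyIdTable.put(n1, "key2");  // Trxn n3: rename key1 -> key2

    // Replay from n1:
    keyIdTable.put(n1, "key1");  // Replay n2 re-commits n1 -> key1
    // Replay n3: key2 already exists, so the rename is detected as a replay
    // and only the delete-from-key logic runs; n1 is never updated to key2.

    System.out.println(keyIdTable.get(n1));  // prints key1, not key2
  }
}
{code}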








[jira] [Assigned] (HDDS-2993) KeyManagerImpl#getFileStatus shouldn't always return null for permission, owner and group

2020-06-09 Thread Siyao Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyao Meng reassigned HDDS-2993:


Assignee: Siyao Meng

> KeyManagerImpl#getFileStatus shouldn't always return null for permission, 
> owner and group
> -
>
> Key: HDDS-2993
> URL: https://issues.apache.org/jira/browse/HDDS-2993
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager, Security
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>  Labels: Triaged
>
> The {{getFileStatus}} API always returns null for permission, owner and group 
> at the moment.
> From the 
> [code|https://github.com/apache/hadoop-ozone/blob/5950224c735c994d0acfaada87e3eef6c306299e/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java#L1689-L1692]:
> {code}
>   if (fileKeyInfo != null) {
> // this is a file
> return new OzoneFileStatus(fileKeyInfo, scmBlockSize, false);
>   }
> {code}
> into the 
> [constructor|https://github.com/apache/hadoop-ozone/blob/2e9265864af3b1d520dc7cdca3698d306f28cd14/hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/helpers/OzoneFileStatus.java#L41-L45]:
> {code}
>   public OzoneFileStatus(OmKeyInfo key, long blockSize, boolean isDirectory) {
> super(key.getDataSize(), isDirectory, key.getFactor().getNumber(),
> blockSize, key.getModificationTime(), getPath(key.getKeyName()));
> keyInfo = key;
>   }
> {code}
> into 
> [super|https://github.com/apache/hadoop/blob/branch-3.2.0/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileStatus.java#L109-L115]
>  (hadoop-common 3.2.0 jar):
> {code}
>   //We should deprecate this soon?
>   public FileStatus(long length, boolean isdir, int block_replication,
> long blocksize, long modification_time, Path path) {
> this(length, isdir, block_replication, blocksize, modification_time,
>  0, null, null, null, path);
>   }
> {code}
> The constructor 
> [params|https://github.com/apache/hadoop/blob/branch-3.2.0/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileStatus.java#L117-L127]:
> {code}
>   /**
>* Constructor for file systems on which symbolic links are not supported
>*/
>   public FileStatus(long length, boolean isdir,
> int block_replication,
> long blocksize, long modification_time, long access_time,
> FsPermission permission, String owner, String group, 
> Path path) {
> this(length, isdir, block_replication, blocksize, modification_time,
>  access_time, permission, owner, group, null, path);
>   }
> {code}
> You can see that the constructor used for Ozone's getFileStatus is always 
> filling null for permission/owner/group.
> We might want to fix this.
> CC [~xyao] [~aengineer]
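
A possible shape for the fix, sketched as a hypothetical overload inside 
OzoneFileStatus that feeds explicit values into the 10-arg FileStatus 
constructor quoted above; where permission, owner, and group should actually 
come from is the open question here:

{code:java}
// Hypothetical overload, not in the current code base. Assumes
// org.apache.hadoop.fs.permission.FsPermission is imported; it threads
// explicit permission/owner/group through instead of using the
// convenience constructor that nulls them out.
public OzoneFileStatus(OmKeyInfo key, long blockSize, boolean isDirectory,
    FsPermission permission, String owner, String group) {
  super(key.getDataSize(), isDirectory, key.getFactor().getNumber(),
      blockSize, key.getModificationTime(), 0 /* access_time */,
      permission, owner, group, getPath(key.getKeyName()));
  keyInfo = key;
}
{code}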





