[jira] [Updated] (HDDS-3104) Integration test crashes due to critical error in datanode
     [ https://issues.apache.org/jira/browse/HDDS-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDDS-3104:
---------------------------------
    Labels: pull-request-available  (was: )

> Integration test crashes due to critical error in datanode
> ----------------------------------------------------------
>
>                 Key: HDDS-3104
>                 URL: https://issues.apache.org/jira/browse/HDDS-3104
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>            Reporter: Attila Doroszlai
>            Assignee: Siddharth Wagle
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HDDS-3104.patch, org.apache.hadoop.ozone.client.rpc.TestContainerStateMachineFailureOnRead-output.txt
>
> {code:title=test log}
> 2020-02-28 07:36:17,759 [Datanode State Machine Thread - 0] ERROR statemachine.StateContext (StateContext.java:execute(420)) - Critical error occurred in StateMachine, setting shutDownMachine
> ...
> 2020-02-28 07:36:21,216 [Datanode State Machine Thread - 0] INFO util.ExitUtil (ExitUtil.java:terminate(210)) - Exiting with status 1: ExitException
> {code}
> {code:title=build output}
> [ERROR] ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
> {code}
> https://github.com/adoroszlai/hadoop-ozone/runs/474218807
> https://github.com/adoroszlai/hadoop-ozone/suites/487650271/artifacts/2327174

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[GitHub] [hadoop-ozone] swagle opened a new pull request #660: HDDS-3104. Integration test crashes due to critical error in datanode.
swagle opened a new pull request #660: HDDS-3104. Integration test crashes due to critical error in datanode.
URL: https://github.com/apache/hadoop-ozone/pull/660

## What changes were proposed in this pull request?

Created a flag to tell StateContext that shutdown was called deliberately, so that a stale SHUTDOWN state is not treated as a critical error.

## What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-3104

## How was this patch tested?

Verified by running TestContainerStateMachineFailureOnRead.
[jira] [Commented] (HDDS-3104) Integration test crashes due to critical error in datanode
     [ https://issues.apache.org/jira/browse/HDDS-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056665#comment-17056665 ]

Siddharth Wagle commented on HDDS-3104:
---------------------------------------
Creating a PR anyway to run tests.
[jira] [Comment Edited] (HDDS-3104) Integration test crashes due to critical error in datanode
     [ https://issues.apache.org/jira/browse/HDDS-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056664#comment-17056664 ]

Siddharth Wagle edited comment on HDDS-3104 at 3/11/20, 5:42 AM:
-----------------------------------------------------------------
(Minor wording edit to the root-cause comment; see the full comment text later in this thread.)
[jira] [Updated] (HDDS-3104) Integration test crashes due to critical error in datanode
     [ https://issues.apache.org/jira/browse/HDDS-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Wagle updated HDDS-3104:
----------------------------------
    Attachment: HDDS-3104.patch
[jira] [Comment Edited] (HDDS-3104) Integration test crashes due to critical error in datanode
     [ https://issues.apache.org/jira/browse/HDDS-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056664#comment-17056664 ]

Siddharth Wagle edited comment on HDDS-3104 at 3/11/20, 5:40 AM:
-----------------------------------------------------------------
(Minor wording edit to the root-cause comment; see the full comment text later in this thread.)
[jira] [Commented] (HDDS-3104) Integration test crashes due to critical error in datanode
     [ https://issues.apache.org/jira/browse/HDDS-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056664#comment-17056664 ]

Siddharth Wagle commented on HDDS-3104:
---------------------------------------
[~adoroszlai] I think this happens because:
1. HddsDatanodeService#stop calls DatanodeStateMachine#stopDaemon, which sets StateContext.state = SHUTDOWN.
2. The DatanodeStateMachine thread still calls StateContext.execute, because it read the now-stale state.
3. StateContext.execute sets shutDownOnError to true when it sees the SHUTDOWN state, even though there was no error.

So I have a proposed patch for this; attaching it here before making a PR, would like to know your thoughts.
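The race described in the comment above can be sketched with a minimal model. This is a simplified illustration, not the actual Ozone code or the actual patch: StateContextModel, shutdownRequested, and onCriticalError are hypothetical names standing in for the real classes, and the proposed flag is the one idea taken from the thread (distinguish a deliberate stop from an error-driven transition to SHUTDOWN).

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Simplified model of the race; names are illustrative, not Ozone's classes.
class StateContextModel {
    enum State { RUNNING, SHUTDOWN }

    private volatile State state = State.RUNNING;
    private volatile boolean shutDownOnError = false;
    // Proposed flag: records that shutdown was requested deliberately
    // (set on the HddsDatanodeService#stop -> stopDaemon path).
    private final AtomicBoolean shutdownRequested = new AtomicBoolean(false);

    void stopDaemon() {
        shutdownRequested.set(true);
        state = State.SHUTDOWN;
    }

    // A critical error also drives the state to SHUTDOWN, without the flag.
    void onCriticalError() {
        state = State.SHUTDOWN;
    }

    // The state machine thread may run one more iteration after stop and
    // observe the stale SHUTDOWN state.
    void execute() {
        if (state == State.SHUTDOWN && !shutdownRequested.get()) {
            // Only an unrequested SHUTDOWN is treated as fatal.
            shutDownOnError = true;
        }
    }

    boolean isShutDownOnError() {
        return shutDownOnError;
    }
}

public class ShutdownFlagDemo {
    public static void main(String[] args) {
        StateContextModel graceful = new StateContextModel();
        graceful.stopDaemon();
        graceful.execute();          // stale-state read no longer escalates
        System.out.println(graceful.isShutDownOnError()); // false

        StateContextModel failed = new StateContextModel();
        failed.onCriticalError();
        failed.execute();            // a genuine error still escalates
        System.out.println(failed.isShutDownOnError());   // true
    }
}
```

With the flag, the graceful path no longer trips the "Critical error occurred in StateMachine" branch that called ExitUtil.terminate and crashed the forked test VM.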
[jira] [Updated] (HDDS-3155) Improved ozone flush implementation to make it faster.
     [ https://issues.apache.org/jira/browse/HDDS-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

mingchao zhao updated HDDS-3155:
--------------------------------
    Description:
Background:
    When we execute MapReduce on Ozone, we find that the job is stuck for a long time after Map and Reduce complete. The log is as follows:
{code:java}
//Refer to the attachment: stdout
20/03/05 14:43:30 INFO mapreduce.Job:  map 100% reduce 33%
20/03/05 14:43:33 INFO mapreduce.Job:  map 100% reduce 100%
20/03/05 15:29:52 INFO mapreduce.Job: Job job_1583385253878_0002 completed successfully{code}
    By looking at the AM's log (refer to the attachment amlog for details), we found that the 40-plus minutes are spent by the AM writing the task log into Ozone.
    At present, after an MR job finishes, the AM records the task information into a log on HDFS or Ozone, and the task information is flushed to HDFS or Ozone one entry at a time ([details|https://github.com/apache/hadoop/blob/a55d6bba71c81c1c4e9d8cd11f55c78f10a548b0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java#L1640]). The problem occurs when the number of map tasks is large.
    Currently, each flush operation in Ozone generates a new chunk file on disk in real time, which is not very efficient. Here we can refer to the HDFS flush implementation: instead of writing to disk, each flush would write the contents of the buffer to the datanode's OS buffer. First of all, we need to ensure that this content can be read by other datanodes.

> Improved ozone flush implementation to make it faster.
> ------------------------------------------------------
>
>                 Key: HDDS-3155
>                 URL: https://issues.apache.org/jira/browse/HDDS-3155
>             Project: Hadoop Distributed Data Store
>          Issue Type: Improvement
>            Reporter: mingchao zhao
>            Priority: Major
>         Attachments: amlog, stdout
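The per-flush cost described in this description can be illustrated with a toy model. ChunkPerFlushWriter and its in-memory chunk list are hypothetical stand-ins for Ozone's on-disk chunk files, not actual Ozone classes; the point is only that flushing once per task event, as JobHistoryEventHandler does, materializes one chunk per event.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Toy model: each flush() materializes the buffered bytes as a new "chunk",
// standing in for the new chunk file Ozone creates on disk per flush.
class ChunkPerFlushWriter {
    final List<byte[]> chunks = new ArrayList<>();
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    void write(byte[] data) {
        buffer.write(data, 0, data.length);
    }

    void flush() {
        // In the proposed HDFS-like scheme this would instead push the buffer
        // to the datanode's OS buffer, without creating a new file, while
        // still making the bytes visible to readers.
        chunks.add(buffer.toByteArray());
        buffer.reset();
    }
}

public class FlushCostDemo {
    public static void main(String[] args) {
        ChunkPerFlushWriter writer = new ChunkPerFlushWriter();
        // One flush per task event: a job with 1000 map tasks issues on the
        // order of 1000 flushes while the AM writes the job history log.
        for (int i = 0; i < 1000; i++) {
            writer.write("task event\n".getBytes());
            writer.flush();
        }
        System.out.println(writer.chunks.size()); // 1000 chunks for 1000 events
    }
}
```

Under the HDFS-style approach, those 1000 flushes would append to one open stream instead of creating 1000 files, which is where the proposed speedup comes from.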
[jira] [Updated] (HDDS-3155) Improved ozone flush implementation to make it faster.
     [ https://issues.apache.org/jira/browse/HDDS-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

mingchao zhao updated HDDS-3155:
--------------------------------
    Description: (minor wording edit; same content as the current description in this thread)
[jira] [Updated] (HDDS-3155) Improved ozone flush implementation to make it faster.
     [ https://issues.apache.org/jira/browse/HDDS-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

mingchao zhao updated HDDS-3155:
--------------------------------
    Description: (minor edit to the AM log reference; content otherwise unchanged)
[jira] [Updated] (HDDS-3155) Improved ozone flush implementation to make it faster.
     [ https://issues.apache.org/jira/browse/HDDS-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

mingchao zhao updated HDDS-3155:
--------------------------------
    Attachment: (was: syslog)
[jira] [Updated] (HDDS-3155) Improved ozone flush implementation to make it faster.
     [ https://issues.apache.org/jira/browse/HDDS-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

mingchao zhao updated HDDS-3155:
--------------------------------
    Attachment: amlog
[jira] [Updated] (HDDS-3155) Improved ozone flush implementation to make it faster.
[ https://issues.apache.org/jira/browse/HDDS-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mingchao zhao updated HDDS-3155: Description: Background: When we execute mapreduce in the ozone, we find that the task will be stuck for a long time after the completion of Map and Reduce. The log is as follows: {code:java} //Refer to the attachment: stdout 20/03/05 14:43:30 INFO mapreduce.Job: map 100% reduce 33% 20/03/05 14:43:33 INFO mapreduce.Job: map 100% reduce 100% 20/03/05 15:29:52 INFO mapreduce.Job: Job job_1583385253878_0002 completed successfully{code} By looking at AM's log, we found that the time of over 40 minutes is AM writing a task log into ozone. At present, after MR execution, the Task information is recorded into the log on HDFS or ozone by AM. Moreover, the task information is flush to HDFS or ozone one by one ([details|https://github.com/apache/hadoop/blob/a55d6bba71c81c1c4e9d8cd11f55c78f10a548b0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java#L1640]). The problem occurs when the number of task maps is large. Currently, each flush operation in ozone generates a new chunk file in real time on the disk. This approach is not very efficient at the moment. For this we can refer to the implementation of HDFS flush. Instead of writing to disk each time flush writes the contents of the buffer to the datanode's OS buffer. In the first place, we need to ensure that this content can be read by other datanodes. was: Background: When we execute mapreduce in the ozone, we find that the task will be stuck for a long time after the completion of Map and Reduce. 
The log is as follows: {code:java} //Code placeholder 20/03/05 14:43:03 INFO mapreduce.Job: map 91% reduce 30%20/03/05 14:43:03 INFO mapreduce.Job: map 91% reduce 30%20/03/05 14:43:05 INFO mapreduce.Job: map 92% reduce 30%20/03/05 14:43:07 INFO mapreduce.Job: map 93% reduce 30%20/03/05 14:43:08 INFO mapreduce.Job: map 93% reduce 31%20/03/05 14:43:11 INFO mapreduce.Job: map 94% reduce 31%20/03/05 14:43:14 INFO mapreduce.Job: map 95% reduce 31%20/03/05 14:43:18 INFO mapreduce.Job: map 96% reduce 31%20/03/05 14:43:20 INFO mapreduce.Job: map 97% reduce 32%20/03/05 14:43:24 INFO mapreduce.Job: map 98% reduce 32%20/03/05 14:43:26 INFO mapreduce.Job: map 99% reduce 33%20/03/05 14:43:30 INFO mapreduce.Job: map 100% reduce 33%20/03/05 14:43:33 INFO mapreduce.Job: map 100% reduce 100%20/03/05 15:29:52 INFO mapreduce.Job: Job job_1583385253878_0002 completed successfully20/03/05 15:29:52 INFO mapreduce.Job: Counters: 51 File System Counters FILE: Number of bytes read=84602 FILE: Number of bytes written=162626320 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 O3FS: Number of bytes read=237780 O3FS: Number of bytes written=134217728089 O3FS: Number of read operations=4008 O3FS: Number of large read operations=0 O3FS: Number of write operations=1002 Job Counters Killed map tasks=1 Launched map tasks=1000 Launched reduce tasks=1 Data-local map tasks=979 Rack-local map tasks=21 Total time spent by all maps in occupied slots (ms)=149515400 Total time spent by all reduces in occupied slots (ms)=449288 Total time spent by all map tasks (ms)=7475770 Total time spent by all reduce tasks (ms)=112322 Total vcore-milliseconds taken by all map tasks=7475770 Total vcore-milliseconds taken by all reduce tasks=112322 Total megabyte-milliseconds taken by all map tasks=153103769600 Total megabyte-milliseconds taken by all reduce tasks=460070912 {code} > Improved ozone flush implementation to make it faster. 
> -- > > Key: HDDS-3155 > URL: https://issues.apache.org/jira/browse/HDDS-3155 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: mingchao zhao >Priority: Major > Attachments: stdout, syslog > > > Background: > When we run MapReduce jobs on Ozone, the job is stuck for a long > time after the Map and Reduce phases complete. The log is as > follows: > {code:java} > //Refer to the attachment: stdout > 20/03/05 14:43:30 INFO mapreduce.Job: map 100% reduce 33% > 20/03/05 14:43:33 INFO mapreduce.Job: map 100% reduce 100% > 20/03/05 15:29:52 INFO mapreduce.Job: Job job_1583385253878_0002 completed > successfully{code} > By looking at the AM's log, we found that more than 40 minutes were spent > by the AM writing the task log into Ozone. > At present, after MR execution, the AM records the task information into the > job history log on HDFS or Ozone, and the task information is flushed to > HDFS or Ozone entry by entry >
[jira] [Updated] (HDDS-3155) Improved ozone flush implementation to make it faster.
[ https://issues.apache.org/jira/browse/HDDS-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mingchao zhao updated HDDS-3155: Description: Background: When we run MapReduce jobs on Ozone, the job is stuck for a long time after the Map and Reduce phases complete. The log is as follows: {code:java} //Refer to the attachment: stdout 20/03/05 14:43:30 INFO mapreduce.Job: map 100% reduce 33% 20/03/05 14:43:33 INFO mapreduce.Job: map 100% reduce 100% 20/03/05 15:29:52 INFO mapreduce.Job: Job job_1583385253878_0002 completed successfully{code} By looking at the AM's log, we found that more than 40 minutes were spent by the AM writing the task log into Ozone. At present, after MR execution, the AM records the task information into the job history log on HDFS or Ozone, and the task information is flushed to HDFS or Ozone entry by entry ([details|https://github.com/apache/hadoop/blob/a55d6bba71c81c1c4e9d8cd11f55c78f10a548b0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java#L1640]). The problem shows up when the number of map tasks is large. Currently, each flush operation in Ozone synchronously creates a new chunk file on disk, which is inefficient. We can instead follow the HDFS flush implementation: rather than writing to disk on every flush, write the contents of the buffer to the datanode's OS buffer. First, we need to ensure that this content can still be read by other datanodes. was: Background: When we run MapReduce jobs on Ozone, the job is stuck for a long time after the Map and Reduce phases complete. 
The log is as follows: {code:java} //Refer to the attachment: stdout 20/03/05 14:43:30 INFO mapreduce.Job: map 100% reduce 33% 20/03/05 14:43:33 INFO mapreduce.Job: map 100% reduce 100% 20/03/05 15:29:52 INFO mapreduce.Job: Job job_1583385253878_0002 completed successfully{code} By looking at the AM's log, we found that more than 40 minutes were spent by the AM writing the task log into Ozone. At present, after MR execution, the AM records the task information into the job history log on HDFS or Ozone, and the task information is flushed to HDFS or Ozone entry by entry ([details|https://github.com/apache/hadoop/blob/a55d6bba71c81c1c4e9d8cd11f55c78f10a548b0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java#L1640]). The problem shows up when the number of map tasks is large. Currently, each flush operation in Ozone synchronously creates a new chunk file on disk, which is inefficient. We can instead follow the HDFS flush implementation: rather than writing to disk on every flush, write the contents of the buffer to the datanode's OS buffer. First, we need to ensure that this content can still be read by other datanodes. > Improved ozone flush implementation to make it faster. > -- > > Key: HDDS-3155 > URL: https://issues.apache.org/jira/browse/HDDS-3155 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: mingchao zhao >Priority: Major > Attachments: stdout, syslog > > > Background: > When we run MapReduce jobs on Ozone, the job is stuck for a long > time after the Map and Reduce phases complete. 
The log is as > follows: > {code:java} > //Refer to the attachment: stdout > 20/03/05 14:43:30 INFO mapreduce.Job: map 100% reduce 33% > 20/03/05 14:43:33 INFO mapreduce.Job: map 100% reduce 100% > 20/03/05 15:29:52 INFO mapreduce.Job: Job job_1583385253878_0002 completed > successfully{code} > By looking at the AM's log, we found that more than 40 minutes were spent > by the AM writing the task log into Ozone. > At present, after MR execution, the AM records the task information into the > job history log on HDFS or Ozone, and the task information is flushed to > HDFS or Ozone entry by entry > ([details|https://github.com/apache/hadoop/blob/a55d6bba71c81c1c4e9d8cd11f55c78f10a548b0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java#L1640]). > The problem shows up when the number of map tasks is large. > Currently, each flush operation in Ozone synchronously creates a new chunk > file on disk, which is inefficient. We can instead follow the HDFS flush > implementation: rather than writing to disk on every flush, write the > contents of the buffer to the datanode's OS buffer. First, we need to > ensure that this content can still be read >
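The core distinction the HDDS-3155 description draws — a flush that only makes data visible versus one that forces it to disk — can be sketched in plain Java. This is an illustration of the general OS-level mechanism, not Ozone's or HDFS's actual client code:

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class FlushVsSync {
    // Writes one log line, makes it visible with flush(), then durable with sync().
    static long writeDurably(Path path) throws IOException {
        try (FileOutputStream file = new FileOutputStream(path.toFile());
             BufferedOutputStream out = new BufferedOutputStream(file)) {
            out.write("task attempt finished\n".getBytes());
            // flush() drains the user-space buffer into the OS page cache:
            // cheap, and the data becomes visible to other readers of the
            // file, but it is not yet durable against a machine crash.
            out.flush();
            // sync() (roughly what HDFS hsync does) forces the page cache to
            // disk: durable, but far more expensive per call.
            file.getFD().sync();
        }
        return Files.size(path);
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("flush-demo", ".log");
        System.out.println(writeDurably(path) + " bytes durable on disk");
    }
}
```

HDFS's hflush() corresponds roughly to the cheap visibility step, which is why flushing each job-history event there is tolerable, while a flush that creates and persists a new chunk file per call pays the durable-write cost every time.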
[jira] [Updated] (HDDS-3152) Reduce number of chunkwriter threads in integration tests
[ https://issues.apache.org/jira/browse/HDDS-3152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDDS-3152: - Fix Version/s: 0.6.0 Resolution: Fixed Status: Resolved (was: Patch Available) > Reduce number of chunkwriter threads in integration tests > - > > Key: HDDS-3152 > URL: https://issues.apache.org/jira/browse/HDDS-3152 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: test >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Minor > Labels: pull-request-available > Fix For: 0.6.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Integration tests run multiple datanodes in the same JVM. Each datanode > comes with 60 chunk writer threads by default (may be decreased in > HDDS-3053). This makes thread dumps (e.g. produced by > {{GenericTestUtils.waitFor}} on timeout) really hard to navigate, as there > may be 300+ such threads. > Since integration tests are generally run with a single disk that is even > shared among the datanodes, a few threads per datanode should be enough. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
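The fix pattern behind HDDS-3152 — size a worker pool from configuration so integration tests can shrink it well below the 60-thread datanode default — can be sketched as follows. The property name `demo.chunk.writer.threads` is illustrative, not the actual Ozone configuration key:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;

public class ChunkWriterPool {
    // Default cited in the issue: 60 chunk writer threads per datanode.
    static final int DEFAULT_CHUNK_WRITER_THREADS = 60;

    // Pool size comes from configuration so tests can override it; the
    // property name here is a hypothetical stand-in for the real Ozone key.
    static ExecutorService newChunkWriterPool() {
        int threads = Integer.getInteger("demo.chunk.writer.threads",
                DEFAULT_CHUNK_WRITER_THREADS);
        return Executors.newFixedThreadPool(threads);
    }

    public static void main(String[] args) {
        // An integration test would set a small value before starting datanodes,
        // keeping thread dumps from a multi-datanode JVM navigable.
        System.setProperty("demo.chunk.writer.threads", "2");
        ThreadPoolExecutor pool = (ThreadPoolExecutor) newChunkWriterPool();
        System.out.println("pool size: " + pool.getCorePoolSize());
        pool.shutdown();
    }
}
```

With five datanodes in one JVM, a two-thread pool means 10 chunk writer threads in a stack dump instead of 300.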
[GitHub] [hadoop-ozone] bharatviswa504 commented on issue #657: HDDS-3152. Reduce number of chunkwriter threads in integration tests
bharatviswa504 commented on issue #657: HDDS-3152. Reduce number of chunkwriter threads in integration tests URL: https://github.com/apache/hadoop-ozone/pull/657#issuecomment-597430639 Thank You @adoroszlai for the contribution. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[GitHub] [hadoop-ozone] sonarcloud[bot] removed a comment on issue #588: HDDS-2886. parse and dump datanode segment file to pritable text
sonarcloud[bot] removed a comment on issue #588: HDDS-2886. parse and dump datanode segment file to pritable text URL: https://github.com/apache/hadoop-ozone/pull/588#issuecomment-596883379 SonarCloud Quality Gate failed: 2 Bugs, 0 Vulnerabilities (and 1 Security Hotspot to review), 16 Code Smells, 0.0% Coverage, 0.0% Duplication on new code. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[GitHub] [hadoop-ozone] sonarcloud[bot] commented on issue #588: HDDS-2886. parse and dump datanode segment file to pritable text
sonarcloud[bot] commented on issue #588: HDDS-2886. parse and dump datanode segment file to pritable text URL: https://github.com/apache/hadoop-ozone/pull/588#issuecomment-597430623 SonarCloud Quality Gate failed: 2 Bugs, 0 Vulnerabilities (and 1 Security Hotspot to review), 17 Code Smells, 0.0% Coverage, 0.0% Duplication on new code. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[GitHub] [hadoop-ozone] bharatviswa504 merged pull request #657: HDDS-3152. Reduce number of chunkwriter threads in integration tests
bharatviswa504 merged pull request #657: HDDS-3152. Reduce number of chunkwriter threads in integration tests URL: https://github.com/apache/hadoop-ozone/pull/657 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3100) Fix TestDeadNodeHandler.
[ https://issues.apache.org/jira/browse/HDDS-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDDS-3100: - Fix Version/s: 0.6.0 Resolution: Fixed Status: Resolved (was: Patch Available) > Fix TestDeadNodeHandler. > > > Key: HDDS-3100 > URL: https://issues.apache.org/jira/browse/HDDS-3100 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Affects Versions: 0.5.0 >Reporter: Aravindan Vijayan >Assignee: Aravindan Vijayan >Priority: Major > Labels: pull-request-available > Fix For: 0.6.0 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[GitHub] [hadoop-ozone] bharatviswa504 commented on issue #655: HDDS-3100. Fix TestDeadNodeHandler.
bharatviswa504 commented on issue #655: HDDS-3100. Fix TestDeadNodeHandler. URL: https://github.com/apache/hadoop-ozone/pull/655#issuecomment-597429367 Thank You @avijayanhwx for the contribution and @adoroszlai for the review. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[GitHub] [hadoop-ozone] bharatviswa504 merged pull request #655: HDDS-3100. Fix TestDeadNodeHandler.
bharatviswa504 merged pull request #655: HDDS-3100. Fix TestDeadNodeHandler. URL: https://github.com/apache/hadoop-ozone/pull/655 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3155) Improved ozone flush implementation to make it faster.
[ https://issues.apache.org/jira/browse/HDDS-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mingchao zhao updated HDDS-3155: Attachment: syslog > Improved ozone flush implementation to make it faster. > -- > > Key: HDDS-3155 > URL: https://issues.apache.org/jira/browse/HDDS-3155 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: mingchao zhao >Priority: Major > Attachments: stdout, syslog > > > Background: > When we run MapReduce jobs on Ozone, the job is stuck > for a long time after the Map and Reduce phases complete. The log is as follows: > {code:java} > //Code placeholder > 20/03/05 14:43:03 INFO mapreduce.Job: map 91% reduce 30%20/03/05 14:43:03 > INFO mapreduce.Job: map 91% reduce 30%20/03/05 14:43:05 INFO mapreduce.Job: > map 92% reduce 30%20/03/05 14:43:07 INFO mapreduce.Job: map 93% reduce > 30%20/03/05 14:43:08 INFO mapreduce.Job: map 93% reduce 31%20/03/05 14:43:11 > INFO mapreduce.Job: map 94% reduce 31%20/03/05 14:43:14 INFO mapreduce.Job: > map 95% reduce 31%20/03/05 14:43:18 INFO mapreduce.Job: map 96% reduce > 31%20/03/05 14:43:20 INFO mapreduce.Job: map 97% reduce 32%20/03/05 14:43:24 > INFO mapreduce.Job: map 98% reduce 32%20/03/05 14:43:26 INFO mapreduce.Job: > map 99% reduce 33%20/03/05 14:43:30 INFO mapreduce.Job: map 100% reduce > 33%20/03/05 14:43:33 INFO mapreduce.Job: map 100% reduce 100%20/03/05 > 15:29:52 INFO mapreduce.Job: Job job_1583385253878_0002 completed > successfully20/03/05 15:29:52 INFO mapreduce.Job: Counters: 51 File System > Counters FILE: Number of bytes read=84602 FILE: Number of bytes > written=162626320 FILE: Number of read operations=0 FILE: Number of large > read operations=0 FILE: Number of write operations=0 O3FS: Number of bytes > read=237780 O3FS: Number of bytes written=134217728089 O3FS: Number of read > operations=4008 O3FS: Number of large read operations=0 O3FS: Number of write > operations=1002 Job Counters Killed map tasks=1 Launched map 
tasks=1000 > Launched reduce tasks=1 Data-local map tasks=979 Rack-local map tasks=21 > Total time spent by all maps in occupied slots (ms)=149515400 Total time > spent by all reduces in occupied slots (ms)=449288 Total time spent by all > map tasks (ms)=7475770 Total time spent by all reduce tasks (ms)=112322 Total > vcore-milliseconds taken by all map tasks=7475770 Total vcore-milliseconds > taken by all reduce tasks=112322 Total megabyte-milliseconds taken by all map > tasks=153103769600 Total megabyte-milliseconds taken by all reduce > tasks=460070912 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3155) Improved ozone flush implementation to make it faster.
[ https://issues.apache.org/jira/browse/HDDS-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mingchao zhao updated HDDS-3155: Attachment: stdout > Improved ozone flush implementation to make it faster. > -- > > Key: HDDS-3155 > URL: https://issues.apache.org/jira/browse/HDDS-3155 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: mingchao zhao >Priority: Major > Attachments: stdout > > > Background: > When we run MapReduce jobs on Ozone, the job is stuck > for a long time after the Map and Reduce phases complete. The log is as follows: > {code:java} > //Code placeholder > 20/03/05 14:43:03 INFO mapreduce.Job: map 91% reduce 30%20/03/05 14:43:03 > INFO mapreduce.Job: map 91% reduce 30%20/03/05 14:43:05 INFO mapreduce.Job: > map 92% reduce 30%20/03/05 14:43:07 INFO mapreduce.Job: map 93% reduce > 30%20/03/05 14:43:08 INFO mapreduce.Job: map 93% reduce 31%20/03/05 14:43:11 > INFO mapreduce.Job: map 94% reduce 31%20/03/05 14:43:14 INFO mapreduce.Job: > map 95% reduce 31%20/03/05 14:43:18 INFO mapreduce.Job: map 96% reduce > 31%20/03/05 14:43:20 INFO mapreduce.Job: map 97% reduce 32%20/03/05 14:43:24 > INFO mapreduce.Job: map 98% reduce 32%20/03/05 14:43:26 INFO mapreduce.Job: > map 99% reduce 33%20/03/05 14:43:30 INFO mapreduce.Job: map 100% reduce > 33%20/03/05 14:43:33 INFO mapreduce.Job: map 100% reduce 100%20/03/05 > 15:29:52 INFO mapreduce.Job: Job job_1583385253878_0002 completed > successfully20/03/05 15:29:52 INFO mapreduce.Job: Counters: 51 File System > Counters FILE: Number of bytes read=84602 FILE: Number of bytes > written=162626320 FILE: Number of read operations=0 FILE: Number of large > read operations=0 FILE: Number of write operations=0 O3FS: Number of bytes > read=237780 O3FS: Number of bytes written=134217728089 O3FS: Number of read > operations=4008 O3FS: Number of large read operations=0 O3FS: Number of write > operations=1002 Job Counters Killed map tasks=1 Launched map tasks=1000 > 
Launched reduce tasks=1 Data-local map tasks=979 Rack-local map tasks=21 > Total time spent by all maps in occupied slots (ms)=149515400 Total time > spent by all reduces in occupied slots (ms)=449288 Total time spent by all > map tasks (ms)=7475770 Total time spent by all reduce tasks (ms)=112322 Total > vcore-milliseconds taken by all map tasks=7475770 Total vcore-milliseconds > taken by all reduce tasks=112322 Total megabyte-milliseconds taken by all map > tasks=153103769600 Total megabyte-milliseconds taken by all reduce > tasks=460070912 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3155) Improved ozone flush implementation to make it faster.
[ https://issues.apache.org/jira/browse/HDDS-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mingchao zhao updated HDDS-3155: Description: Background: When we run MapReduce jobs on Ozone, the job is stuck for a long time after the Map and Reduce phases complete. The log is as follows: {code:java} //Code placeholder 20/03/05 14:43:03 INFO mapreduce.Job: map 91% reduce 30%20/03/05 14:43:03 INFO mapreduce.Job: map 91% reduce 30%20/03/05 14:43:05 INFO mapreduce.Job: map 92% reduce 30%20/03/05 14:43:07 INFO mapreduce.Job: map 93% reduce 30%20/03/05 14:43:08 INFO mapreduce.Job: map 93% reduce 31%20/03/05 14:43:11 INFO mapreduce.Job: map 94% reduce 31%20/03/05 14:43:14 INFO mapreduce.Job: map 95% reduce 31%20/03/05 14:43:18 INFO mapreduce.Job: map 96% reduce 31%20/03/05 14:43:20 INFO mapreduce.Job: map 97% reduce 32%20/03/05 14:43:24 INFO mapreduce.Job: map 98% reduce 32%20/03/05 14:43:26 INFO mapreduce.Job: map 99% reduce 33%20/03/05 14:43:30 INFO mapreduce.Job: map 100% reduce 33%20/03/05 14:43:33 INFO mapreduce.Job: map 100% reduce 100%20/03/05 15:29:52 INFO mapreduce.Job: Job job_1583385253878_0002 completed successfully20/03/05 15:29:52 INFO mapreduce.Job: Counters: 51 File System Counters FILE: Number of bytes read=84602 FILE: Number of bytes written=162626320 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 O3FS: Number of bytes read=237780 O3FS: Number of bytes written=134217728089 O3FS: Number of read operations=4008 O3FS: Number of large read operations=0 O3FS: Number of write operations=1002 Job Counters Killed map tasks=1 Launched map tasks=1000 Launched reduce tasks=1 Data-local map tasks=979 Rack-local map tasks=21 Total time spent by all maps in occupied slots (ms)=149515400 Total time spent by all reduces in occupied slots (ms)=449288 Total time spent by all map tasks (ms)=7475770 Total time spent by all reduce tasks (ms)=112322 Total vcore-milliseconds taken by 
all map tasks=7475770 Total vcore-milliseconds taken by all reduce tasks=112322 Total megabyte-milliseconds taken by all map tasks=153103769600 Total megabyte-milliseconds taken by all reduce tasks=460070912 {code} > Improved ozone flush implementation to make it faster. > -- > > Key: HDDS-3155 > URL: https://issues.apache.org/jira/browse/HDDS-3155 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: mingchao zhao >Priority: Major > > Background: > When we run MapReduce jobs on Ozone, the job is stuck > for a long time after the Map and Reduce phases complete. The log is as follows: > {code:java} > //Code placeholder > 20/03/05 14:43:03 INFO mapreduce.Job: map 91% reduce 30%20/03/05 14:43:03 > INFO mapreduce.Job: map 91% reduce 30%20/03/05 14:43:05 INFO mapreduce.Job: > map 92% reduce 30%20/03/05 14:43:07 INFO mapreduce.Job: map 93% reduce > 30%20/03/05 14:43:08 INFO mapreduce.Job: map 93% reduce 31%20/03/05 14:43:11 > INFO mapreduce.Job: map 94% reduce 31%20/03/05 14:43:14 INFO mapreduce.Job: > map 95% reduce 31%20/03/05 14:43:18 INFO mapreduce.Job: map 96% reduce > 31%20/03/05 14:43:20 INFO mapreduce.Job: map 97% reduce 32%20/03/05 14:43:24 > INFO mapreduce.Job: map 98% reduce 32%20/03/05 14:43:26 INFO mapreduce.Job: > map 99% reduce 33%20/03/05 14:43:30 INFO mapreduce.Job: map 100% reduce > 33%20/03/05 14:43:33 INFO mapreduce.Job: map 100% reduce 100%20/03/05 > 15:29:52 INFO mapreduce.Job: Job job_1583385253878_0002 completed > successfully20/03/05 15:29:52 INFO mapreduce.Job: Counters: 51 File System > Counters FILE: Number of bytes read=84602 FILE: Number of bytes > written=162626320 FILE: Number of read operations=0 FILE: Number of large > read operations=0 FILE: Number of write operations=0 O3FS: Number of bytes > read=237780 O3FS: Number of bytes written=134217728089 O3FS: Number of read > operations=4008 O3FS: Number of large read operations=0 O3FS: Number of write > operations=1002 Job Counters Killed map tasks=1 Launched 
map tasks=1000 > Launched reduce tasks=1 Data-local map tasks=979 Rack-local map tasks=21 > Total time spent by all maps in occupied slots (ms)=149515400 Total time > spent by all reduces in occupied slots (ms)=449288 Total time spent by all > map tasks (ms)=7475770 Total time spent by all reduce tasks (ms)=112322 Total > vcore-milliseconds taken by all map tasks=7475770 Total vcore-milliseconds > taken by all reduce tasks=112322 Total megabyte-milliseconds taken by all map > tasks=153103769600 Total megabyte-milliseconds taken by all reduce > tasks=460070912 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail:
[jira] [Commented] (HDDS-3133) Add export objectIds in Ozone as FileIds to allow LLAP to cache the files
[ https://issues.apache.org/jira/browse/HDDS-3133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056619#comment-17056619 ] Siddharth Wagle commented on HDDS-3133: --- But getFileId is only defined in HdfsFileStatus, and OzoneFileStatus implements FileStatus. > Add export objectIds in Ozone as FileIds to allow LLAP to cache the files > - > > Key: HDDS-3133 > URL: https://issues.apache.org/jira/browse/HDDS-3133 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Filesystem, Ozone Manager >Reporter: Mukul Kumar Singh >Assignee: Siddharth Wagle >Priority: Major > > Hive LLAP makes use of the fileids to cache the files data. Ozone's objectIds > need to be exported as fileIds to allow the caching to happen effectively. > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/HdfsUtils.java#L65 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3155) Improved ozone flush implementation to make it faster.
[ https://issues.apache.org/jira/browse/HDDS-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mingchao zhao updated HDDS-3155: Issue Type: Improvement (was: Bug) > Improved ozone flush implementation to make it faster. > -- > > Key: HDDS-3155 > URL: https://issues.apache.org/jira/browse/HDDS-3155 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: mingchao zhao >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-3133) Add export objectIds in Ozone as FileIds to allow LLAP to cache the files
[ https://issues.apache.org/jira/browse/HDDS-3133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056616#comment-17056616 ] Mukul Kumar Singh commented on HDDS-3133: - yes, Ozone should implement something like OzoneFileStatus where it should also export fileIDs. Currently Ozone does not export something like fileIds. > Add export objectIds in Ozone as FileIds to allow LLAP to cache the files > - > > Key: HDDS-3133 > URL: https://issues.apache.org/jira/browse/HDDS-3133 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Filesystem, Ozone Manager >Reporter: Mukul Kumar Singh >Assignee: Siddharth Wagle >Priority: Major > > Hive LLAP makes use of the fileids to cache the files data. Ozone's objectIds > need to be exported as fileIds to allow the caching to happen effectively. > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/HdfsUtils.java#L65 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
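The interop problem discussed in the HDDS-3133 comments — Hive only extracts a fileId when the status object actually carries one, as in HdfsFileStatus — can be sketched with simplified stand-in classes. These are illustrative stubs, not the real Hadoop or Hive types:

```java
public class FileIdLookup {
    // Simplified stand-in for Hadoop's FileStatus base class.
    static class FileStatus {
        final String path;
        FileStatus(String path) { this.path = path; }
    }

    // Simplified stand-in for HdfsFileStatus, which exposes a stable
    // per-inode id that LLAP can use as a cache key.
    static class HdfsFileStatus extends FileStatus {
        final long fileId;
        HdfsFileStatus(String path, long fileId) { super(path); this.fileId = fileId; }
        long getFileId() { return fileId; }
    }

    // Mirrors the kind of instanceof check Hive's HdfsUtils performs: only
    // statuses that actually carry a fileId yield one; everything else gets
    // a sentinel and the caller falls back to path-based identity.
    static long fileIdOf(FileStatus status) {
        if (status instanceof HdfsFileStatus) {
            return ((HdfsFileStatus) status).getFileId();
        }
        return -1; // no stable cache key available
    }

    public static void main(String[] args) {
        System.out.println(fileIdOf(new HdfsFileStatus("/a", 42L))); // 42
        System.out.println(fileIdOf(new FileStatus("/b")));          // -1
    }
}
```

Since OzoneFileStatus extends the plain FileStatus, such a check falls through to the sentinel today; exporting Ozone's objectId through an equivalent accessor is what would let LLAP build a stable cache key.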
[jira] [Created] (HDDS-3155) Improved ozone flush implementation to make it faster.
mingchao zhao created HDDS-3155: --- Summary: Improved ozone flush implementation to make it faster. Key: HDDS-3155 URL: https://issues.apache.org/jira/browse/HDDS-3155 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: mingchao zhao -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[GitHub] [hadoop-ozone] mukul1987 commented on issue #588: HDDS-2886. parse and dump datanode segment file to pritable text
mukul1987 commented on issue #588: HDDS-2886. parse and dump datanode segment file to pritable text URL: https://github.com/apache/hadoop-ozone/pull/588#issuecomment-597422675 Thanks for the review @bharatviswa504, addressed comments in the next patch. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-3116) Datanode sometimes fails to start with NPE when starting Ratis xceiver server
[ https://issues.apache.org/jira/browse/HDDS-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056541#comment-17056541 ] Hanisha Koneru commented on HDDS-3116: -- There is a lot of circular dependency between OzoneContainer, XceiverServerRatis and StateContext. AFAICS it might not be a trivial change to fix this. Not sure if adding the synchronization will resolve the issue altogether. I tried reproing but couldn't. Will try reproing on a docker cluster. > Datanode sometimes fails to start with NPE when starting Ratis xceiver server > - > > Key: HDDS-3116 > URL: https://issues.apache.org/jira/browse/HDDS-3116 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.5.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Blocker > Labels: pull-request-available > Attachments: full_logs.txt > > Time Spent: 10m > Remaining Estimate: 0h > > While working on a network Topology test (HDDS-3084) which does the following: > 1. Start a cluster with 6 DNs and 2 racks. > 2. Create a volume, bucket and a single key. > 3. Stop one rack of hosts using "docker-compose down" > 4. Read the data from the single key > 5. Start the 3 down hosts > 6. Stop the other 3 hosts > 7. Attempt to read the key again. > At step 5 I sometimes see this stack trace in one of the DNs and it fails to > full come up: > {code} > 2020-03-02 13:01:31,887 [Datanode State Machine Thread - 0] INFO > ozoneimpl.OzoneContainer: Attempting to start container services. > 2020-03-02 13:01:31,887 [Datanode State Machine Thread - 0] INFO > ozoneimpl.OzoneContainer: Background container scanner has been disabled. 
> 2020-03-02 13:01:31,887 [Datanode State Machine Thread - 0] INFO > ratis.XceiverServerRatis: Starting XceiverServerRatis > 8c1178dd-c44d-49d1-b899-cc3e40ae8f23 at port 9858 > 2020-03-02 13:01:31,887 [Datanode State Machine Thread - 0] WARN > statemachine.EndpointStateMachine: Unable to communicate to SCM server at > scm:9861 for past 15000 seconds. > java.io.IOException: java.lang.NullPointerException > at org.apache.ratis.util.IOUtils.asIOException(IOUtils.java:54) > at org.apache.ratis.util.IOUtils.toIOException(IOUtils.java:61) > at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:70) > at > org.apache.ratis.server.impl.RaftServerProxy.getImpls(RaftServerProxy.java:284) > at > org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:296) > at > org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.start(XceiverServerRatis.java:418) > at > org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.start(OzoneContainer.java:232) > at > org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:113) > at > org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:834) > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.sendPipelineReport(XceiverServerRatis.java:757) > at > 
org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.notifyGroupAdd(XceiverServerRatis.java:739) > at > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.initialize(ContainerStateMachine.java:218) > at > org.apache.ratis.server.impl.ServerState.initStatemachine(ServerState.java:160) > at org.apache.ratis.server.impl.ServerState.(ServerState.java:112) > at > org.apache.ratis.server.impl.RaftServerImpl.(RaftServerImpl.java:112) > at > org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$2(RaftServerProxy.java:208) > at > java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700) > ... 3 more > {code} > The DN does not recover from this automatically, although I confirmed that a > full cluster restart fixed it (docker-compose stop; docker-compose start). I > will try to confirm if a restart of the stuck DN would fix it or not too. -- This message
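The stack trace above shows `notifyGroupAdd` being invoked from `ContainerStateMachine.initialize` while the Raft server is still starting, so a collaborator that is wired in later can still be null when `sendPipelineReport` runs. A minimal self-contained sketch of that initialization-order hazard and a defensive guard follows; the class and field names are illustrative, not the actual Ozone classes.

```java
// Minimal sketch of the initialization-order hazard from the stack trace:
// notifyGroupAdd() is invoked re-entrantly during server start, before a
// later setup step has wired in 'context'.  Before the guard, the report
// path dereferenced the null collaborator and threw NPE.
public class XceiverServerSketch {
    private Object context;       // wired in by a later setup step
    private int reportsSent;

    public void notifyGroupAdd() {
        if (context == null) {
            // Server not fully started yet; skip the report rather than crash.
            return;
        }
        sendPipelineReport();
    }

    private void sendPipelineReport() {
        reportsSent++;            // stands in for reporting to SCM
    }

    public void setContext(Object context) {
        this.context = context;
    }

    public int getReportsSent() {
        return reportsSent;
    }
}
```

Whether a guard like this (versus breaking the circular dependency between OzoneContainer, XceiverServerRatis and StateContext) is the right fix is exactly what the comments above debate; the sketch only isolates the failure mode.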
[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #654: HDDS-3150. Implement getIfExist in Table and use it in CreateKey/File
bharatviswa504 commented on a change in pull request #654: HDDS-3150. Implement getIfExist in Table and use it in CreateKey/File
URL: https://github.com/apache/hadoop-ozone/pull/654#discussion_r390670650

## File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBTable.java
## @@ -155,6 +155,32 @@ public boolean isExist(byte[] key) throws IOException {
     }
   }

+  @Override
+  public byte[] getIfExist(byte[] key) throws IOException {
+    try {
+      // RocksDB#keyMayExist
+      // If the key definitely does not exist in the database, then this
+      // method returns false, else true.
+      rdbMetrics.incNumDBKeyGetIfExistChecks();
+      StringBuilder outValue = new StringBuilder();
+      boolean keyMayExist = db.keyMayExist(handle, key, outValue);
+      if (keyMayExist) {
+        // Not using out value from string builder, as that is causing
+        // IllegalArgumentException during protobuf parsing.

Review comment:
```
@Override
public byte[] get(byte[] key) throws IOException {
  try {
    // RocksDB#keyMayExist
    // If the key definitely does not exist in the database, then this
    // method returns false, else true.
    rdbMetrics.incNumDBKeyIfExistChecks();
    StringBuilder outValue = new StringBuilder();
    boolean keyMayExist = db.keyMayExist(handle, key, outValue);
    if (keyMayExist) {
      byte[] val;
      if (outValue.length() > 0) {
        val = outValue.toString().getBytes(UTF_8);
      } else {
        val = db.get(handle, key);
      }
      if (val != null) {
        rdbMetrics.incNumDBKeyIfExistMisses();
      }
      return val;
    }
    return null;
  } catch (RocksDBException e) {
    throw toIOException("Error in accessing DB. ", e);
  }
}
```
I tried with the above code and got an IllegalArgumentException during parsing. So, for now, I changed it not to use outValue. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
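The control flow under review can be illustrated with a self-contained sketch: a cheap "may exist" probe (RocksDB#keyMayExist in the real code) short-circuits the expensive read when the key is definitely absent. RocksDB needs a native library, so a HashMap stands in for it here; only the shape of the check is meant to match, and it follows the PR's choice of re-reading the value rather than trusting the probe's outValue.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the getIfExist() pattern: probe first, then do the
// authoritative read only when the probe says the key may exist.
public class GetIfExistSketch {
    private final Map<String, byte[]> store = new HashMap<>();

    public void put(String key, byte[] value) {
        store.put(key, value);
    }

    // "May exist" check: false means definitely absent, true means a
    // full read is still required (mirrors keyMayExist's contract).
    private boolean keyMayExist(String key) {
        return store.containsKey(key);
    }

    public byte[] getIfExist(String key) {
        if (!keyMayExist(key)) {
            return null;   // fast path: skip the real get entirely
        }
        // Authoritative read; the PR deliberately re-reads here instead
        // of reusing the probe's outValue, which broke protobuf parsing.
        return store.get(key);
    }
}
```

The fast path is the whole point of the API: CreateKey/File mostly probes keys that do not exist yet, so skipping the full read on the definite-miss case saves work on the common path.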
[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #654: HDDS-3150. Implement getIfExist in Table and use it in CreateKey/File
bharatviswa504 commented on a change in pull request #654: HDDS-3150. Implement getIfExist in Table and use it in CreateKey/File
URL: https://github.com/apache/hadoop-ozone/pull/654#discussion_r390669860

## File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBTable.java
## @@ -155,6 +155,32 @@ public boolean isExist(byte[] key) throws IOException {
     }
   }

+  @Override
+  public byte[] getIfExist(byte[] key) throws IOException {
+    try {
+      // RocksDB#keyMayExist
+      // If the key definitely does not exist in the database, then this
+      // method returns false, else true.
+      rdbMetrics.incNumDBKeyGetIfExistChecks();
+      StringBuilder outValue = new StringBuilder();
+      boolean keyMayExist = db.keyMayExist(handle, key, outValue);
+      if (keyMayExist) {
+        // Not using out value from string builder, as that is causing
+        // IllegalArgumentException during protobuf parsing.

Review comment:
Yes, using that is causing the issue.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3100) Fix TestDeadNodeHandler.
[ https://issues.apache.org/jira/browse/HDDS-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aravindan Vijayan updated HDDS-3100: Status: Patch Available (was: Open) > Fix TestDeadNodeHandler. > > > Key: HDDS-3100 > URL: https://issues.apache.org/jira/browse/HDDS-3100 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Affects Versions: 0.5.0 >Reporter: Aravindan Vijayan >Assignee: Aravindan Vijayan >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #588: HDDS-2886. parse and dump datanode segment file to pritable text
bharatviswa504 commented on a change in pull request #588: HDDS-2886. parse and dump datanode segment file to pritable text
URL: https://github.com/apache/hadoop-ozone/pull/588#discussion_r390665564

## File path: hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/segmentparser/GenericParser.java
## @@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.ozone.segmentparser;
+
+import org.apache.hadoop.hdds.cli.HddsVersionProvider;
+import picocli.CommandLine;
+
+import java.util.concurrent.Callable;
+
+/**
+ * Command line utility to parse and dump any generic ratis segment file.
+ */
+@CommandLine.Command(
+    name = "generic",
+    description = "dump generic ratis segment file",
+    mixinStandardHelpOptions = true,
+    versionProvider = HddsVersionProvider.class)
+public class GenericParser extends BaseLogParser implements Callable {

Review comment:
Can we rename this as GenericRatisLogParser?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #588: HDDS-2886. parse and dump datanode segment file to pritable text
bharatviswa504 commented on a change in pull request #588: HDDS-2886. parse and dump datanode segment file to pritable text
URL: https://github.com/apache/hadoop-ozone/pull/588#discussion_r390665439

## File path: hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/segmentparser/DatanodeParser.java
## @@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.ozone.segmentparser;
+
+import org.apache.hadoop.hdds.cli.HddsVersionProvider;
+import org.apache.hadoop.ozone.container.common.transport.server
+    .ratis.ContainerStateMachine;
+import org.apache.ratis.proto.RaftProtos.StateMachineLogEntryProto;
+import org.apache.ratis.protocol.RaftGroupId;
+import org.apache.ratis.thirdparty.com.google.protobuf.ByteString;
+import picocli.CommandLine;
+
+import java.util.concurrent.Callable;
+
+/**
+ * Command line utility to parse and dump a datanode ratis segment file.
+ */
+@CommandLine.Command(
+    name = "datanode",
+    description = "dump datanode segment file",
+    mixinStandardHelpOptions = true,
+    versionProvider = HddsVersionProvider.class)
+public class DatanodeParser extends BaseLogParser implements Callable {

Review comment:
Can we name this class as DatanodeRatisLogParser

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[GitHub] [hadoop-ozone] avijayanhwx commented on a change in pull request #654: HDDS-3150. Implement getIfExist in Table and use it in CreateKey/File
avijayanhwx commented on a change in pull request #654: HDDS-3150. Implement getIfExist in Table and use it in CreateKey/File
URL: https://github.com/apache/hadoop-ozone/pull/654#discussion_r390664487

## File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBTable.java
## @@ -155,6 +155,32 @@ public boolean isExist(byte[] key) throws IOException {
     }
   }

+  @Override
+  public byte[] getIfExist(byte[] key) throws IOException {
+    try {
+      // RocksDB#keyMayExist
+      // If the key definitely does not exist in the database, then this
+      // method returns false, else true.
+      rdbMetrics.incNumDBKeyGetIfExistChecks();
+      StringBuilder outValue = new StringBuilder();
+      boolean keyMayExist = db.keyMayExist(handle, key, outValue);
+      if (keyMayExist) {
+        // Not using out value from string builder, as that is causing
+        // IllegalArgumentException during protobuf parsing.

Review comment:
Does this mean we cannot use the outValue that we pass in? If 'keyMayExist' returns true, and the value is indeed present in the block cache, I believe the outValue will have the data.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3120) Freon work with OM HA
[ https://issues.apache.org/jira/browse/HDDS-3120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDDS-3120: - Fix Version/s: 0.6.0 Resolution: Fixed Status: Resolved (was: Patch Available) > Freon work with OM HA > - > > Key: HDDS-3120 > URL: https://issues.apache.org/jira/browse/HDDS-3120 > Project: Hadoop Distributed Data Store > Issue Type: New Feature >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Labels: pull-request-available > Fix For: 0.6.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Make Freon commands work with OM HA -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[GitHub] [hadoop-ozone] bharatviswa504 commented on issue #649: HDDS-3120. Freon work with OM HA.
bharatviswa504 commented on issue #649: HDDS-3120. Freon work with OM HA. URL: https://github.com/apache/hadoop-ozone/pull/649#issuecomment-597342135 Thank You @adoroszlai for the review. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[GitHub] [hadoop-ozone] bharatviswa504 merged pull request #649: HDDS-3120. Freon work with OM HA.
bharatviswa504 merged pull request #649: HDDS-3120. Freon work with OM HA. URL: https://github.com/apache/hadoop-ozone/pull/649 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3154) Intermittent failure in Test2WayCommitInRatis
[ https://issues.apache.org/jira/browse/HDDS-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Doroszlai updated HDDS-3154: --- Attachment: org.apache.hadoop.ozone.client.rpc.Test2WayCommitInRatis.txt org.apache.hadoop.ozone.client.rpc.Test2WayCommitInRatis-output.txt > Intermittent failure in Test2WayCommitInRatis > - > > Key: HDDS-3154 > URL: https://issues.apache.org/jira/browse/HDDS-3154 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Attila Doroszlai >Priority: Major > Attachments: > org.apache.hadoop.ozone.client.rpc.Test2WayCommitInRatis-output.txt, > org.apache.hadoop.ozone.client.rpc.Test2WayCommitInRatis.txt > > > Test2WayCommitInRatis may fail due to {{TimeoutIOException: Request #8 > timeout 3s}} from Ratis while closing the container. [~shashikant], can you > please take a look? > Logs with RaftClient set to debug level attached. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-3154) Intermittent failure in Test2WayCommitInRatis
Attila Doroszlai created HDDS-3154: -- Summary: Intermittent failure in Test2WayCommitInRatis Key: HDDS-3154 URL: https://issues.apache.org/jira/browse/HDDS-3154 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Attila Doroszlai Attachments: org.apache.hadoop.ozone.client.rpc.Test2WayCommitInRatis-output.txt, org.apache.hadoop.ozone.client.rpc.Test2WayCommitInRatis.txt Test2WayCommitInRatis may fail due to {{TimeoutIOException: Request #8 timeout 3s}} from Ratis while closing the container. [~shashikant], can you please take a look? Logs with RaftClient set to debug level attached. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #649: HDDS-3120. Freon work with OM HA.
bharatviswa504 commented on a change in pull request #649: HDDS-3120. Freon work with OM HA.
URL: https://github.com/apache/hadoop-ozone/pull/649#discussion_r390608371

## File path: hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/BaseFreonGenerator.java
## @@ -374,53 +369,37 @@ public String generateObjectName(long counter) {
   /**
    * Create missing target volume/bucket.
    */
-  public void ensureVolumeAndBucketExist(OzoneConfiguration ozoneConfiguration,
+  public void ensureVolumeAndBucketExist(OzoneClient rpcClient,
       String volumeName, String bucketName) throws IOException {
-    try (OzoneClient rpcClient = OzoneClientFactory
-        .getRpcClient(ozoneConfiguration)) {
+    OzoneVolume volume;
+    ensureVolumeExists(rpcClient, volumeName);
+    volume = rpcClient.getObjectStore().getVolume(volumeName);
-      OzoneVolume volume = null;
-      try {
-        volume = rpcClient.getObjectStore().getVolume(volumeName);
-      } catch (OMException ex) {
-        if (ex.getResult() == ResultCodes.VOLUME_NOT_FOUND) {
-          rpcClient.getObjectStore().createVolume(volumeName);
-          volume = rpcClient.getObjectStore().getVolume(volumeName);
-        } else {
-          throw ex;
-        }
-      }
-      try {
-        volume.getBucket(bucketName);
-      } catch (OMException ex) {
-        if (ex.getResult() == ResultCodes.BUCKET_NOT_FOUND) {
-          volume.createBucket(bucketName);
-        } else {
-          throw ex;
-        }
+    try {
+      volume.getBucket(bucketName);
+    } catch (OMException ex) {
+      if (ex.getResult() == ResultCodes.BUCKET_NOT_FOUND) {
+        volume.createBucket(bucketName);
+      } else {
+        throw ex;
      }
    }
+  }

   /**
    * Create missing target volume.
    */
   public void ensureVolumeExists(
-      OzoneConfiguration ozoneConfiguration,
+      OzoneClient rpcClient,
       String volumeName) throws IOException {
-    try (OzoneClient rpcClient = OzoneClientFactory
-        .getRpcClient(ozoneConfiguration)) {
-      try {
-        rpcClient.getObjectStore().getVolume(volumeName);
-      } catch (OMException ex) {
-        if (ex.getResult() == ResultCodes.VOLUME_NOT_FOUND) {
-          rpcClient.getObjectStore().createVolume(volumeName);
-        }
+    try {
+      rpcClient.getObjectStore().getVolume(volumeName);
+    } catch (OMException ex) {
+      if (ex.getResult() == ResultCodes.VOLUME_NOT_FOUND) {
+        rpcClient.getObjectStore().createVolume(volumeName);

Review comment:
Nice catch. But that should fail in the next steps in our case, like during getVolume/createBucket. But it is better to throw an exception from here. Addressed in the latest commit.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[GitHub] [hadoop-ozone] adoroszlai commented on a change in pull request #649: HDDS-3120. Freon work with OM HA.
adoroszlai commented on a change in pull request #649: HDDS-3120. Freon work with OM HA.
URL: https://github.com/apache/hadoop-ozone/pull/649#discussion_r390599201

## File path: hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/BaseFreonGenerator.java
## @@ -374,53 +369,37 @@ public String generateObjectName(long counter) {
   /**
    * Create missing target volume/bucket.
    */
-  public void ensureVolumeAndBucketExist(OzoneConfiguration ozoneConfiguration,
+  public void ensureVolumeAndBucketExist(OzoneClient rpcClient,
       String volumeName, String bucketName) throws IOException {
-    try (OzoneClient rpcClient = OzoneClientFactory
-        .getRpcClient(ozoneConfiguration)) {
+    OzoneVolume volume;
+    ensureVolumeExists(rpcClient, volumeName);
+    volume = rpcClient.getObjectStore().getVolume(volumeName);
-      OzoneVolume volume = null;
-      try {
-        volume = rpcClient.getObjectStore().getVolume(volumeName);
-      } catch (OMException ex) {
-        if (ex.getResult() == ResultCodes.VOLUME_NOT_FOUND) {
-          rpcClient.getObjectStore().createVolume(volumeName);
-          volume = rpcClient.getObjectStore().getVolume(volumeName);
-        } else {
-          throw ex;
-        }
-      }
-      try {
-        volume.getBucket(bucketName);
-      } catch (OMException ex) {
-        if (ex.getResult() == ResultCodes.BUCKET_NOT_FOUND) {
-          volume.createBucket(bucketName);
-        } else {
-          throw ex;
-        }
+    try {
+      volume.getBucket(bucketName);
+    } catch (OMException ex) {
+      if (ex.getResult() == ResultCodes.BUCKET_NOT_FOUND) {
+        volume.createBucket(bucketName);
+      } else {
+        throw ex;
      }
    }
+  }

   /**
    * Create missing target volume.
    */
   public void ensureVolumeExists(
-      OzoneConfiguration ozoneConfiguration,
+      OzoneClient rpcClient,
       String volumeName) throws IOException {
-    try (OzoneClient rpcClient = OzoneClientFactory
-        .getRpcClient(ozoneConfiguration)) {
-      try {
-        rpcClient.getObjectStore().getVolume(volumeName);
-      } catch (OMException ex) {
-        if (ex.getResult() == ResultCodes.VOLUME_NOT_FOUND) {
-          rpcClient.getObjectStore().createVolume(volumeName);
-        }
+    try {
+      rpcClient.getObjectStore().getVolume(volumeName);
+    } catch (OMException ex) {
+      if (ex.getResult() == ResultCodes.VOLUME_NOT_FOUND) {
+        rpcClient.getObjectStore().createVolume(volumeName);

Review comment:
Should `throw ex` in `else` branch, otherwise volume creation fails silently and will run into NPE elsewhere.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
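The review point generalizes: only the not-found result should trigger creation, and every other failure must be rethrown, otherwise creation fails silently and the caller hits an NPE later. A self-contained sketch of the corrected pattern follows; StoreException and the in-memory volume set are stand-ins for OMException and the Ozone client API, not the real classes.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

// Sketch of the "create only on NOT_FOUND, rethrow everything else"
// pattern from the review thread above.
public class EnsureVolumeSketch {
    // Stand-in for OMException, carrying a result code.
    static class StoreException extends IOException {
        final String code;
        StoreException(String code) {
            super(code);
            this.code = code;
        }
    }

    private final Set<String> volumes = new HashSet<>();
    private final boolean failWithPermissionError;  // simulates a non-NOT_FOUND failure

    public EnsureVolumeSketch(boolean failWithPermissionError) {
        this.failWithPermissionError = failWithPermissionError;
    }

    void getVolume(String name) throws StoreException {
        if (failWithPermissionError) {
            throw new StoreException("PERMISSION_DENIED");
        }
        if (!volumes.contains(name)) {
            throw new StoreException("VOLUME_NOT_FOUND");
        }
    }

    void createVolume(String name) {
        volumes.add(name);
    }

    // Corrected pattern: swallow only the not-found case.
    public void ensureVolumeExists(String name) throws IOException {
        try {
            getVolume(name);
        } catch (StoreException ex) {
            if ("VOLUME_NOT_FOUND".equals(ex.code)) {
                createVolume(name);
            } else {
                throw ex;  // anything else must surface, not fail silently
            }
        }
    }
}
```

With the original code, the `PERMISSION_DENIED` case above would fall through silently and the caller's later `getVolume` would operate on a volume that was never created.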
[GitHub] [hadoop-ozone] avijayanhwx commented on issue #655: HDDS-3100. Fix TestDeadNodeHandler.
avijayanhwx commented on issue #655: HDDS-3100. Fix TestDeadNodeHandler. URL: https://github.com/apache/hadoop-ozone/pull/655#issuecomment-597303328 Thank you for the review @adoroszlai. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2989) Intermittent timeout in TestBlockManager
[ https://issues.apache.org/jira/browse/HDDS-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-2989: - Labels: pull-request-available (was: ) > Intermittent timeout in TestBlockManager > > > Key: HDDS-2989 > URL: https://issues.apache.org/jira/browse/HDDS-2989 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Major > Labels: pull-request-available > > {code:title=https://github.com/apache/hadoop-ozone/runs/430663688} > 2020-02-06T21:44:53.5319531Z [ERROR] Tests run: 9, Failures: 0, Errors: 1, > Skipped: 0, Time elapsed: 5.344 s <<< FAILURE! - in > org.apache.hadoop.hdds.scm.block.TestBlockManager > 2020-02-06T21:44:53.5319796Z [ERROR] > testMultipleBlockAllocation(org.apache.hadoop.hdds.scm.block.TestBlockManager) > Time elapsed: 1.167 s <<< ERROR! > 2020-02-06T21:44:53.5319942Z java.util.concurrent.TimeoutException: > 2020-02-06T21:44:53.5320496Z Timed out waiting for condition. 
Thread > diagnostics: > 2020-02-06T21:44:53.5320839Z Timestamp: 2020-02-06 09:44:52,261 > 2020-02-06T21:44:53.5320901Z > 2020-02-06T21:44:53.5321178Z "Thread-26" prio=5 tid=46 runnable > 2020-02-06T21:44:53.5321292Z java.lang.Thread.State: RUNNABLE > 2020-02-06T21:44:53.5321391Z at java.lang.Thread.dumpThreads(Native > Method) > 2020-02-06T21:44:53.5326891Z at > java.lang.Thread.getAllStackTraces(Thread.java:1610) > 2020-02-06T21:44:53.5327144Z at > org.apache.hadoop.test.TimedOutTestsListener.buildThreadDump(TimedOutTestsListener.java:87) > 2020-02-06T21:44:53.5327309Z at > org.apache.hadoop.test.TimedOutTestsListener.buildThreadDiagnosticString(TimedOutTestsListener.java:73) > 2020-02-06T21:44:53.5327465Z at > org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:389) > 2020-02-06T21:44:53.5327618Z at > org.apache.hadoop.hdds.scm.block.TestBlockManager.testMultipleBlockAllocation(TestBlockManager.java:280) > 2020-02-06T21:44:53.5388042Z at > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 2020-02-06T21:44:53.5388702Z at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > 2020-02-06T21:44:53.5388905Z at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 2020-02-06T21:44:53.5389045Z at > java.lang.reflect.Method.invoke(Method.java:498) > 2020-02-06T21:44:53.5389195Z at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > 2020-02-06T21:44:53.5389331Z at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > 2020-02-06T21:44:53.5389662Z at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > 2020-02-06T21:44:53.5389776Z at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > 2020-02-06T21:44:53.5389916Z at > org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) > 2020-02-06T21:44:53.5390040Z 
"Signal Dispatcher" daemon prio=9 tid=4 runnable > 2020-02-06T21:44:53.5390156Z java.lang.Thread.State: RUNNABLE > 2020-02-06T21:44:53.5390783Z > "EventQueue-CloseContainerForCloseContainerEventHandler" prio=5 tid=32 in > Object.wait() > 2020-02-06T21:44:53.5390916Z java.lang.Thread.State: WAITING (on object > monitor) > 2020-02-06T21:44:53.5391019Z at sun.misc.Unsafe.park(Native Method) > 2020-02-06T21:44:53.5391149Z at > java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > 2020-02-06T21:44:53.5391299Z at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) > 2020-02-06T21:44:53.5391448Z at > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) > 2020-02-06T21:44:53.5391587Z at > java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074) > 2020-02-06T21:44:53.5391721Z at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134) > 2020-02-06T21:44:53.5391844Z at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > 2020-02-06T21:44:53.5391971Z at java.lang.Thread.run(Thread.java:748) > 2020-02-06T21:44:53.5392100Z "IPC Server idle connection scanner for port > 43801" daemon prio=5 tid=24 in Object.wait() > 2020-02-06T21:44:53.5392227Z java.lang.Thread.State: WAITING (on object > monitor) > 2020-02-06T21:44:53.5392347Z at java.lang.Object.wait(Native Method) > 2020-02-06T21:44:53.5392463Z at java.lang.Object.wait(Object.java:502) >
[GitHub] [hadoop-ozone] adoroszlai opened a new pull request #659: HDDS-2989. Intermittent timeout in TestBlockManager
adoroszlai opened a new pull request #659: HDDS-2989. Intermittent timeout in TestBlockManager URL: https://github.com/apache/hadoop-ozone/pull/659

## What changes were proposed in this pull request?

`TestBlockManager` intermittently times out waiting for exit from safe mode. This happens due to a race condition between two safe mode status events in different handler threads (but the same handler object): one from SCM, another from the test code. Temporary debug log (in "passing" order):

```
(SafeModeHandler.java:onMessage(103)) - SafeModeHandler@2bde2598 handling safe mode status event in thread 26: true
(SafeModeHandler.java:onMessage(103)) - SafeModeHandler@2bde2598 handling safe mode status event in thread 28: false
```

If the order is reversed, SCM may stay in safe mode as far as `BlockManagerImpl` sees it. Worse, it may return to safe mode while `BlockManagerImpl` is trying to perform some operation, e.g.:

```
SCMException: SafeModePrecheck failed for allocateBlock
...
at org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:160)
at org.apache.hadoop.hdds.scm.block.TestBlockManager.testAllocateBlock(TestBlockManager.java:150)
```

The proposed fix is to disable safe mode status emission (i.e. ignore the event from SCM) and let the test set safe mode explicitly in `BlockManagerImpl`. This should be fine since this is a unit test, not an integration test. https://issues.apache.org/jira/browse/HDDS-2989

## How was this patch tested?

Ran TestBlockManager 10x: https://github.com/adoroszlai/hadoop-ozone/runs/497791137 then 50x: https://github.com/adoroszlai/hadoop-ozone/runs/497839450 and regular full CI: https://github.com/adoroszlai/hadoop-ozone/runs/498781616
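The race described in this pull request can be sketched in a few lines. The class below is purely illustrative (it is not Ozone's actual `SafeModeHandler` API): two threads flipping the same safe-mode flag leave the final value dependent on scheduling, while delivering both events through a single executor thread restores a deterministic order — which is why serializing or bypassing the SCM-originated event makes the test stable.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a safe-mode event handler (NOT Ozone's
// SafeModeHandler). When two handler threads race on the same flag,
// the last writer wins nondeterministically; one delivery thread
// makes the outcome deterministic.
public class SafeModeRaceSketch {
  private final AtomicBoolean inSafeMode = new AtomicBoolean(true);

  public void onMessage(boolean status) {
    inSafeMode.set(status);
  }

  public boolean isInSafeMode() {
    return inSafeMode.get();
  }

  public static void main(String[] args) throws InterruptedException {
    SafeModeRaceSketch handler = new SafeModeRaceSketch();
    // One executor thread = one well-defined delivery order.
    ExecutorService events = Executors.newSingleThreadExecutor();
    events.submit(() -> handler.onMessage(true));   // event from "SCM"
    events.submit(() -> handler.onMessage(false));  // event from "test code"
    events.shutdown();
    events.awaitTermination(5, TimeUnit.SECONDS);
    // Events ran in submission order, so the last one wins deterministically.
    System.out.println("inSafeMode = " + handler.isInSafeMode());
  }
}
```

With two independent handler threads instead of one executor, either event may land last — exactly the intermittent behavior the debug log above shows.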
[jira] [Created] (HDDS-3153) Create REST API to serve Recon Dashboard and integrate with UI in Recon.
Vivek Ratnavel Subramanian created HDDS-3153: Summary: Create REST API to serve Recon Dashboard and integrate with UI in Recon. Key: HDDS-3153 URL: https://issues.apache.org/jira/browse/HDDS-3153 Project: Hadoop Distributed Data Store Issue Type: Sub-task Components: Ozone Recon Affects Versions: 0.5.0 Reporter: Vivek Ratnavel Subramanian Assignee: Vivek Ratnavel Subramanian Attachments: Screen Shot 2020-03-10 at 12.10.41 PM.png Add a REST API to serve information required for recon dashboard !Screen Shot 2020-03-10 at 12.10.41 PM.png!
[GitHub] [hadoop-ozone] adoroszlai commented on a change in pull request #655: HDDS-3100. Fix TestDeadNodeHandler.
adoroszlai commented on a change in pull request #655: HDDS-3100. Fix TestDeadNodeHandler. URL: https://github.com/apache/hadoop-ozone/pull/655#discussion_r390547825

## File path: hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/node/TestDeadNodeHandler.java

```
@@ -89,16 +89,16 @@ public void setup() throws IOException, AuthenticationException {
     storageDir = GenericTestUtils.getTempPath(
         TestDeadNodeHandler.class.getSimpleName() + UUID.randomUUID());
     conf.set(HddsConfigKeys.OZONE_METADATA_DIRS, storageDir);
-    conf.setInt(OZONE_DATANODE_PIPELINE_LIMIT, 0);
     eventQueue = new EventQueue();
+    System.out.println("eventQueue = " + eventQueue.toString());
```

Review comment: Leftover debug.
[GitHub] [hadoop-ozone] avijayanhwx commented on a change in pull request #648: HDDS-3117. Recon throws InterruptedException while getting new snapshot from OM.
avijayanhwx commented on a change in pull request #648: HDDS-3117. Recon throws InterruptedException while getting new snapshot from OM. URL: https://github.com/apache/hadoop-ozone/pull/648#discussion_r390532746

## File path: hadoop-ozone/dist/src/main/smoketest/recon/recon-api.robot

```
@@ -25,16 +25,26 @@
 ${ENDPOINT_URL}        http://recon:9888
 ${API_ENDPOINT_URL}    http://recon:9888/api/v1

 *** Test Cases ***
-Recon REST API
+Recon OM APIs
+    Run Keyword if    '${SECURITY_ENABLED}' == 'true'    Kinit test user    testuser    testuser.keytab
+    Execute    ozone freon rk --numOfVolumes 1 --numOfBuckets 1 --numOfKeys 10 --keySize 1025
+    Sleep    90s
```

Review comment: Thanks for the suggestions. I have fixed this.
[jira] [Commented] (HDDS-3133) Add export objectIds in Ozone as FileIds to allow LLAP to cache the files
[ https://issues.apache.org/jira/browse/HDDS-3133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056202#comment-17056202 ] Siddharth Wagle commented on HDDS-3133: --- [~msingh] The org.apache.hadoop.hdfs.protocol.HdfsFileStatus#getFileId method is defined in HdfsFileStatus. I looked at where this is called: https://github.com/apache/hive/blob/1e15791987098a177625b16b468e96021fb6dd29/shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java#L767 Ozone cannot implement HdfsFileStatus, so will this need a Hive change also? cc: [~avijayan] since he worked on this area in Ozone. > Add export objectIds in Ozone as FileIds to allow LLAP to cache the files > - > > Key: HDDS-3133 > URL: https://issues.apache.org/jira/browse/HDDS-3133 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Filesystem, Ozone Manager >Reporter: Mukul Kumar Singh >Assignee: Siddharth Wagle >Priority: Major > > Hive LLAP makes use of the fileids to cache the files data. Ozone's objectIds > need to be exported as fileIds to allow the caching to happen effectively. > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/HdfsUtils.java#L65
[GitHub] [hadoop-ozone] adoroszlai commented on issue #655: HDDS-3100. Fix TestDeadNodeHandler.
adoroszlai commented on issue #655: HDDS-3100. Fix TestDeadNodeHandler. URL: https://github.com/apache/hadoop-ozone/pull/655#issuecomment-597208683 > Unit test failure (TestBlockManager) is unrelated. Being fixed in HDDS-2989.
[GitHub] [hadoop-ozone] avijayanhwx commented on issue #655: HDDS-3100. Fix TestDeadNodeHandler.
avijayanhwx commented on issue #655: HDDS-3100. Fix TestDeadNodeHandler. URL: https://github.com/apache/hadoop-ozone/pull/655#issuecomment-597207559 Unit test failure (TestBlockManager) is unrelated.
[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #649: HDDS-3120. Freon work with OM HA.
bharatviswa504 commented on a change in pull request #649: HDDS-3120. Freon work with OM HA. URL: https://github.com/apache/hadoop-ozone/pull/649#discussion_r390463072

## File path: hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/OzoneClientKeyGenerator.java

```
@@ -86,22 +91,24 @@ public Void call() throws Exception {
     OzoneConfiguration ozoneConfiguration = createOzoneConfiguration();
-    ensureVolumeAndBucketExist(ozoneConfiguration, volumeName, bucketName);
     contentGenerator = new ContentGenerator(keySize, bufferSize);
     metadata = new HashMap<>();
-    try (OzoneClient rpcClient = OzoneClientFactory
-        .getRpcClient(ozoneConfiguration)) {
-
-      bucket =
-          rpcClient.getObjectStore().getVolume(volumeName)
-              .getBucket(bucketName);
+    OzoneClient rpcClient = null;
+    try {
+      rpcClient = createOzoneClient(omServiceID, ozoneConfiguration);
```

Review comment: Initially I was using if/else to create the OzoneClient; later moved the code to a method. Used try-with-resources.
[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #649: HDDS-3120. Freon work with OM HA.
bharatviswa504 commented on a change in pull request #649: HDDS-3120. Freon work with OM HA. URL: https://github.com/apache/hadoop-ozone/pull/649#discussion_r390462290

## File path: hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/BaseFreonGenerator.java

```
@@ -469,4 +456,13 @@ public AtomicLong getAttemptCounter() {
   public int getThreadNo() {
     return threadNo;
   }
+
+  protected OzoneClient createOzoneClient(String omServiceID,
+      OzoneConfiguration conf) throws Exception {
+    if (omServiceID != null) {
+      return OzoneClientFactory.getRpcClient(omServiceID, conf);
+    } else {
+      return OzoneClientFactory.getRpcClient(conf);
```

Review comment: Removed getClient and used getRpcClient to avoid these kinds of mistakes, instead of duplicating the same code in multiple functions.
[GitHub] [hadoop-ozone] bharatviswa504 commented on issue #649: HDDS-3120. Freon work with OM HA.
bharatviswa504 commented on issue #649: HDDS-3120. Freon work with OM HA. URL: https://github.com/apache/hadoop-ozone/pull/649#issuecomment-597194276 Thank You @adoroszlai for the review. Addressed review comments.
[jira] [Commented] (HDDS-3088) maxRetries value is too large while trying to reconnect to SCM server
[ https://issues.apache.org/jira/browse/HDDS-3088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056097#comment-17056097 ] Nilotpal Nandi commented on HDDS-3088: -- I think due to larger retry value , the cli client operations are also hung ( as it waits for 2147483647 iterations) when it tries to connect to dead SCM and it affects the test executions > maxRetries value is too large while trying to reconnect to SCM server > - > > Key: HDDS-3088 > URL: https://issues.apache.org/jira/browse/HDDS-3088 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Reporter: Nilotpal Nandi >Assignee: Nanda kumar >Priority: Blocker > > MaxRetries value is 2147483647 which is too high > It keeps on retrying to connect to SCM server. > > {noformat} > 2020-02-27 05:54:43,430 INFO org.apache.hadoop.ipc.Client: Retrying connect > to server: quasar-hqknwz-8.quasar-hqknwz.root.hwx.site/172.27.14.1:9861. > Already tried 10535 time(s); retry policy is > RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 > MILLISECONDS) > 2020-02-27 05:54:44,431 INFO org.apache.hadoop.ipc.Client: Retrying connect > to server: quasar-hqknwz-8.quasar-hqknwz.root.hwx.site/172.27.14.1:9861. > Already tried 10536 time(s); retry policy is > RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 > MILLISECONDS) > 2020-02-27 05:54:45,432 INFO org.apache.hadoop.ipc.Client: Retrying connect > to server: quasar-hqknwz-8.quasar-hqknwz.root.hwx.site/172.27.14.1:9861. > Already tried 10537 time(s); retry policy is > RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 > MILLISECONDS) > 2020-02-27 05:54:46,433 INFO org.apache.hadoop.ipc.Client: Retrying connect > to server: quasar-hqknwz-8.quasar-hqknwz.root.hwx.site/172.27.14.1:9861. 
> Already tried 10538 time(s); retry policy is > RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 > MILLISECONDS){noformat}
[jira] [Commented] (HDDS-3088) maxRetries value is too large while trying to reconnect to SCM server
[ https://issues.apache.org/jira/browse/HDDS-3088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056089#comment-17056089 ] Arpit Agarwal commented on HDDS-3088: - Then we probably want it to try forever, just like HDFS. Do you see any harm in doing that? I think what is missing is some kind of delay between the retries, currently it seems to be retrying in a tight loop.
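The alternative being discussed — retry indefinitely, but with a delay between attempts instead of a tight loop bounded by maxRetries = Integer.MAX_VALUE — has this basic shape. The sketch below is illustrative only; it does not use Hadoop's actual RetryPolicy classes, and the simulated endpoint is a hypothetical stand-in for the SCM connection:

```java
import java.util.function.BooleanSupplier;

// Illustrative "retry forever with fixed sleep" loop (NOT Hadoop's
// RetryPolicies API). The sleep between attempts is what keeps an
// unbounded retry from spinning in a tight loop.
public class RetrySketch {

  /** Retries until connect succeeds; returns the number of attempts made. */
  static int retryUntilConnected(BooleanSupplier connect, long sleepMillis)
      throws InterruptedException {
    int attempts = 0;
    while (true) {
      attempts++;
      if (connect.getAsBoolean()) {
        return attempts;
      }
      Thread.sleep(sleepMillis);  // fixed delay between retries
    }
  }

  public static void main(String[] args) throws InterruptedException {
    // Simulated endpoint that only accepts the 4th connection attempt.
    final int[] calls = {0};
    int attempts = retryUntilConnected(() -> ++calls[0] >= 4, 10L);
    System.out.println("connected after " + attempts + " attempts");
  }
}
```

In production code the delay would typically also grow (capped exponential backoff) rather than stay fixed, and the loop would honor interruption so a shutdown is not blocked by a dead SCM.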
[jira] [Commented] (HDDS-3088) maxRetries value is too large while trying to reconnect to SCM server
[ https://issues.apache.org/jira/browse/HDDS-3088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056071#comment-17056071 ] Nilotpal Nandi commented on HDDS-3088: -- [~arp], Here the datanode is trying to connect to SCM.
[jira] [Updated] (HDDS-3118) Possible deadlock in LockManager
[ https://issues.apache.org/jira/browse/HDDS-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDDS-3118: - Component/s: Ozone Manager > Possible deadlock in LockManager > > > Key: HDDS-3118 > URL: https://issues.apache.org/jira/browse/HDDS-3118 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Reporter: Attila Doroszlai >Assignee: Bharat Viswanadham >Priority: Blocker > Labels: pull-request-available > Fix For: 0.5.0 > > Attachments: repro.log, repro.patch > > Time Spent: 40m > Remaining Estimate: 0h > > {{LockManager}} has a possible deadlock. > # Number of locks is limited by using a {{GenericObjectPool}}. If N locks > are already acquired, new requestors need to wait. This wait in > {{getLockForLocking}} happens in a callback executed from > {{ConcurrentHashMap#compute}} while holding a lock on a map entry. > # While releasing a lock, {{decrementActiveLockCount}} implicitly requires a > lock on an entry in {{ConcurrentHashMap}}.
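The two conditions in the ticket combine into a classic deadlock shape: one thread blocks on the lock pool while inside `ConcurrentHashMap.compute()` (holding the lock on a map entry), and the thread that would return a pooled lock needs that same entry lock for its own `compute()`. A minimal sketch of the non-blocking alternative — illustrative names only, this is not the actual LockManager code — keeps all waiting outside the map callback:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the safer shape (NOT Ozone's LockManager). Blocking inside
// ConcurrentHashMap.compute() pins the lock on that map entry, so a
// releasing thread that also needs compute() on the same key can
// deadlock. Waiting on the pool OUTSIDE the map callback removes the cycle.
public class LockSketch {
  public static final ConcurrentHashMap<String, AtomicInteger> ACTIVE =
      new ConcurrentHashMap<>();
  // Stands in for the GenericObjectPool that caps the number of locks.
  public static final Semaphore POOL = new Semaphore(2);

  public static void acquire(String key) throws InterruptedException {
    // Non-blocking bookkeeping only, inside compute().
    ACTIVE.compute(key, (k, count) -> {
      if (count == null) {
        count = new AtomicInteger();
      }
      count.incrementAndGet();
      return count;
    });
    // The potentially long wait happens outside any map-entry lock.
    POOL.acquire();
  }

  public static void release(String key) {
    POOL.release();
    // Remove the entry once its active count drops to zero.
    ACTIVE.computeIfPresent(key, (k, count) ->
        count.decrementAndGet() == 0 ? null : count);
  }

  public static void main(String[] args) throws InterruptedException {
    acquire("volume-a");
    acquire("volume-b");   // pool of 2 is now exhausted
    release("volume-a");   // completes without touching "volume-b"'s entry
    acquire("volume-c");   // no waiting ever happens inside compute()
    System.out.println("active keys: " + ACTIVE.keySet());
  }
}
```

The `compute()` contract requires the remapping function to be short and non-blocking precisely because the map may hold internal locks while it runs; moving the pool wait outside the callback respects that contract.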
[GitHub] [hadoop-ozone] elek commented on a change in pull request #525: HDDS-2798. beyond/Containers.md translation
elek commented on a change in pull request #525: HDDS-2798. beyond/Containers.md translation URL: https://github.com/apache/hadoop-ozone/pull/525#discussion_r390390241 ## File path: hadoop-hdds/docs/content/beyond/Containers.zh.md ## @@ -0,0 +1,212 @@ +--- +title: "Ozone 中的容器技术" +summary: Ozone 广泛地使用容器来进行测试,本页介绍 Ozone 中容器的使用及其最佳实践。 +weight: 2 +--- + + +Ozone 的开发中大量地使用了 Docker,包括以下三种主要的应用场景: + +* __开发__: + * 我们使用 docker 来启动本地伪集群(docker 可以提供统一的环境,但是不需要创建镜像)。 +* __测试__: + * 我们从开发分支创建 docker 镜像,然后在 kubernetes 或其它容器编排系统上测试 ozone。 + * 我们为每个发行版提供了 _apache/ozone_ 镜像,以方便用户体验 Ozone。 + 这些镜像 __不__ 应当在 __生产__ 中使用。 + + +当在生产中使用容器方式部署 ozone 时,我们强烈建议你创建自己的镜像。请把所有自带的容器镜像和 k8s 资源文件当作示例指南,参考它们进行定制。 + + +* __生产__: + * 我们提供了如何创建用于生产的 docker 镜像的文档。 + +下面我们来详细地介绍一下各种应用场景: + +## 开发 + +Ozone 安装包中包含了 docker-compose 的示例目录,用于方便地在本地机器启动 Ozone 集群。 + +使用官方提供的发行包: + +```bash +cd compose/ozone +docker-compose up -d +``` + +本地构建方式: + +```bash +cd hadoop-ozone/dist/target/ozone-*/compose +docker-compose up -d +``` + +这些 compose 环境文件是重要的工具,可以用来随时启动各种类型的 Ozone 集群。 + +为了确保 compose 文件是最新的,我们提供了验收测试套件,套件会启动集群并检查其基本行为是否正常。 + +验收测试也包含在发行包中,你可以在 `smoketest` 目录下找到各个测试的定义。 + +你可以在任意 compose 目录进行测试,比如: + +```bash +cd compose/ozone +./test.sh +``` + +### 实现细节 + +`compose` 测试都基于 apache/hadoop-runner 镜像,这个镜像本身并不包含任何 Ozone 的 jar 包或二进制文件,它只是提供其了启动 ozone 的辅助脚本。 + +hadoop-runner 提供了一个随处运行 Ozone 的固定环境,Ozone 分发包通过目录挂载包含在其中。 + +(docker-compose 示例片段) + +``` + scm: + image: apache/hadoop-runner:jdk11 + volumes: + - ../..:/opt/hadoop + ports: + - 9876:9876 + +``` + +容器应该通过环境变量来进行配置,由于每个容器都应当设置相同的环境变量,我们在单独的文件中维护了一个环境变量列表: + +``` + scm: + image: apache/hadoop-runner:jdk11 + #... + env_file: + - ./docker-config +``` + +docker-config 文件中包含了所需环境变量的列表: + +``` +OZONE-SITE.XML_ozone.om.address=om +OZONE-SITE.XML_ozone.om.http-address=om:9874 +OZONE-SITE.XML_ozone.scm.names=scm +#... 
+``` + +你可以看到我们所使用的命名规范,根据这些环境变量的名字,`hadoop-runner` 基础镜像中的[脚本](https://github.com/apache/hadoop/tree/docker-hadoop-runner-latest/scripts) 会生成合适的 hadoop XML 配置文件(在我们这种情况下就是 `ozone-site.xml`)。 + +`hadoop-runner` 镜像的[入口点](https://github.com/apache/hadoop/blob/docker-hadoop-runner-latest/scripts/starter +.sh)包含了一个辅助脚本,这个辅助脚本可以根据环境变量触发上述的配置文件生成以及其它动作(比如初始化 SCM 和 OM 的存储、下载必要的 keytab 等)。 + +## 测试 + +`docker-compose` 的方式应当只用于本地测试,不适用于多节点集群。要在多节点集群上使用容器,我们需要像 Kubernetes 这样的容器编排系统。 + +Kubernetes 示例文件在 `kubernetes` 文件夹中。 + +*请注意*:所有提供的镜像都使用 `hadoop-runner` 作为基础镜像,这个镜像中包含了所有测试环境所需的测试工具。对于生产环境,我们推荐用户使用自己的基础镜像创建可靠的镜像。 + +### 发行包测试 + +可以通过部署任意的示例集群来测试发行包: + +```bash +cd kubernetes/examples/ozone +kubectl apply -f +``` + +注意,此时会从 Docker Hub 下载最新的镜像。 + +### 开发构建测试 + +为了测试开发中的构建,你需要创建自己的镜像并上传到自己的 docker 仓库中: + + +```bash +mvn clean install -DskipTests -Pdocker-build,docker-push -Ddocker.image=myregistry:9000/name/ozone +``` + +所有生成的 kubernetes 资源文件都会使用这个镜像 (`image:` keys are adjusted during the build) + +```bash +cd kubernetes/examples/ozone +kubectl apply -f +``` + +## 生产 + + +我们强烈推荐在生产集群使用自己的镜像,并根据实际的需求调整基础镜像、文件掩码、安全设置和用户设置。 + + +你可以使用我们开发中所用的镜像作为示例: + + * [基础镜像] (https://github.com/apache/hadoop/blob/docker-hadoop-runner-jdk11/Dockerfile) + * [完整镜像] (https://github.com/apache/hadoop/blob/trunk/hadoop-ozone/dist/src/main/docker/Dockerfile) + + Dockerfile 中大部分内容都是可选的辅助功能,但如果要使用我们提供的 kubernetes 示例资源文件,你可能需要[这里](https://github.com/apache/hadoop/tree/docker-hadoop-runner-jdk11/scripts)的脚本。 + + * 两个 python 脚本将环境变量转化为实际的 hadoop XML 配置文件 + * start.sh 根据环境变量执行 python 脚本(以及其它初始化工作) + +## 容器 + +Ozone 相关的容器镜像和 Dockerfile 位置: + + + + + + # + 容器 + 仓库 + 基础镜像 + 分支 + 标签 + 说明 + + + + + 1 + apache/ozone + https://github.com/apache/hadoop-docker-ozone + ozone-... 
+ hadoop-runner + 0.3.0,0.4.0,0.4.1 + 每个 Ozone 发行版都对应一个新标签。 + + 2 + apache/hadoop-runner + https://github.com/apache/hadoop + docker-hadoop-runner + centos + jdk11,jdk8,latest + 这是用于测试 Hadoop Ozone 的基础镜像,包含了一系列可以让我们更加方便地运行 Ozone 的工具。 + + Review comment: Sure, feel free to remove it. I will create a patch as the previous lines are also outdated...
[jira] [Updated] (HDDS-3152) Reduce number of chunkwriter threads in integration tests
[ https://issues.apache.org/jira/browse/HDDS-3152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Doroszlai updated HDDS-3152: --- Status: Patch Available (was: In Progress) > Reduce number of chunkwriter threads in integration tests > - > > Key: HDDS-3152 > URL: https://issues.apache.org/jira/browse/HDDS-3152 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: test >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Integration tests run multiple datanodes in the same JVM. Each datanode > comes with 60 chunk writer threads by default (may be decreased in > HDDS-3053). This makes thread dumps (eg. produced by > {{GenericTestUtils.waitFor}} on timeout) really hard to navigate, as there > may be 300+ such threads. > Since integration tests are generally run with a single disk which is even > shared among the datanodes, a few threads per datanode should be enough.
[jira] [Updated] (HDDS-3095) Intermittent failure in TestFailureHandlingByClient#testDatanodeExclusionWithMajorityCommit
[ https://issues.apache.org/jira/browse/HDDS-3095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Doroszlai updated HDDS-3095: --- Status: Patch Available (was: In Progress) > Intermittent failure in > TestFailureHandlingByClient#testDatanodeExclusionWithMajorityCommit > --- > > Key: HDDS-3095 > URL: https://issues.apache.org/jira/browse/HDDS-3095 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > {code:title=https://github.com/apache/hadoop-ozone/pull/614/checks?check_run_id=472938597} > [ERROR] Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: > 284.887 s <<< FAILURE! - in > org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient > [ERROR] > testDatanodeExclusionWithMajorityCommit(org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient) > Time elapsed: 66.589 s <<< FAILURE! > java.lang.AssertionError > ... >at > org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient.testDatanodeExclusionWithMajorityCommit(TestFailureHandlingByClient.java:336) > {code}
[jira] [Updated] (HDDS-3095) Intermittent failure in TestFailureHandlingByClient#testDatanodeExclusionWithMajorityCommit
[ https://issues.apache.org/jira/browse/HDDS-3095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3095: - Labels: pull-request-available (was: )
[GitHub] [hadoop-ozone] adoroszlai opened a new pull request #658: HDDS-3095. Intermittent failure in TestFailureHandlingByClient#testDatanodeExclusionWithMajorityCommit
adoroszlai opened a new pull request #658: HDDS-3095. Intermittent failure in TestFailureHandlingByClient#testDatanodeExclusionWithMajorityCommit URL: https://github.com/apache/hadoop-ozone/pull/658

## What changes were proposed in this pull request?

The intermittent failure in `TestFailureHandlingByClient` happens when the datanode that was just stopped is not excluded during a subsequent write operation. This PR proposes to make `MiniOzoneCluster` wait for the datanode to stop, as it already does during "restart datanode". https://issues.apache.org/jira/browse/HDDS-3095

## How was this patch tested?

Ran `TestFailureHandlingByClient` 20x successfully: https://github.com/adoroszlai/hadoop-ozone/runs/497741382 and regular full CI: https://github.com/adoroszlai/hadoop-ozone/runs/497755796 where the only failure is (supposedly) unrelated, in Test2WayCommitInRatis.
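"Wait for the datanode to stop" boils down to polling a condition with a timeout, in the style of `GenericTestUtils.waitFor` seen elsewhere in these tests. A standalone sketch of that pattern — illustrative only, not the Hadoop implementation — looks like this:

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

// Minimal polling wait in the style of GenericTestUtils.waitFor
// (sketch only, NOT the Hadoop implementation): re-check the condition
// at a fixed interval and fail with TimeoutException past the deadline.
public class WaitForSketch {

  public static void waitFor(BooleanSupplier condition, long intervalMillis,
      long timeoutMillis) throws InterruptedException, TimeoutException {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    while (!condition.getAsBoolean()) {
      if (System.currentTimeMillis() > deadline) {
        throw new TimeoutException("Timed out waiting for condition");
      }
      Thread.sleep(intervalMillis);
    }
  }

  public static void main(String[] args) throws Exception {
    // Hypothetical "datanode" that reports stopped on the 3rd poll.
    final int[] polls = {0};
    waitFor(() -> ++polls[0] >= 3, 10L, 1_000L);
    System.out.println("datanode stopped after " + polls[0] + " polls");
  }
}
```

Waiting on an observable condition, rather than sleeping for a fixed time after issuing the stop, is what removes the race between shutdown and the next write.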
[GitHub] [hadoop-ozone] adoroszlai opened a new pull request #657: HDDS-3152. Reduce number of chunkwriter threads in integration tests
adoroszlai opened a new pull request #657: HDDS-3152. Reduce number of chunkwriter threads in integration tests URL: https://github.com/apache/hadoop-ozone/pull/657 ## What changes were proposed in this pull request? Integration tests run multiple datanodes in the same JVM. Each datanode comes with 60 chunk writer threads by default (this may be decreased in [HDDS-3053](https://issues.apache.org/jira/browse/HDDS-3053)). This makes thread dumps (e.g. those produced by `GenericTestUtils.waitFor` on timeout) really hard to navigate, as there may be 300+ such threads. Since integration tests generally run with a single disk, which is even shared among the datanodes, a few threads per datanode should be enough. https://issues.apache.org/jira/browse/HDDS-3152 ## How was this patch tested? Regular CI: https://github.com/adoroszlai/hadoop-ozone/runs/497866229
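The change boils down to sizing the datanode's chunk writer pool from configuration instead of hard-wiring the 60-thread default. As a generic sketch of that pattern (the config key `chunk.writer.threads` and the test value 2 are hypothetical, not Ozone's actual key):

```java
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ChunkWriterPool {
    static final int DEFAULT_THREADS = 60; // production default cited in the JIRA

    // Build the writer pool from config, falling back to the default.
    static ExecutorService createPool(Properties conf) {
        int threads = Integer.parseInt(
            conf.getProperty("chunk.writer.threads", // hypothetical key name
                             String.valueOf(DEFAULT_THREADS)));
        return Executors.newFixedThreadPool(threads);
    }

    public static void main(String[] args) throws Exception {
        Properties testConf = new Properties();
        testConf.setProperty("chunk.writer.threads", "2"); // integration-test override
        ExecutorService pool = createPool(testConf);
        pool.submit(() ->
            System.out.println("chunk write on " + Thread.currentThread().getName()));
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

With a handful of threads per datanode instead of 60, a five-datanode MiniOzoneCluster dump shrinks from 300+ writer threads to a readable count.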
[jira] [Updated] (HDDS-3152) Reduce number of chunkwriter threads in integration tests
[ https://issues.apache.org/jira/browse/HDDS-3152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3152: - Labels: pull-request-available (was: ) > Reduce number of chunkwriter threads in integration tests > - > > Key: HDDS-3152 > URL: https://issues.apache.org/jira/browse/HDDS-3152 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: test >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Minor > Labels: pull-request-available > > Integration tests run multiple datanodes in the same JVM. Each datanode > comes with 60 chunk writer threads by default (may be decreased in > HDDS-3053). This makes thread dumps (eg. produced by > {{GenericTestUtils.waitFor}} on timeout) really hard to navigate, as there > may be 300+ such threads. > Since integration tests are generally run with a single disk which is even > shared among the datanodes, a few threads per datanode should be enough.
[GitHub] [hadoop-ozone] elek edited a comment on issue #389: HDDS-2534. scmcli container delete not working
elek edited a comment on issue #389: HDDS-2534. scmcli container delete not working URL: https://github.com/apache/hadoop-ozone/pull/389#issuecomment-597084180 /pending Questions/suggestions from @xiaoyuyao are not yet addressed in the last commit.
[GitHub] [hadoop-ozone] elek commented on issue #389: HDDS-2534. scmcli container delete not working
elek commented on issue #389: HDDS-2534. scmcli container delete not working URL: https://github.com/apache/hadoop-ozone/pull/389#issuecomment-597084180 /pending Questions/suggestions from @xiaoyuyao are not yet addressed in #432 12f3f8ac94cf8808757bb7673e4208d8b0fede09
[jira] [Assigned] (HDDS-2989) Intermittent timeout in TestBlockManager
[ https://issues.apache.org/jira/browse/HDDS-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Doroszlai reassigned HDDS-2989: -- Assignee: Attila Doroszlai > Intermittent timeout in TestBlockManager > > > Key: HDDS-2989 > URL: https://issues.apache.org/jira/browse/HDDS-2989 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Major > > {code:title=https://github.com/apache/hadoop-ozone/runs/430663688} > 2020-02-06T21:44:53.5319531Z [ERROR] Tests run: 9, Failures: 0, Errors: 1, > Skipped: 0, Time elapsed: 5.344 s <<< FAILURE! - in > org.apache.hadoop.hdds.scm.block.TestBlockManager > 2020-02-06T21:44:53.5319796Z [ERROR] > testMultipleBlockAllocation(org.apache.hadoop.hdds.scm.block.TestBlockManager) > Time elapsed: 1.167 s <<< ERROR! > 2020-02-06T21:44:53.5319942Z java.util.concurrent.TimeoutException: > 2020-02-06T21:44:53.5320496Z Timed out waiting for condition. Thread > diagnostics: > 2020-02-06T21:44:53.5320839Z Timestamp: 2020-02-06 09:44:52,261 > 2020-02-06T21:44:53.5320901Z > 2020-02-06T21:44:53.5321178Z "Thread-26" prio=5 tid=46 runnable > 2020-02-06T21:44:53.5321292Z java.lang.Thread.State: RUNNABLE > 2020-02-06T21:44:53.5321391Z at java.lang.Thread.dumpThreads(Native > Method) > 2020-02-06T21:44:53.5326891Z at > java.lang.Thread.getAllStackTraces(Thread.java:1610) > 2020-02-06T21:44:53.5327144Z at > org.apache.hadoop.test.TimedOutTestsListener.buildThreadDump(TimedOutTestsListener.java:87) > 2020-02-06T21:44:53.5327309Z at > org.apache.hadoop.test.TimedOutTestsListener.buildThreadDiagnosticString(TimedOutTestsListener.java:73) > 2020-02-06T21:44:53.5327465Z at > org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:389) > 2020-02-06T21:44:53.5327618Z at > org.apache.hadoop.hdds.scm.block.TestBlockManager.testMultipleBlockAllocation(TestBlockManager.java:280) > 2020-02-06T21:44:53.5388042Z at >
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 2020-02-06T21:44:53.5388702Z at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > 2020-02-06T21:44:53.5388905Z at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 2020-02-06T21:44:53.5389045Z at > java.lang.reflect.Method.invoke(Method.java:498) > 2020-02-06T21:44:53.5389195Z at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > 2020-02-06T21:44:53.5389331Z at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > 2020-02-06T21:44:53.5389662Z at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > 2020-02-06T21:44:53.5389776Z at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > 2020-02-06T21:44:53.5389916Z at > org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) > 2020-02-06T21:44:53.5390040Z "Signal Dispatcher" daemon prio=9 tid=4 runnable > 2020-02-06T21:44:53.5390156Z java.lang.Thread.State: RUNNABLE > 2020-02-06T21:44:53.5390783Z > "EventQueue-CloseContainerForCloseContainerEventHandler" prio=5 tid=32 in > Object.wait() > 2020-02-06T21:44:53.5390916Z java.lang.Thread.State: WAITING (on object > monitor) > 2020-02-06T21:44:53.5391019Z at sun.misc.Unsafe.park(Native Method) > 2020-02-06T21:44:53.5391149Z at > java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > 2020-02-06T21:44:53.5391299Z at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) > 2020-02-06T21:44:53.5391448Z at > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) > 2020-02-06T21:44:53.5391587Z at > java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074) > 2020-02-06T21:44:53.5391721Z at >
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134) > 2020-02-06T21:44:53.5391844Z at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > 2020-02-06T21:44:53.5391971Z at java.lang.Thread.run(Thread.java:748) > 2020-02-06T21:44:53.5392100Z "IPC Server idle connection scanner for port > 43801" daemon prio=5 tid=24 in Object.wait() > 2020-02-06T21:44:53.5392227Z java.lang.Thread.State: WAITING (on object > monitor) > 2020-02-06T21:44:53.5392347Z at java.lang.Object.wait(Native Method) > 2020-02-06T21:44:53.5392463Z at java.lang.Object.wait(Object.java:502) > 2020-02-06T21:44:53.5392567Z at >
[jira] [Assigned] (HDDS-3095) Intermittent failure in TestFailureHandlingByClient#testDatanodeExclusionWithMajorityCommit
[ https://issues.apache.org/jira/browse/HDDS-3095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Doroszlai reassigned HDDS-3095: -- Assignee: Attila Doroszlai > Intermittent failure in > TestFailureHandlingByClient#testDatanodeExclusionWithMajorityCommit > --- > > Key: HDDS-3095 > URL: https://issues.apache.org/jira/browse/HDDS-3095 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Major > > {code:title=https://github.com/apache/hadoop-ozone/pull/614/checks?check_run_id=472938597} > [ERROR] Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: > 284.887 s <<< FAILURE! - in > org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient > [ERROR] > testDatanodeExclusionWithMajorityCommit(org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient) > Time elapsed: 66.589 s <<< FAILURE! > java.lang.AssertionError > ... >at > org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient.testDatanodeExclusionWithMajorityCommit(TestFailureHandlingByClient.java:336) > {code}
[GitHub] [hadoop-ozone] elek commented on issue #578: HDDS-3053. Decrease the number of the chunk writer threads
elek commented on issue #578: HDDS-3053. Decrease the number of the chunk writer threads URL: https://github.com/apache/hadoop-ozone/pull/578#issuecomment-597072663 > did u check the pending request queue in the leader? No I didn't. Why is it interesting? > how many mappers were used for the test? 92 (see the link for this and all the other parameters).
[jira] [Created] (HDDS-3152) Reduce number of chunkwriter threads in integration tests
Attila Doroszlai created HDDS-3152: -- Summary: Reduce number of chunkwriter threads in integration tests Key: HDDS-3152 URL: https://issues.apache.org/jira/browse/HDDS-3152 Project: Hadoop Distributed Data Store Issue Type: Improvement Components: test Reporter: Attila Doroszlai Assignee: Attila Doroszlai Integration tests run multiple datanodes in the same JVM. Each datanode comes with 60 chunk writer threads by default (may be decreased in HDDS-3053). This makes thread dumps (eg. produced by {{GenericTestUtils.waitFor}} on timeout) really hard to navigate, as there may be 300+ such threads. Since integration tests are generally run with a single disk which is even shared among the datanodes, a few threads per datanode should be enough.
[jira] [Resolved] (HDDS-2610) Fix the ObjectStore#listVolumes failure when argument is null
[ https://issues.apache.org/jira/browse/HDDS-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Doroszlai resolved HDDS-2610. Fix Version/s: 0.6.0 Resolution: Done > Fix the ObjectStore#listVolumes failure when argument is null > - > > Key: HDDS-2610 > URL: https://issues.apache.org/jira/browse/HDDS-2610 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: YiSheng Lien >Assignee: YiSheng Lien >Priority: Major > Labels: pull-request-available > Fix For: 0.6.0 > > Time Spent: 20m > Remaining Estimate: 0h > > As the description of > [VolumeManager#listVolumes|https://github.com/apache/hadoop-ozone/blob/a731eeaa9ed0d1faecda3665b599145316300101/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/VolumeManager.java#L84-L101] > states, we should list all volumes when the userName is set to null. > But currently the underlying method throws an OMException.
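The documented contract is that a null user name means "list volumes for all users" instead of failing in the underlying lookup. A minimal sketch of that null-handling pattern, using a hypothetical stand-in rather than Ozone's actual `VolumeManager` implementation:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class VolumeListing {
    static final Map<String, List<String>> VOLUMES_BY_USER = new HashMap<>();

    // null userName means "all users", matching the documented contract,
    // rather than propagating an exception from the underlying method.
    static List<String> listVolumes(String userName) {
        if (userName == null) {
            List<String> all = new ArrayList<>();
            VOLUMES_BY_USER.values().forEach(all::addAll);
            return all;
        }
        return VOLUMES_BY_USER.getOrDefault(userName, new ArrayList<>());
    }

    public static void main(String[] args) {
        VOLUMES_BY_USER.put("alice", List.of("vol1"));
        VOLUMES_BY_USER.put("bob", List.of("vol2", "vol3"));
        System.out.println(listVolumes(null).size()); // all volumes across users
        System.out.println(listVolumes("bob").size()); // only bob's volumes
    }
}
```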
[jira] [Updated] (HDDS-2717) Handle chunk increments in datanode
[ https://issues.apache.org/jira/browse/HDDS-2717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Doroszlai updated HDDS-2717: --- Status: Patch Available (was: In Progress) > Handle chunk increments in datanode > --- > > Key: HDDS-2717 > URL: https://issues.apache.org/jira/browse/HDDS-2717 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Ozone Datanode >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Let datanode handle incremental additions to chunks (data with non-zero > offset).
[jira] [Updated] (HDDS-2610) Fix the ObjectStore#listVolumes failure when argument is null
[ https://issues.apache.org/jira/browse/HDDS-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Doroszlai updated HDDS-2610: --- Labels: (was: pull-request-available) > Fix the ObjectStore#listVolumes failure when argument is null > - > > Key: HDDS-2610 > URL: https://issues.apache.org/jira/browse/HDDS-2610 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: YiSheng Lien >Assignee: YiSheng Lien >Priority: Major > Fix For: 0.6.0 > > Time Spent: 20m > Remaining Estimate: 0h > > As the description of > [VolumeManager#listVolumes|https://github.com/apache/hadoop-ozone/blob/a731eeaa9ed0d1faecda3665b599145316300101/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/VolumeManager.java#L84-L101] > states, we should list all volumes when the userName is set to null. > But currently the underlying method throws an OMException.
[jira] [Updated] (HDDS-3143) Rename silently ignored tests
[ https://issues.apache.org/jira/browse/HDDS-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Doroszlai updated HDDS-3143: --- Fix Version/s: 0.6.0 Resolution: Fixed Status: Resolved (was: Patch Available) > Rename silently ignored tests > - > > Key: HDDS-3143 > URL: https://issues.apache.org/jira/browse/HDDS-3143 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Minor > Labels: pull-request-available > Fix For: 0.6.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Surefire plugin is configured to run {{Test*}} classes, but there are two > test classes named {{*Test}}: > {code} > $ find */*/src/test/java -name '*Test.java' | xargs grep -l '@Test' > hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/HddsServerUtilTest.java > hadoop-ozone/insight/src/test/java/org/apache/hadoop/ozone/insight/LogSubcommandTest.java > {code}
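A Surefire configuration that only includes `Test*` classes looks roughly like the snippet below; the exact plugin block in Ozone's pom may differ, so treat this as an illustrative sketch of why `*Test` classes were silently skipped:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <includes>
      <!-- Only classes named Test* are picked up; Foo*Test.java is ignored. -->
      <include>**/Test*.java</include>
    </includes>
  </configuration>
</plugin>
```

Renaming the two `*Test` classes to the `Test*` convention (as this JIRA does) is simpler than widening the include patterns.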
[GitHub] [hadoop-ozone] elek commented on issue #607: HDDS-3002. NFS mountd support for Ozone
elek commented on issue #607: HDDS-3002. NFS mountd support for Ozone URL: https://github.com/apache/hadoop-ozone/pull/607#issuecomment-597047992 /pending "I will post the design doc..."
[GitHub] [hadoop-ozone] elek commented on issue #399: HDDS-2424. Add the recover-trash command server side handling.
elek commented on issue #399: HDDS-2424. Add the recover-trash command server side handling. URL: https://github.com/apache/hadoop-ozone/pull/399#issuecomment-597047650 /pending Comments from @bharatviswa504 are not yet addressed...
[GitHub] [hadoop-ozone] elek commented on issue #618: HDDS-2911. Fix lastUsed and stateEnterTime value in container info is not human friendly
elek commented on issue #618: HDDS-2911. Fix lastUsed and stateEnterTime value in container info is not human friendly URL: https://github.com/apache/hadoop-ozone/pull/618#issuecomment-597047183 > In this case, can we display it as string in CLI output, and keep the long value internally. +1 It seems to be the more flexible option. Keep the long value for protobuf (we can even keep the nanoseconds) but print it out in a human-readable form... We can also change the Java type (and not the protobuf type) to a proper Java 8 time object (like `Instant`). That is more meaningful and might be printed out properly by the current default JSON serializer...
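The suggestion, keeping a long on the wire but exposing `java.time.Instant` in Java and rendering it human-readably, looks roughly like this; it is a sketch of the conversion pattern, not the actual ContainerInfo code, and the sample timestamp is arbitrary:

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;

public class ContainerTimes {
    public static void main(String[] args) {
        long stateEnterTimeMs = 1583712000000L; // long, as stored in protobuf
        // Convert to a proper java.time type instead of printing the raw long.
        Instant stateEnterTime = Instant.ofEpochMilli(stateEnterTimeMs);
        // ISO_INSTANT formats an Instant directly as e.g. 2020-03-09T00:00:00Z.
        String human = DateTimeFormatter.ISO_INSTANT.format(stateEnterTime);
        System.out.println(human);
        // Back to a long when serializing to protobuf:
        System.out.println(stateEnterTime.toEpochMilli());
    }
}
```

Jackson's `JavaTimeModule` can serialize `Instant` fields in ISO form, which is likely what "printed out properly by the current default JSON serializer" refers to.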
[jira] [Updated] (HDDS-3142) Create isolated environment for OM to test it without SCM
[ https://issues.apache.org/jira/browse/HDDS-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3142: - Labels: pull-request-available (was: ) > Create isolated environment for OM to test it without SCM > - > > Key: HDDS-3142 > URL: https://issues.apache.org/jira/browse/HDDS-3142 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Marton Elek >Assignee: Marton Elek >Priority: Major > Labels: pull-request-available > > The OmKeyGenerator class from Freon can generate keys (open key + commit key). > But this test exercises both OM and SCM performance. It seems useful to > have a way to test only the OM performance by faking the response from > SCM. > This can be done easily with the same approach as in HDDS-3023: a simple > utility class can be implemented, and with byteman we can replace the client > calls with the fake method.
[GitHub] [hadoop-ozone] elek opened a new pull request #656: HDDS-3142. Create isolated environment for OM to test it without SCM.
elek opened a new pull request #656: HDDS-3142. Create isolated environment for OM to test it without SCM. URL: https://github.com/apache/hadoop-ozone/pull/656 ## What changes were proposed in this pull request? The `OmKeyGenerator` class from Freon can generate keys (open key + commit key). But this test exercises both OM and SCM performance. It seems useful to have a way to test only the OM performance by faking the response from SCM. This can be done easily with the same approach as in HDDS-3023: a simple utility class can be implemented, and with byteman we can replace the client calls with the fake method. ## What is the link to the Apache JIRA https://issues.apache.org/jira/browse/HDDS-3142 ## How was this patch tested? 1. Download [byteman](https://byteman.jboss.org/) 2. Start a pure OM (`ozone om --init` + `ozone om`) with the following JVM parameters (change the path): ``` -javaagent:/home/elek/prog/byteman/lib/byteman.jar=script:/home/elek/projects/ozone/dev-support/byteman/mock-scm.btm,boot:/home/elek/prog/byteman/lib/byteman.jar -Dorg.jboss.byteman.transform.all ``` 3. Start a simple freon test: `ozone freon omkg` Expected result: it should be possible to init and start OM without SCM and test it with the key generator.
[GitHub] [hadoop-ozone] adoroszlai merged pull request #261: HDDS-2610. Fix the ObjectStore#listVolumes failure when argument is null
adoroszlai merged pull request #261: HDDS-2610. Fix the ObjectStore#listVolumes failure when argument is null URL: https://github.com/apache/hadoop-ozone/pull/261
[GitHub] [hadoop-ozone] adoroszlai commented on issue #261: HDDS-2610. Fix the ObjectStore#listVolumes failure when argument is null
adoroszlai commented on issue #261: HDDS-2610. Fix the ObjectStore#listVolumes failure when argument is null URL: https://github.com/apache/hadoop-ozone/pull/261#issuecomment-597012687 Filed [HDDS-3151](https://issues.apache.org/jira/browse/HDDS-3151) for the integration test failure, which we've seen earlier without this change, too. The acceptance test failure (SCM does not come out of safe mode) has also been observed elsewhere. Given we have a clean run on the PR source branch, and it's only 2 commits behind master, I think it's safe to merge. Thanks @cxorm for the contribution, @bharatviswa504 and @arp7 for the review.
[jira] [Updated] (HDDS-3151) Intermittent timeout in TestCloseContainerHandlingByClient#testMultiBlockWrites3
[ https://issues.apache.org/jira/browse/HDDS-3151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Doroszlai updated HDDS-3151: --- Attachment: org.apache.hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient.txt org.apache.hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient-output.txt > Intermittent timeout in > TestCloseContainerHandlingByClient#testMultiBlockWrites3 > > > Key: HDDS-3151 > URL: https://issues.apache.org/jira/browse/HDDS-3151 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Attila Doroszlai >Priority: Major > Attachments: > org.apache.hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient-output.txt, > org.apache.hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient.txt > > > {code:title=https://github.com/apache/hadoop-ozone/runs/495906854} > Tests run: 8, Failures: 0, Errors: 1, Skipped: 1, Time elapsed: 210.963 s <<< > FAILURE! - in > org.apache.hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient > testMultiBlockWrites3(org.apache.hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient) > Time elapsed: 108.777 s <<< ERROR! > java.util.concurrent.TimeoutException: > ... > at > org.apache.hadoop.ozone.container.TestHelper.waitForContainerClose(TestHelper.java:251) > at > org.apache.hadoop.ozone.container.TestHelper.waitForContainerClose(TestHelper.java:151) > at > org.apache.hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient.waitForContainerClose(TestCloseContainerHandlingByClient.java:342) > at > org.apache.hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient.testMultiBlockWrites3(TestCloseContainerHandlingByClient.java:310) > {code}
[jira] [Created] (HDDS-3151) Intermittent timeout in TestCloseContainerHandlingByClient#testMultiBlockWrites3
Attila Doroszlai created HDDS-3151: -- Summary: Intermittent timeout in TestCloseContainerHandlingByClient#testMultiBlockWrites3 Key: HDDS-3151 URL: https://issues.apache.org/jira/browse/HDDS-3151 Project: Hadoop Distributed Data Store Issue Type: Bug Components: test Reporter: Attila Doroszlai {code:title=https://github.com/apache/hadoop-ozone/runs/495906854} Tests run: 8, Failures: 0, Errors: 1, Skipped: 1, Time elapsed: 210.963 s <<< FAILURE! - in org.apache.hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient testMultiBlockWrites3(org.apache.hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient) Time elapsed: 108.777 s <<< ERROR! java.util.concurrent.TimeoutException: ... at org.apache.hadoop.ozone.container.TestHelper.waitForContainerClose(TestHelper.java:251) at org.apache.hadoop.ozone.container.TestHelper.waitForContainerClose(TestHelper.java:151) at org.apache.hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient.waitForContainerClose(TestCloseContainerHandlingByClient.java:342) at org.apache.hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient.testMultiBlockWrites3(TestCloseContainerHandlingByClient.java:310) {code}
[GitHub] [hadoop-ozone] cxorm commented on issue #261: HDDS-2610. Fix the ObjectStore#listVolumes failure when argument is null
cxorm commented on issue #261: HDDS-2610. Fix the ObjectStore#listVolumes failure when argument is null URL: https://github.com/apache/hadoop-ozone/pull/261#issuecomment-596985532 Thank you @arp7 for looking at this PR. The [failed check](https://github.com/apache/hadoop-ozone/pull/261/checks?check_run_id=495906854) is not related to the patch. Here the [same branch](https://github.com/cxorm/hadoop-ozone/runs/495875746) passed all checks.
[jira] [Commented] (HDDS-3146) Intermittent timeout in TestOzoneRpcClient
[ https://issues.apache.org/jira/browse/HDDS-3146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055677#comment-17055677 ] Attila Doroszlai commented on HDDS-3146: https://github.com/apache/hadoop-ozone/runs/496450696 > Intermittent timeout in TestOzoneRpcClient > -- > > Key: HDDS-3146 > URL: https://issues.apache.org/jira/browse/HDDS-3146 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Attila Doroszlai >Priority: Major > Attachments: > org.apache.hadoop.ozone.client.rpc.TestOzoneRpcClient-output.txt > > > {code:title=https://github.com/apache/hadoop-ozone/runs/495197228} > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M1:test (default-test) > on project hadoop-ozone-integration-test: There was a timeout or other error > in the fork > ... > org.apache.hadoop.ozone.client.rpc.TestOzoneRpcClient > {code}