[jira] [Updated] (HDDS-3104) Integration test crashes due to critical error in datanode

2020-03-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3104:
-
Labels: pull-request-available  (was: )

> Integration test crashes due to critical error in datanode
> --
>
> Key: HDDS-3104
> URL: https://issues.apache.org/jira/browse/HDDS-3104
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Attila Doroszlai
>Assignee: Siddharth Wagle
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDDS-3104.patch, 
> org.apache.hadoop.ozone.client.rpc.TestContainerStateMachineFailureOnRead-output.txt
>
>
> {code:title=test log}
> 2020-02-28 07:36:17,759 [Datanode State Machine Thread - 0] ERROR 
> statemachine.StateContext (StateContext.java:execute(420)) - Critical error 
> occurred in StateMachine, setting shutDownMachine
> ...
> 2020-02-28 07:36:21,216 [Datanode State Machine Thread - 0] INFO  
> util.ExitUtil (ExitUtil.java:terminate(210)) - Exiting with status 1: 
> ExitException
> {code}
> {code:title=build output}
> [ERROR] ExecutionException The forked VM terminated without properly saying 
> goodbye. VM crash or System.exit called?
> {code}
> https://github.com/adoroszlai/hadoop-ozone/runs/474218807
> https://github.com/adoroszlai/hadoop-ozone/suites/487650271/artifacts/2327174



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] swagle opened a new pull request #660: HDDS-3104. Integration test crashes due to critical error in datanode.

2020-03-10 Thread GitBox
swagle opened a new pull request #660: HDDS-3104. Integration test crashes due 
to critical error in datanode.
URL: https://github.com/apache/hadoop-ozone/pull/660
 
 
   ## What changes were proposed in this pull request?
   Created a flag to tell StateContext that shutdown was called, so that it 
does not overreact and treat the expected shutdown as a critical error.
   
   ## What is the link to the Apache JIRA
   https://issues.apache.org/jira/browse/HDDS-3104
   
   ## How was this patch tested?
   Verified by running TestContainerStateMachineFailureOnRead.
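   
   As a rough illustration only (the class and method names below are 
   assumptions for this sketch, not the actual change in this PR), the flag 
   approach could look something like this:
   
   ```java
   /**
    * Sketch of a shutdown flag on a StateContext-like class.
    * Names such as shutdownRequested/requestShutdown are illustrative only.
    */
   public class StateContextSketch {
     private volatile boolean shutdownRequested = false;
   
     /** Called when a normal shutdown begins (e.g. from stopDaemon-style code). */
     public void requestShutdown() {
       this.shutdownRequested = true;
     }
   
     /** Called from the execute() path when the SHUTDOWN state is observed. */
     void onShutdownState() {
       if (shutdownRequested) {
         // Expected shutdown: do not flag a critical error.
         return;
       }
       // Unexpected transition to SHUTDOWN: treat it as a critical error.
       setShutdownOnError();
     }
   
     private void setShutdownOnError() {
       // In the real datanode this path leads to process termination (ExitUtil).
     }
   }
   ```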


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-3104) Integration test crashes due to critical error in datanode

2020-03-10 Thread Siddharth Wagle (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056665#comment-17056665
 ] 

Siddharth Wagle commented on HDDS-3104:
---

Creating a PR anyway to run the tests.

> Integration test crashes due to critical error in datanode
> --
>
> Key: HDDS-3104
> URL: https://issues.apache.org/jira/browse/HDDS-3104
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Attila Doroszlai
>Assignee: Siddharth Wagle
>Priority: Major
> Attachments: HDDS-3104.patch, 
> org.apache.hadoop.ozone.client.rpc.TestContainerStateMachineFailureOnRead-output.txt
>
>
> {code:title=test log}
> 2020-02-28 07:36:17,759 [Datanode State Machine Thread - 0] ERROR 
> statemachine.StateContext (StateContext.java:execute(420)) - Critical error 
> occurred in StateMachine, setting shutDownMachine
> ...
> 2020-02-28 07:36:21,216 [Datanode State Machine Thread - 0] INFO  
> util.ExitUtil (ExitUtil.java:terminate(210)) - Exiting with status 1: 
> ExitException
> {code}
> {code:title=build output}
> [ERROR] ExecutionException The forked VM terminated without properly saying 
> goodbye. VM crash or System.exit called?
> {code}
> https://github.com/adoroszlai/hadoop-ozone/runs/474218807
> https://github.com/adoroszlai/hadoop-ozone/suites/487650271/artifacts/2327174



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDDS-3104) Integration test crashes due to critical error in datanode

2020-03-10 Thread Siddharth Wagle (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056664#comment-17056664
 ] 

Siddharth Wagle edited comment on HDDS-3104 at 3/11/20, 5:42 AM:
-

[~adoroszlai] I think this happens because:
1. HddsDatanodeService#stop calls DatanodeStateMachine#stopDaemon, which sets 
StateContext.state = SHUTDOWN
2. The DatanodeStateMachine still calls StateContext.execute because it read a 
stale copy of the state
3. StateContext.execute then sets shutDownOnError to true when it sees the new 
state, even though there was no error

So I have a proposed patch for this. I am attaching it here before opening a 
PR and would like to know your thoughts.


was (Author: swagle):
[~adoroszlai] I think this happens because:
1. HddsDatanodeService#stop calls DatanodeStateMachine#stopDaemon, which sets 
StateContext.state = SHUTDOWN
2. The DatanodeStateMachine still calls StateContext.execute because it read a 
stale copy of the state
3. StateContext.execute then sets shutDownOnError to true when it sees this 
state, even though there was no error

So I have a proposed patch for this. I am attaching it here before opening a 
PR and would like to know your thoughts.

> Integration test crashes due to critical error in datanode
> --
>
> Key: HDDS-3104
> URL: https://issues.apache.org/jira/browse/HDDS-3104
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Attila Doroszlai
>Assignee: Siddharth Wagle
>Priority: Major
> Attachments: HDDS-3104.patch, 
> org.apache.hadoop.ozone.client.rpc.TestContainerStateMachineFailureOnRead-output.txt
>
>
> {code:title=test log}
> 2020-02-28 07:36:17,759 [Datanode State Machine Thread - 0] ERROR 
> statemachine.StateContext (StateContext.java:execute(420)) - Critical error 
> occurred in StateMachine, setting shutDownMachine
> ...
> 2020-02-28 07:36:21,216 [Datanode State Machine Thread - 0] INFO  
> util.ExitUtil (ExitUtil.java:terminate(210)) - Exiting with status 1: 
> ExitException
> {code}
> {code:title=build output}
> [ERROR] ExecutionException The forked VM terminated without properly saying 
> goodbye. VM crash or System.exit called?
> {code}
> https://github.com/adoroszlai/hadoop-ozone/runs/474218807
> https://github.com/adoroszlai/hadoop-ozone/suites/487650271/artifacts/2327174



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3104) Integration test crashes due to critical error in datanode

2020-03-10 Thread Siddharth Wagle (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Wagle updated HDDS-3104:
--
Attachment: HDDS-3104.patch

> Integration test crashes due to critical error in datanode
> --
>
> Key: HDDS-3104
> URL: https://issues.apache.org/jira/browse/HDDS-3104
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Attila Doroszlai
>Assignee: Siddharth Wagle
>Priority: Major
> Attachments: HDDS-3104.patch, 
> org.apache.hadoop.ozone.client.rpc.TestContainerStateMachineFailureOnRead-output.txt
>
>
> {code:title=test log}
> 2020-02-28 07:36:17,759 [Datanode State Machine Thread - 0] ERROR 
> statemachine.StateContext (StateContext.java:execute(420)) - Critical error 
> occurred in StateMachine, setting shutDownMachine
> ...
> 2020-02-28 07:36:21,216 [Datanode State Machine Thread - 0] INFO  
> util.ExitUtil (ExitUtil.java:terminate(210)) - Exiting with status 1: 
> ExitException
> {code}
> {code:title=build output}
> [ERROR] ExecutionException The forked VM terminated without properly saying 
> goodbye. VM crash or System.exit called?
> {code}
> https://github.com/adoroszlai/hadoop-ozone/runs/474218807
> https://github.com/adoroszlai/hadoop-ozone/suites/487650271/artifacts/2327174



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDDS-3104) Integration test crashes due to critical error in datanode

2020-03-10 Thread Siddharth Wagle (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056664#comment-17056664
 ] 

Siddharth Wagle edited comment on HDDS-3104 at 3/11/20, 5:40 AM:
-

[~adoroszlai] I think this happens because:
1. HddsDatanodeService#stop calls DatanodeStateMachine#stopDaemon, which sets 
StateContext.state = SHUTDOWN
2. The DatanodeStateMachine still calls StateContext.execute because it read a 
stale copy of the state
3. StateContext.execute then sets shutDownOnError to true when it sees this 
state, even though there was no error

So I have a proposed patch for this. I am attaching it here before opening a 
PR and would like to know your thoughts.


was (Author: swagle):
[~adoroszlai] I think this happens because:
1. HddsDatanodeService#stop calls DatanodeStateMachine#stopDaemon, which sets 
StateContext.state = SHUTDOWN
2. The DatanodeStateMachine still calls StateContext.execute because of the 
stale state it read
3. StateContext.execute then sets shutDownOnError to true when it sees this 
state, even though there was no error

So I have a proposed patch for this. I am attaching it here before opening a 
PR and would like to know your thoughts.

> Integration test crashes due to critical error in datanode
> --
>
> Key: HDDS-3104
> URL: https://issues.apache.org/jira/browse/HDDS-3104
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Attila Doroszlai
>Assignee: Siddharth Wagle
>Priority: Major
> Attachments: HDDS-3104.patch, 
> org.apache.hadoop.ozone.client.rpc.TestContainerStateMachineFailureOnRead-output.txt
>
>
> {code:title=test log}
> 2020-02-28 07:36:17,759 [Datanode State Machine Thread - 0] ERROR 
> statemachine.StateContext (StateContext.java:execute(420)) - Critical error 
> occurred in StateMachine, setting shutDownMachine
> ...
> 2020-02-28 07:36:21,216 [Datanode State Machine Thread - 0] INFO  
> util.ExitUtil (ExitUtil.java:terminate(210)) - Exiting with status 1: 
> ExitException
> {code}
> {code:title=build output}
> [ERROR] ExecutionException The forked VM terminated without properly saying 
> goodbye. VM crash or System.exit called?
> {code}
> https://github.com/adoroszlai/hadoop-ozone/runs/474218807
> https://github.com/adoroszlai/hadoop-ozone/suites/487650271/artifacts/2327174



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-3104) Integration test crashes due to critical error in datanode

2020-03-10 Thread Siddharth Wagle (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056664#comment-17056664
 ] 

Siddharth Wagle commented on HDDS-3104:
---

[~adoroszlai] I think this happens because:
1. HddsDatanodeService#stop calls DatanodeStateMachine#stopDaemon, which sets 
StateContext.state = SHUTDOWN
2. The DatanodeStateMachine still calls StateContext.execute because of the 
stale state it read
3. StateContext.execute then sets shutDownOnError to true when it sees this 
state, even though there was no error

So I have a proposed patch for this. I am attaching it here before opening a 
PR and would like to know your thoughts.

> Integration test crashes due to critical error in datanode
> --
>
> Key: HDDS-3104
> URL: https://issues.apache.org/jira/browse/HDDS-3104
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Attila Doroszlai
>Assignee: Siddharth Wagle
>Priority: Major
> Attachments: 
> org.apache.hadoop.ozone.client.rpc.TestContainerStateMachineFailureOnRead-output.txt
>
>
> {code:title=test log}
> 2020-02-28 07:36:17,759 [Datanode State Machine Thread - 0] ERROR 
> statemachine.StateContext (StateContext.java:execute(420)) - Critical error 
> occurred in StateMachine, setting shutDownMachine
> ...
> 2020-02-28 07:36:21,216 [Datanode State Machine Thread - 0] INFO  
> util.ExitUtil (ExitUtil.java:terminate(210)) - Exiting with status 1: 
> ExitException
> {code}
> {code:title=build output}
> [ERROR] ExecutionException The forked VM terminated without properly saying 
> goodbye. VM crash or System.exit called?
> {code}
> https://github.com/adoroszlai/hadoop-ozone/runs/474218807
> https://github.com/adoroszlai/hadoop-ozone/suites/487650271/artifacts/2327174



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3155) Improved ozone flush implementation to make it faster.

2020-03-10 Thread mingchao zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mingchao zhao updated HDDS-3155:

Description: 
Background:

    When we run MapReduce on Ozone, we find that the job is stuck for a long 
time after the map and reduce phases complete. The log is as follows:
{code:java}
//Refer to the attachment: stdout
20/03/05 14:43:30 INFO mapreduce.Job: map 100% reduce 33% 
20/03/05 14:43:33 INFO mapreduce.Job: map 100% reduce 100% 
20/03/05 15:29:52 INFO mapreduce.Job: Job job_1583385253878_0002 completed 
successfully{code}
    By looking at the AM's log (refer to the attached amlog for details), we 
found that more than 40 minutes is spent by the AM writing the task log into 
Ozone.

At present, after MR execution, the AM records the task information into a log 
on HDFS or Ozone. Moreover, the task information is flushed to HDFS or Ozone 
one by one 
([details|https://github.com/apache/hadoop/blob/a55d6bba71c81c1c4e9d8cd11f55c78f10a548b0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java#L1640]).
 The problem occurs when the number of map tasks is large.

     Currently, each flush operation in Ozone generates a new chunk file on 
disk in real time. This approach is not very efficient at the moment. For this 
we can refer to the implementation of HDFS flush: instead of writing to disk, 
each flush writes the contents of the buffer into the datanode's OS buffer. 
First, we need to ensure that this content can be read by other datanodes.

 

  was:
Background:

    When we run MapReduce on Ozone, we find that the job is stuck for a long 
time after the map and reduce phases complete. The log is as follows:
{code:java}
//Refer to the attachment: stdout
20/03/05 14:43:30 INFO mapreduce.Job: map 100% reduce 33% 
20/03/05 14:43:33 INFO mapreduce.Job: map 100% reduce 100% 
20/03/05 15:29:52 INFO mapreduce.Job: Job job_1583385253878_0002 completed 
successfully{code}
    By looking at the AM's log, we found that more than 40 minutes is spent by 
the AM writing the task log into Ozone.

At present, after MR execution, the AM records the task information into a log 
on HDFS or Ozone. Moreover, the task information is flushed to HDFS or Ozone 
one by one 
([details|https://github.com/apache/hadoop/blob/a55d6bba71c81c1c4e9d8cd11f55c78f10a548b0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java#L1640]).
 The problem occurs when the number of map tasks is large.

     Currently, each flush operation in Ozone generates a new chunk file on 
disk in real time. This approach is not very efficient at the moment. For this 
we can refer to the implementation of HDFS flush: instead of writing to disk, 
each flush writes the contents of the buffer into the datanode's OS buffer. 
First, we need to ensure that this content can be read by other datanodes.

 


> Improved ozone flush implementation to make it faster.
> --
>
> Key: HDDS-3155
> URL: https://issues.apache.org/jira/browse/HDDS-3155
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: mingchao zhao
>Priority: Major
> Attachments: amlog, stdout
>
>
> Background:
>     When we execute mapreduce in the ozone, we find that the task will be 
> stuck for a long time after the completion of Map and Reduce. The log is as 
> follows:
> {code:java}
> //Refer to the attachment: stdout
> 20/03/05 14:43:30 INFO mapreduce.Job: map 100% reduce 33% 
> 20/03/05 14:43:33 INFO mapreduce.Job: map 100% reduce 100% 
> 20/03/05 15:29:52 INFO mapreduce.Job: Job job_1583385253878_0002 completed 
> successfully{code}
>     By looking at AM's log(Refer to the amlog for details), we found that the 
> time of over 40 minutes is AM writing a task log into ozone.
> At present, after MR execution, the Task information is recorded into the log 
> on HDFS or ozone by AM.  Moreover, the task information is flush to HDFS or 
> ozone one by one 
> ([details|https://github.com/apache/hadoop/blob/a55d6bba71c81c1c4e9d8cd11f55c78f10a548b0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java#L1640]).
>  The problem occurs when the number of task maps is large. 
>      Currently, each flush operation in ozone generates a new chunk file in 
> real time on the disk. This approach is not very efficient at the moment. For 
> this we can refer to the implementation of HDFS flush. Instead of writing to 
> disk each time flush writes the contents of the buffer to the datanode's OS 
> buffer. In the first 

[jira] [Updated] (HDDS-3155) Improved ozone flush implementation to make it faster.

2020-03-10 Thread mingchao zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mingchao zhao updated HDDS-3155:

Description: 
Background:

    When we run MapReduce on Ozone, we find that the job is stuck for a long 
time after the map and reduce phases complete. The log is as follows:
{code:java}
//Refer to the attachment: stdout
20/03/05 14:43:30 INFO mapreduce.Job: map 100% reduce 33% 
20/03/05 14:43:33 INFO mapreduce.Job: map 100% reduce 100% 
20/03/05 15:29:52 INFO mapreduce.Job: Job job_1583385253878_0002 completed 
successfully{code}
    By looking at the AM's log (refer to the attached amlog for details), we 
found that more than 40 minutes is spent by the AM writing the task log into 
Ozone.

    At present, after MR execution, the AM records the task information into a 
log on HDFS or Ozone. Moreover, the task information is flushed to HDFS or 
Ozone one by one 
([details|https://github.com/apache/hadoop/blob/a55d6bba71c81c1c4e9d8cd11f55c78f10a548b0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java#L1640]).
 The problem occurs when the number of map tasks is large.

     Currently, each flush operation in Ozone generates a new chunk file on 
disk in real time. This approach is not very efficient at the moment. For this 
we can refer to the implementation of HDFS flush: instead of writing to disk, 
each flush writes the contents of the buffer into the datanode's OS buffer. 
First, we need to ensure that this content can be read by other datanodes.

 

  was:
Background:

    When we run MapReduce on Ozone, we find that the job is stuck for a long 
time after the map and reduce phases complete. The log is as follows:
{code:java}
//Refer to the attachment: stdout
20/03/05 14:43:30 INFO mapreduce.Job: map 100% reduce 33% 
20/03/05 14:43:33 INFO mapreduce.Job: map 100% reduce 100% 
20/03/05 15:29:52 INFO mapreduce.Job: Job job_1583385253878_0002 completed 
successfully{code}
    By looking at the AM's log (refer to the attached amlog for details), we 
found that more than 40 minutes is spent by the AM writing the task log into 
Ozone.

At present, after MR execution, the AM records the task information into a log 
on HDFS or Ozone. Moreover, the task information is flushed to HDFS or Ozone 
one by one 
([details|https://github.com/apache/hadoop/blob/a55d6bba71c81c1c4e9d8cd11f55c78f10a548b0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java#L1640]).
 The problem occurs when the number of map tasks is large.

     Currently, each flush operation in Ozone generates a new chunk file on 
disk in real time. This approach is not very efficient at the moment. For this 
we can refer to the implementation of HDFS flush: instead of writing to disk, 
each flush writes the contents of the buffer into the datanode's OS buffer. 
First, we need to ensure that this content can be read by other datanodes.

 


> Improved ozone flush implementation to make it faster.
> --
>
> Key: HDDS-3155
> URL: https://issues.apache.org/jira/browse/HDDS-3155
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: mingchao zhao
>Priority: Major
> Attachments: amlog, stdout
>
>
> Background:
>     When we execute mapreduce in the ozone, we find that the task will be 
> stuck for a long time after the completion of Map and Reduce. The log is as 
> follows:
> {code:java}
> //Refer to the attachment: stdout
> 20/03/05 14:43:30 INFO mapreduce.Job: map 100% reduce 33% 
> 20/03/05 14:43:33 INFO mapreduce.Job: map 100% reduce 100% 
> 20/03/05 15:29:52 INFO mapreduce.Job: Job job_1583385253878_0002 completed 
> successfully{code}
>     By looking at AM's log(Refer to the amlog for details), we found that the 
> time of over 40 minutes is AM writing a task log into ozone.
>     At present, after MR execution, the Task information is recorded into the 
> log on HDFS or ozone by AM.  Moreover, the task information is flush to HDFS 
> or ozone one by one 
> ([details|https://github.com/apache/hadoop/blob/a55d6bba71c81c1c4e9d8cd11f55c78f10a548b0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java#L1640]).
>  The problem occurs when the number of task maps is large. 
>      Currently, each flush operation in ozone generates a new chunk file in 
> real time on the disk. This approach is not very efficient at the moment. For 
> this we can refer to the implementation of HDFS flush. Instead of writing to 
> disk each time flush writes the contents of the buffer to the 

[jira] [Updated] (HDDS-3155) Improved ozone flush implementation to make it faster.

2020-03-10 Thread mingchao zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mingchao zhao updated HDDS-3155:

Description: 
Background:

    When we run MapReduce on Ozone, we find that the job is stuck for a long 
time after the map and reduce phases complete. The log is as follows:
{code:java}
//Refer to the attachment: stdout
20/03/05 14:43:30 INFO mapreduce.Job: map 100% reduce 33% 
20/03/05 14:43:33 INFO mapreduce.Job: map 100% reduce 100% 
20/03/05 15:29:52 INFO mapreduce.Job: Job job_1583385253878_0002 completed 
successfully{code}
    By looking at the AM's log, we found that more than 40 minutes is spent by 
the AM writing the task log into Ozone.

At present, after MR execution, the AM records the task information into a log 
on HDFS or Ozone. Moreover, the task information is flushed to HDFS or Ozone 
one by one 
([details|https://github.com/apache/hadoop/blob/a55d6bba71c81c1c4e9d8cd11f55c78f10a548b0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java#L1640]).
 The problem occurs when the number of map tasks is large.

     Currently, each flush operation in Ozone generates a new chunk file on 
disk in real time. This approach is not very efficient at the moment. For this 
we can refer to the implementation of HDFS flush: instead of writing to disk, 
each flush writes the contents of the buffer into the datanode's OS buffer. 
First, we need to ensure that this content can be read by other datanodes.

 

  was:
Background:

    When we run MapReduce on Ozone, we find that the job is stuck for a long 
time after the map and reduce phases complete. The log is as follows:
{code:java}
//Refer to the attachment: stdout
20/03/05 14:43:30 INFO mapreduce.Job: map 100% reduce 33% 
20/03/05 14:43:33 INFO mapreduce.Job: map 100% reduce 100% 
20/03/05 15:29:52 INFO mapreduce.Job: Job job_1583385253878_0002 completed 
successfully{code}
    By looking at the AM's log, we found that more than 40 minutes is spent by 
the AM writing the task log into Ozone.

At present, after MR execution, the AM records the task information into a log 
on HDFS or Ozone. Moreover, the task information is flushed to HDFS or Ozone 
one by one 
([details|https://github.com/apache/hadoop/blob/a55d6bba71c81c1c4e9d8cd11f55c78f10a548b0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java#L1640]).
 The problem occurs when the number of map tasks is large.

     Currently, each flush operation in Ozone generates a new chunk file on 
disk in real time. This approach is not very efficient at the moment. For this 
we can refer to the implementation of HDFS flush: instead of writing to disk, 
each flush writes the contents of the buffer into the datanode's OS buffer. 
First, we need to ensure that this content can be read by other datanodes.

 


> Improved ozone flush implementation to make it faster.
> --
>
> Key: HDDS-3155
> URL: https://issues.apache.org/jira/browse/HDDS-3155
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: mingchao zhao
>Priority: Major
> Attachments: amlog, stdout
>
>
> Background:
>     When we execute mapreduce in the ozone, we find that the task will be 
> stuck for a long time after the completion of Map and Reduce. The log is as 
> follows:
> {code:java}
> //Refer to the attachment: stdout
> 20/03/05 14:43:30 INFO mapreduce.Job: map 100% reduce 33% 
> 20/03/05 14:43:33 INFO mapreduce.Job: map 100% reduce 100% 
> 20/03/05 15:29:52 INFO mapreduce.Job: Job job_1583385253878_0002 completed 
> successfully{code}
>     By looking at AM's log(), we found that the time of over 40 minutes is AM 
> writing a task log into ozone.
> At present, after MR execution, the Task information is recorded into the log 
> on HDFS or ozone by AM.  Moreover, the task information is flush to HDFS or 
> ozone one by one 
> ([details|https://github.com/apache/hadoop/blob/a55d6bba71c81c1c4e9d8cd11f55c78f10a548b0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java#L1640]).
>  The problem occurs when the number of task maps is large. 
>      Currently, each flush operation in ozone generates a new chunk file in 
> real time on the disk. This approach is not very efficient at the moment. For 
> this we can refer to the implementation of HDFS flush. Instead of writing to 
> disk each time flush writes the contents of the buffer to the datanode's OS 
> buffer. In the first place, we need to ensure that this content can be read 
> by 

[jira] [Updated] (HDDS-3155) Improved ozone flush implementation to make it faster.

2020-03-10 Thread mingchao zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mingchao zhao updated HDDS-3155:

Attachment: (was: syslog)

> Improved ozone flush implementation to make it faster.
> --
>
> Key: HDDS-3155
> URL: https://issues.apache.org/jira/browse/HDDS-3155
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: mingchao zhao
>Priority: Major
> Attachments: amlog, stdout
>
>
> Background:
>     When we run MapReduce on Ozone, we find that the job is stuck for a long 
> time after the map and reduce phases complete. The log is as follows:
> {code:java}
> //Refer to the attachment: stdout
> 20/03/05 14:43:30 INFO mapreduce.Job: map 100% reduce 33% 
> 20/03/05 14:43:33 INFO mapreduce.Job: map 100% reduce 100% 
> 20/03/05 15:29:52 INFO mapreduce.Job: Job job_1583385253878_0002 completed 
> successfully{code}
>     By looking at the AM's log, we found that more than 40 minutes is spent 
> by the AM writing the task log into Ozone.
> At present, after MR execution, the AM records the task information into a 
> log on HDFS or Ozone. Moreover, the task information is flushed to HDFS or 
> Ozone one by one 
> ([details|https://github.com/apache/hadoop/blob/a55d6bba71c81c1c4e9d8cd11f55c78f10a548b0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java#L1640]).
>  The problem occurs when the number of map tasks is large.
>      Currently, each flush operation in Ozone generates a new chunk file on 
> disk in real time. This approach is not very efficient at the moment. For 
> this we can refer to the implementation of HDFS flush: instead of writing to 
> disk, each flush writes the contents of the buffer into the datanode's OS 
> buffer. First, we need to ensure that this content can be read by other 
> datanodes.
>  
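
For reference, the HDFS flush behaviour that the description above points to 
is exposed through FSDataOutputStream#hflush(), which pushes buffered data to 
the datanodes so that new readers can see it, without forcing it onto disk the 
way hsync() does. A minimal usage sketch (assuming an HDFS-compatible file 
system such as an o3fs:// URI is configured as fs.defaultFS; the path below is 
hypothetical):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HflushSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Uses whatever fs.defaultFS points at (HDFS, or o3fs:// for Ozone).
    FileSystem fs = FileSystem.get(conf);
    Path historyFile = new Path("/tmp/job_0001.jhist"); // hypothetical path
    try (FSDataOutputStream out = fs.create(historyFile, true)) {
      out.writeBytes("task attempt finished event\n");
      // hflush: flush client-side buffers so the data becomes visible to new
      // readers, without forcing it onto disk (hsync would also sync to disk).
      out.hflush();
    }
  }
}
{code}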



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3155) Improved ozone flush implementation to make it faster.

2020-03-10 Thread mingchao zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mingchao zhao updated HDDS-3155:

Attachment: amlog

> Improved ozone flush implementation to make it faster.
> --
>
> Key: HDDS-3155
> URL: https://issues.apache.org/jira/browse/HDDS-3155
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: mingchao zhao
>Priority: Major
> Attachments: amlog, stdout
>
>
> Background:
>     When we run MapReduce on Ozone, we find that the job is stuck for a long 
> time after the map and reduce phases complete. The log is as follows:
> {code:java}
> //Refer to the attachment: stdout
> 20/03/05 14:43:30 INFO mapreduce.Job: map 100% reduce 33% 
> 20/03/05 14:43:33 INFO mapreduce.Job: map 100% reduce 100% 
> 20/03/05 15:29:52 INFO mapreduce.Job: Job job_1583385253878_0002 completed 
> successfully{code}
>     By looking at the AM's log, we found that more than 40 minutes is spent 
> by the AM writing the task log into Ozone.
> At present, after MR execution, the AM records the task information into a 
> log on HDFS or Ozone. Moreover, the task information is flushed to HDFS or 
> Ozone one by one 
> ([details|https://github.com/apache/hadoop/blob/a55d6bba71c81c1c4e9d8cd11f55c78f10a548b0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java#L1640]).
>  The problem occurs when the number of map tasks is large.
>      Currently, each flush operation in Ozone generates a new chunk file on 
> disk in real time. This approach is not very efficient at the moment. For 
> this we can refer to the implementation of HDFS flush: instead of writing to 
> disk, each flush writes the contents of the buffer into the datanode's OS 
> buffer. First, we need to ensure that this content can be read by other 
> datanodes.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3155) Improved ozone flush implementation to make it faster.

2020-03-10 Thread mingchao zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mingchao zhao updated HDDS-3155:

Description: 
Background:

    When we run MapReduce on Ozone, we find that the job is stuck for a long 
time after the map and reduce phases complete. The log is as follows:
{code:java}
//Refer to the attachment: stdout
20/03/05 14:43:30 INFO mapreduce.Job: map 100% reduce 33% 
20/03/05 14:43:33 INFO mapreduce.Job: map 100% reduce 100% 
20/03/05 15:29:52 INFO mapreduce.Job: Job job_1583385253878_0002 completed 
successfully{code}
    By looking at the AM's log, we found that more than 40 minutes is spent by 
the AM writing the task log into Ozone.

At present, after MR execution, the AM records the task information into a log 
on HDFS or Ozone. Moreover, the task information is flushed to HDFS or Ozone 
one by one 
([details|https://github.com/apache/hadoop/blob/a55d6bba71c81c1c4e9d8cd11f55c78f10a548b0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java#L1640]).
 

     The problem occurs when the number of map tasks is large.

     Currently, each flush operation in Ozone generates a new chunk file on 
disk in real time. This approach is not very efficient at the moment. For this 
we can refer to the implementation of HDFS flush: instead of writing to disk, 
each flush writes the contents of the buffer into the datanode's OS buffer. 
First, we need to ensure that this content can be read by other datanodes.

 

  was:
Background:

When we run MapReduce on Ozone, we find that the job is stuck for a long time 
after the map and reduce phases complete. The log is as follows:
{code:java}
// code placeholder
20/03/05 14:43:03 INFO mapreduce.Job:  map 91% reduce 30%
20/03/05 14:43:03 INFO mapreduce.Job:  map 91% reduce 30%
20/03/05 14:43:05 INFO mapreduce.Job:  map 92% reduce 30%
20/03/05 14:43:07 INFO mapreduce.Job:  map 93% reduce 30%
20/03/05 14:43:08 INFO mapreduce.Job:  map 93% reduce 31%
20/03/05 14:43:11 INFO mapreduce.Job:  map 94% reduce 31%
20/03/05 14:43:14 INFO mapreduce.Job:  map 95% reduce 31%
20/03/05 14:43:18 INFO mapreduce.Job:  map 96% reduce 31%
20/03/05 14:43:20 INFO mapreduce.Job:  map 97% reduce 32%
20/03/05 14:43:24 INFO mapreduce.Job:  map 98% reduce 32%
20/03/05 14:43:26 INFO mapreduce.Job:  map 99% reduce 33%
20/03/05 14:43:30 INFO mapreduce.Job:  map 100% reduce 33%
20/03/05 14:43:33 INFO mapreduce.Job:  map 100% reduce 100%
20/03/05 15:29:52 INFO mapreduce.Job: Job job_1583385253878_0002 completed successfully
20/03/05 15:29:52 INFO mapreduce.Job: Counters: 51
  File System Counters
    FILE: Number of bytes read=84602
    FILE: Number of bytes written=162626320
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
    O3FS: Number of bytes read=237780
    O3FS: Number of bytes written=134217728089
    O3FS: Number of read operations=4008
    O3FS: Number of large read operations=0
    O3FS: Number of write operations=1002
  Job Counters
    Killed map tasks=1
    Launched map tasks=1000
    Launched reduce tasks=1
    Data-local map tasks=979
    Rack-local map tasks=21
    Total time spent by all maps in occupied slots (ms)=149515400
    Total time spent by all reduces in occupied slots (ms)=449288
    Total time spent by all map tasks (ms)=7475770
    Total time spent by all reduce tasks (ms)=112322
    Total vcore-milliseconds taken by all map tasks=7475770
    Total vcore-milliseconds taken by all reduce tasks=112322
    Total megabyte-milliseconds taken by all map tasks=153103769600
    Total megabyte-milliseconds taken by all reduce tasks=460070912
{code}


> Improved ozone flush implementation to make it faster.
> --
>
> Key: HDDS-3155
> URL: https://issues.apache.org/jira/browse/HDDS-3155
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: mingchao zhao
>Priority: Major
> Attachments: stdout, syslog
>
>
> Background:
>     When we execute mapreduce in the ozone, we find that the task will be 
> stuck for a long time after the completion of Map and Reduce. The log is as 
> follows:
> {code:java}
> //Refer to the attachment: stdout
> 20/03/05 14:43:30 INFO mapreduce.Job: map 100% reduce 33% 
> 20/03/05 14:43:33 INFO mapreduce.Job: map 100% reduce 100% 
> 20/03/05 15:29:52 INFO mapreduce.Job: Job job_1583385253878_0002 completed 
> successfully{code}
>     By looking at AM's log, we found that the time of over 40 minutes is AM 
> writing a task log into ozone.
> At present, after MR execution, the Task information is recorded into the log 
> on HDFS or ozone by AM.  Moreover, the task information is flush to HDFS or 
> ozone one by one 
> 

[jira] [Updated] (HDDS-3155) Improved ozone flush implementation to make it faster.

2020-03-10 Thread mingchao zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mingchao zhao updated HDDS-3155:

Description: 
Background:

    When we run MapReduce on Ozone, we find that the job is stuck for a long 
time after the map and reduce phases complete. The log is as follows:
{code:java}
//Refer to the attachment: stdout
20/03/05 14:43:30 INFO mapreduce.Job: map 100% reduce 33% 
20/03/05 14:43:33 INFO mapreduce.Job: map 100% reduce 100% 
20/03/05 15:29:52 INFO mapreduce.Job: Job job_1583385253878_0002 completed 
successfully{code}
    By looking at the AM's log, we found that more than 40 minutes is spent by 
the AM writing the task log into Ozone.

At present, after MR execution, the AM records the task information into a log 
on HDFS or Ozone. Moreover, the task information is flushed to HDFS or Ozone 
one by one 
([details|https://github.com/apache/hadoop/blob/a55d6bba71c81c1c4e9d8cd11f55c78f10a548b0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java#L1640]).
 The problem occurs when the number of map tasks is large.

     Currently, each flush operation in Ozone generates a new chunk file on 
disk in real time. This approach is not very efficient at the moment. For this 
we can refer to the implementation of HDFS flush: instead of writing to disk, 
each flush writes the contents of the buffer into the datanode's OS buffer. 
First, we need to ensure that this content can be read by other datanodes.

 

  was:
Background:

    When we run MapReduce on Ozone, we find that the job is stuck for a long 
time after the map and reduce phases complete. The log is as follows:
{code:java}
//Refer to the attachment: stdout
20/03/05 14:43:30 INFO mapreduce.Job: map 100% reduce 33% 
20/03/05 14:43:33 INFO mapreduce.Job: map 100% reduce 100% 
20/03/05 15:29:52 INFO mapreduce.Job: Job job_1583385253878_0002 completed 
successfully{code}
    By looking at the AM's log, we found that more than 40 minutes is spent by 
the AM writing the task log into Ozone.

At present, after MR execution, the AM records the task information into a log 
on HDFS or Ozone. Moreover, the task information is flushed to HDFS or Ozone 
one by one 
([details|https://github.com/apache/hadoop/blob/a55d6bba71c81c1c4e9d8cd11f55c78f10a548b0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java#L1640]).
 

     The problem occurs when the number of map tasks is large.

     Currently, each flush operation in Ozone generates a new chunk file on 
disk in real time. This approach is not very efficient at the moment. For this 
we can refer to the implementation of HDFS flush: instead of writing to disk, 
each flush writes the contents of the buffer into the datanode's OS buffer. 
First, we need to ensure that this content can be read by other datanodes.

 


> Improved ozone flush implementation to make it faster.
> --
>
> Key: HDDS-3155
> URL: https://issues.apache.org/jira/browse/HDDS-3155
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: mingchao zhao
>Priority: Major
> Attachments: stdout, syslog
>
>
> Background:
>     When we execute mapreduce in the ozone, we find that the task will be 
> stuck for a long time after the completion of Map and Reduce. The log is as 
> follows:
> {code:java}
> //Refer to the attachment: stdout
> 20/03/05 14:43:30 INFO mapreduce.Job: map 100% reduce 33% 
> 20/03/05 14:43:33 INFO mapreduce.Job: map 100% reduce 100% 
> 20/03/05 15:29:52 INFO mapreduce.Job: Job job_1583385253878_0002 completed 
> successfully{code}
>     By looking at AM's log, we found that the time of over 40 minutes is AM 
> writing a task log into ozone.
> At present, after MR execution, the Task information is recorded into the log 
> on HDFS or ozone by AM.  Moreover, the task information is flush to HDFS or 
> ozone one by one 
> ([details|https://github.com/apache/hadoop/blob/a55d6bba71c81c1c4e9d8cd11f55c78f10a548b0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java#L1640]).
>  The problem occurs when the number of task maps is large. 
>      Currently, each flush operation in ozone generates a new chunk file in 
> real time on the disk. This approach is not very efficient at the moment. For 
> this we can refer to the implementation of HDFS flush. Instead of writing to 
> disk each time flush writes the contents of the buffer to the datanode's OS 
> buffer. In the first place, we need to ensure that this content can be read 
> 

[jira] [Updated] (HDDS-3152) Reduce number of chunkwriter threads in integration tests

2020-03-10 Thread Bharat Viswanadham (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDDS-3152:
-
Fix Version/s: 0.6.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Reduce number of chunkwriter threads in integration tests
> -
>
> Key: HDDS-3152
> URL: https://issues.apache.org/jira/browse/HDDS-3152
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: test
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Integration tests run multiple datanodes in the same JVM.  Each datanode 
> comes with 60 chunk writer threads by default (may be decreased in 
> HDDS-3053).  This makes thread dumps (e.g. produced by 
> {{GenericTestUtils.waitFor}} on timeout) really hard to navigate, as there 
> may be 300+ such threads.
> Since integration tests are generally run with a single disk which is even 
> shared among the datanodes, a few threads per datanode should be enough.
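
As a sketch of how an integration test could lower the per-datanode chunk 
writer thread count (the configuration key below is a placeholder, not 
necessarily the property the actual patch changes):
{code:java}
import org.apache.hadoop.hdds.conf.OzoneConfiguration;
import org.apache.hadoop.ozone.MiniOzoneCluster;

public class FewChunkWriterThreadsSketch {
  public static void main(String[] args) throws Exception {
    OzoneConfiguration conf = new OzoneConfiguration();
    // Placeholder key name: substitute the chunk writer thread-count property
    // of the Ozone version in use; a couple of threads per datanode is plenty
    // when all datanodes share a single test disk.
    conf.setInt("hdds.datanode.chunk.writer.threads", 2);

    MiniOzoneCluster cluster = MiniOzoneCluster.newBuilder(conf)
        .setNumDatanodes(3)
        .build();
    try {
      cluster.waitForClusterToBeReady();
      // ... run the integration test against the three-datanode cluster ...
    } finally {
      cluster.shutdown();
    }
  }
}
{code}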



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] bharatviswa504 commented on issue #657: HDDS-3152. Reduce number of chunkwriter threads in integration tests

2020-03-10 Thread GitBox
bharatviswa504 commented on issue #657: HDDS-3152. Reduce number of chunkwriter 
threads in integration tests
URL: https://github.com/apache/hadoop-ozone/pull/657#issuecomment-597430639
 
 
   Thank You @adoroszlai for the contribution.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] sonarcloud[bot] removed a comment on issue #588: HDDS-2886. parse and dump datanode segment file to pritable text

2020-03-10 Thread GitBox
sonarcloud[bot] removed a comment on issue #588: HDDS-2886. parse and dump 
datanode segment file to pritable text
URL: https://github.com/apache/hadoop-ozone/pull/588#issuecomment-596883379
 
 
   SonarCloud Quality Gate failed.
   
   [2 Bugs](https://sonarcloud.io/project/issues?id=hadoop-ozone=588=false=BUG)
   [0 Vulnerabilities](https://sonarcloud.io/project/issues?id=hadoop-ozone=588=false=VULNERABILITY) (and [1 Security Hotspot](https://sonarcloud.io/project/issues?id=hadoop-ozone=588=false=SECURITY_HOTSPOT) to review)
   [16 Code Smells](https://sonarcloud.io/project/issues?id=hadoop-ozone=588=false=CODE_SMELL)
   [0.0% Coverage](https://sonarcloud.io/component_measures?id=hadoop-ozone=588=new_coverage=list)
   [0.0% Duplication](https://sonarcloud.io/component_measures?id=hadoop-ozone=588=new_duplicated_lines_density=list)
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] sonarcloud[bot] commented on issue #588: HDDS-2886. parse and dump datanode segment file to pritable text

2020-03-10 Thread GitBox
sonarcloud[bot] commented on issue #588: HDDS-2886. parse and dump datanode 
segment file to pritable text
URL: https://github.com/apache/hadoop-ozone/pull/588#issuecomment-597430623
 
 
   SonarCloud Quality Gate failed.
   
   [2 Bugs](https://sonarcloud.io/project/issues?id=hadoop-ozone=588=false=BUG)
   [0 Vulnerabilities](https://sonarcloud.io/project/issues?id=hadoop-ozone=588=false=VULNERABILITY) (and [1 Security Hotspot](https://sonarcloud.io/project/issues?id=hadoop-ozone=588=false=SECURITY_HOTSPOT) to review)
   [17 Code Smells](https://sonarcloud.io/project/issues?id=hadoop-ozone=588=false=CODE_SMELL)
   [0.0% Coverage](https://sonarcloud.io/component_measures?id=hadoop-ozone=588=new_coverage=list)
   [0.0% Duplication](https://sonarcloud.io/component_measures?id=hadoop-ozone=588=new_duplicated_lines_density=list)
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] bharatviswa504 merged pull request #657: HDDS-3152. Reduce number of chunkwriter threads in integration tests

2020-03-10 Thread GitBox
bharatviswa504 merged pull request #657: HDDS-3152. Reduce number of 
chunkwriter threads in integration tests
URL: https://github.com/apache/hadoop-ozone/pull/657
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3100) Fix TestDeadNodeHandler.

2020-03-10 Thread Bharat Viswanadham (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDDS-3100:
-
Fix Version/s: 0.6.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Fix TestDeadNodeHandler.
> 
>
> Key: HDDS-3100
> URL: https://issues.apache.org/jira/browse/HDDS-3100
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Affects Versions: 0.5.0
>Reporter: Aravindan Vijayan
>Assignee: Aravindan Vijayan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] bharatviswa504 commented on issue #655: HDDS-3100. Fix TestDeadNodeHandler.

2020-03-10 Thread GitBox
bharatviswa504 commented on issue #655: HDDS-3100. Fix TestDeadNodeHandler.
URL: https://github.com/apache/hadoop-ozone/pull/655#issuecomment-597429367
 
 
   Thank You @avijayanhwx for the contribution and @adoroszlai for the review.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] bharatviswa504 merged pull request #655: HDDS-3100. Fix TestDeadNodeHandler.

2020-03-10 Thread GitBox
bharatviswa504 merged pull request #655: HDDS-3100. Fix TestDeadNodeHandler.
URL: https://github.com/apache/hadoop-ozone/pull/655
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3155) Improved ozone flush implementation to make it faster.

2020-03-10 Thread mingchao zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mingchao zhao updated HDDS-3155:

Attachment: syslog

> Improved ozone flush implementation to make it faster.
> --
>
> Key: HDDS-3155
> URL: https://issues.apache.org/jira/browse/HDDS-3155
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: mingchao zhao
>Priority: Major
> Attachments: stdout, syslog
>
>
> Background:
> When we run MapReduce on Ozone, we find that the job is stuck for a long 
> time after the map and reduce phases complete. The log is as follows:
> {code:java}
> // code placeholder
> 20/03/05 14:43:03 INFO mapreduce.Job:  map 91% reduce 30%
> 20/03/05 14:43:03 INFO mapreduce.Job:  map 91% reduce 30%
> 20/03/05 14:43:05 INFO mapreduce.Job:  map 92% reduce 30%
> 20/03/05 14:43:07 INFO mapreduce.Job:  map 93% reduce 30%
> 20/03/05 14:43:08 INFO mapreduce.Job:  map 93% reduce 31%
> 20/03/05 14:43:11 INFO mapreduce.Job:  map 94% reduce 31%
> 20/03/05 14:43:14 INFO mapreduce.Job:  map 95% reduce 31%
> 20/03/05 14:43:18 INFO mapreduce.Job:  map 96% reduce 31%
> 20/03/05 14:43:20 INFO mapreduce.Job:  map 97% reduce 32%
> 20/03/05 14:43:24 INFO mapreduce.Job:  map 98% reduce 32%
> 20/03/05 14:43:26 INFO mapreduce.Job:  map 99% reduce 33%
> 20/03/05 14:43:30 INFO mapreduce.Job:  map 100% reduce 33%
> 20/03/05 14:43:33 INFO mapreduce.Job:  map 100% reduce 100%
> 20/03/05 15:29:52 INFO mapreduce.Job: Job job_1583385253878_0002 completed successfully
> 20/03/05 15:29:52 INFO mapreduce.Job: Counters: 51
>   File System Counters
>     FILE: Number of bytes read=84602
>     FILE: Number of bytes written=162626320
>     FILE: Number of read operations=0
>     FILE: Number of large read operations=0
>     FILE: Number of write operations=0
>     O3FS: Number of bytes read=237780
>     O3FS: Number of bytes written=134217728089
>     O3FS: Number of read operations=4008
>     O3FS: Number of large read operations=0
>     O3FS: Number of write operations=1002
>   Job Counters
>     Killed map tasks=1
>     Launched map tasks=1000
>     Launched reduce tasks=1
>     Data-local map tasks=979
>     Rack-local map tasks=21
>     Total time spent by all maps in occupied slots (ms)=149515400
>     Total time spent by all reduces in occupied slots (ms)=449288
>     Total time spent by all map tasks (ms)=7475770
>     Total time spent by all reduce tasks (ms)=112322
>     Total vcore-milliseconds taken by all map tasks=7475770
>     Total vcore-milliseconds taken by all reduce tasks=112322
>     Total megabyte-milliseconds taken by all map tasks=153103769600
>     Total megabyte-milliseconds taken by all reduce tasks=460070912
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3155) Improved ozone flush implementation to make it faster.

2020-03-10 Thread mingchao zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mingchao zhao updated HDDS-3155:

Attachment: stdout

> Improved ozone flush implementation to make it faster.
> --
>
> Key: HDDS-3155
> URL: https://issues.apache.org/jira/browse/HDDS-3155
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: mingchao zhao
>Priority: Major
> Attachments: stdout
>
>
> Background:
> When we run MapReduce on Ozone, we find that the job is stuck for a long 
> time after the map and reduce phases complete. The log is as follows:
> {code:java}
> // code placeholder
> 20/03/05 14:43:03 INFO mapreduce.Job:  map 91% reduce 30%
> 20/03/05 14:43:03 INFO mapreduce.Job:  map 91% reduce 30%
> 20/03/05 14:43:05 INFO mapreduce.Job:  map 92% reduce 30%
> 20/03/05 14:43:07 INFO mapreduce.Job:  map 93% reduce 30%
> 20/03/05 14:43:08 INFO mapreduce.Job:  map 93% reduce 31%
> 20/03/05 14:43:11 INFO mapreduce.Job:  map 94% reduce 31%
> 20/03/05 14:43:14 INFO mapreduce.Job:  map 95% reduce 31%
> 20/03/05 14:43:18 INFO mapreduce.Job:  map 96% reduce 31%
> 20/03/05 14:43:20 INFO mapreduce.Job:  map 97% reduce 32%
> 20/03/05 14:43:24 INFO mapreduce.Job:  map 98% reduce 32%
> 20/03/05 14:43:26 INFO mapreduce.Job:  map 99% reduce 33%
> 20/03/05 14:43:30 INFO mapreduce.Job:  map 100% reduce 33%
> 20/03/05 14:43:33 INFO mapreduce.Job:  map 100% reduce 100%
> 20/03/05 15:29:52 INFO mapreduce.Job: Job job_1583385253878_0002 completed successfully
> 20/03/05 15:29:52 INFO mapreduce.Job: Counters: 51
>   File System Counters
>     FILE: Number of bytes read=84602
>     FILE: Number of bytes written=162626320
>     FILE: Number of read operations=0
>     FILE: Number of large read operations=0
>     FILE: Number of write operations=0
>     O3FS: Number of bytes read=237780
>     O3FS: Number of bytes written=134217728089
>     O3FS: Number of read operations=4008
>     O3FS: Number of large read operations=0
>     O3FS: Number of write operations=1002
>   Job Counters
>     Killed map tasks=1
>     Launched map tasks=1000
>     Launched reduce tasks=1
>     Data-local map tasks=979
>     Rack-local map tasks=21
>     Total time spent by all maps in occupied slots (ms)=149515400
>     Total time spent by all reduces in occupied slots (ms)=449288
>     Total time spent by all map tasks (ms)=7475770
>     Total time spent by all reduce tasks (ms)=112322
>     Total vcore-milliseconds taken by all map tasks=7475770
>     Total vcore-milliseconds taken by all reduce tasks=112322
>     Total megabyte-milliseconds taken by all map tasks=153103769600
>     Total megabyte-milliseconds taken by all reduce tasks=460070912
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3155) Improved ozone flush implementation to make it faster.

2020-03-10 Thread mingchao zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mingchao zhao updated HDDS-3155:

Description: 
Background:

When we run MapReduce jobs on Ozone, we find that the job stays stuck for a
long time after both map and reduce reach 100%; roughly 46 minutes pass between
"map 100% reduce 100%" (14:43:33) and job completion (15:29:52). The log is as
follows:
{code:java}
// code placeholder
20/03/05 14:43:03 INFO mapreduce.Job:  map 91% reduce 30%
20/03/05 14:43:03 INFO mapreduce.Job:  map 91% reduce 30%
20/03/05 14:43:05 INFO mapreduce.Job:  map 92% reduce 30%
20/03/05 14:43:07 INFO mapreduce.Job:  map 93% reduce 30%
20/03/05 14:43:08 INFO mapreduce.Job:  map 93% reduce 31%
20/03/05 14:43:11 INFO mapreduce.Job:  map 94% reduce 31%
20/03/05 14:43:14 INFO mapreduce.Job:  map 95% reduce 31%
20/03/05 14:43:18 INFO mapreduce.Job:  map 96% reduce 31%
20/03/05 14:43:20 INFO mapreduce.Job:  map 97% reduce 32%
20/03/05 14:43:24 INFO mapreduce.Job:  map 98% reduce 32%
20/03/05 14:43:26 INFO mapreduce.Job:  map 99% reduce 33%
20/03/05 14:43:30 INFO mapreduce.Job:  map 100% reduce 33%
20/03/05 14:43:33 INFO mapreduce.Job:  map 100% reduce 100%
20/03/05 15:29:52 INFO mapreduce.Job: Job job_1583385253878_0002 completed successfully
20/03/05 15:29:52 INFO mapreduce.Job: Counters: 51
  File System Counters
    FILE: Number of bytes read=84602
    FILE: Number of bytes written=162626320
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
    O3FS: Number of bytes read=237780
    O3FS: Number of bytes written=134217728089
    O3FS: Number of read operations=4008
    O3FS: Number of large read operations=0
    O3FS: Number of write operations=1002
  Job Counters
    Killed map tasks=1
    Launched map tasks=1000
    Launched reduce tasks=1
    Data-local map tasks=979
    Rack-local map tasks=21
    Total time spent by all maps in occupied slots (ms)=149515400
    Total time spent by all reduces in occupied slots (ms)=449288
    Total time spent by all map tasks (ms)=7475770
    Total time spent by all reduce tasks (ms)=112322
    Total vcore-milliseconds taken by all map tasks=7475770
    Total vcore-milliseconds taken by all reduce tasks=112322
    Total megabyte-milliseconds taken by all map tasks=153103769600
    Total megabyte-milliseconds taken by all reduce tasks=460070912
{code}

> Improved ozone flush implementation to make it faster.
> --
>
> Key: HDDS-3155
> URL: https://issues.apache.org/jira/browse/HDDS-3155
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: mingchao zhao
>Priority: Major
>
> Background:
> When we run MapReduce jobs on Ozone, we find that the job stays stuck for a
> long time after both map and reduce reach 100%. The log is as follows:
> {code:java}
> // code placeholder
> 20/03/05 14:43:03 INFO mapreduce.Job:  map 91% reduce 30%
> 20/03/05 14:43:03 INFO mapreduce.Job:  map 91% reduce 30%
> 20/03/05 14:43:05 INFO mapreduce.Job:  map 92% reduce 30%
> 20/03/05 14:43:07 INFO mapreduce.Job:  map 93% reduce 30%
> 20/03/05 14:43:08 INFO mapreduce.Job:  map 93% reduce 31%
> 20/03/05 14:43:11 INFO mapreduce.Job:  map 94% reduce 31%
> 20/03/05 14:43:14 INFO mapreduce.Job:  map 95% reduce 31%
> 20/03/05 14:43:18 INFO mapreduce.Job:  map 96% reduce 31%
> 20/03/05 14:43:20 INFO mapreduce.Job:  map 97% reduce 32%
> 20/03/05 14:43:24 INFO mapreduce.Job:  map 98% reduce 32%
> 20/03/05 14:43:26 INFO mapreduce.Job:  map 99% reduce 33%
> 20/03/05 14:43:30 INFO mapreduce.Job:  map 100% reduce 33%
> 20/03/05 14:43:33 INFO mapreduce.Job:  map 100% reduce 100%
> 20/03/05 15:29:52 INFO mapreduce.Job: Job job_1583385253878_0002 completed successfully
> 20/03/05 15:29:52 INFO mapreduce.Job: Counters: 51
>   File System Counters
>     FILE: Number of bytes read=84602
>     FILE: Number of bytes written=162626320
>     FILE: Number of read operations=0
>     FILE: Number of large read operations=0
>     FILE: Number of write operations=0
>     O3FS: Number of bytes read=237780
>     O3FS: Number of bytes written=134217728089
>     O3FS: Number of read operations=4008
>     O3FS: Number of large read operations=0
>     O3FS: Number of write operations=1002
>   Job Counters
>     Killed map tasks=1
>     Launched map tasks=1000
>     Launched reduce tasks=1
>     Data-local map tasks=979
>     Rack-local map tasks=21
>     Total time spent by all maps in occupied slots (ms)=149515400
>     Total time spent by all reduces in occupied slots (ms)=449288
>     Total time spent by all map tasks (ms)=7475770
>     Total time spent by all reduce tasks (ms)=112322
>     Total vcore-milliseconds taken by all map tasks=7475770
>     Total vcore-milliseconds taken by all reduce tasks=112322
>     Total megabyte-milliseconds taken by all map tasks=153103769600
>     Total megabyte-milliseconds taken by all reduce tasks=460070912
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: 

[jira] [Commented] (HDDS-3133) Add export objectIds in Ozone as FileIds to allow LLAP to cache the files

2020-03-10 Thread Siddharth Wagle (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056619#comment-17056619
 ] 

Siddharth Wagle commented on HDDS-3133:
---

But getFileId is only defined in HdfsFileStatus, and OzoneFileStatus
implements FileStatus.




> Add export objectIds in Ozone as FileIds to allow LLAP to cache the files
> -
>
> Key: HDDS-3133
> URL: https://issues.apache.org/jira/browse/HDDS-3133
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Filesystem, Ozone Manager
>Reporter: Mukul Kumar Singh
>Assignee: Siddharth Wagle
>Priority: Major
>
> Hive LLAP makes use of the fileids to cache the files data. Ozone's objectIds 
> need to be exported as fileIds to allow the caching to happen effectively.
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/HdfsUtils.java#L65



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3155) Improved ozone flush implementation to make it faster.

2020-03-10 Thread mingchao zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mingchao zhao updated HDDS-3155:

Issue Type: Improvement  (was: Bug)

> Improved ozone flush implementation to make it faster.
> --
>
> Key: HDDS-3155
> URL: https://issues.apache.org/jira/browse/HDDS-3155
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: mingchao zhao
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-3133) Add export objectIds in Ozone as FileIds to allow LLAP to cache the files

2020-03-10 Thread Mukul Kumar Singh (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056616#comment-17056616
 ] 

Mukul Kumar Singh commented on HDDS-3133:
-

Yes, Ozone should implement something like OzoneFileStatus that also exports
fileIds. Currently Ozone does not export anything like fileIds.

> Add export objectIds in Ozone as FileIds to allow LLAP to cache the files
> -
>
> Key: HDDS-3133
> URL: https://issues.apache.org/jira/browse/HDDS-3133
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Filesystem, Ozone Manager
>Reporter: Mukul Kumar Singh
>Assignee: Siddharth Wagle
>Priority: Major
>
> Hive LLAP makes use of the fileids to cache the files data. Ozone's objectIds 
> need to be exported as fileIds to allow the caching to happen effectively.
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/HdfsUtils.java#L65



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-3155) Improved ozone flush implementation to make it faster.

2020-03-10 Thread mingchao zhao (Jira)
mingchao zhao created HDDS-3155:
---

 Summary: Improved ozone flush implementation to make it faster.
 Key: HDDS-3155
 URL: https://issues.apache.org/jira/browse/HDDS-3155
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: mingchao zhao






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] mukul1987 commented on issue #588: HDDS-2886. parse and dump datanode segment file to pritable text

2020-03-10 Thread GitBox
mukul1987 commented on issue #588: HDDS-2886. parse and dump datanode segment 
file to pritable text
URL: https://github.com/apache/hadoop-ozone/pull/588#issuecomment-597422675
 
 
   Thanks for the review @bharatviswa504, addressed the comments in the next patch.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-3116) Datanode sometimes fails to start with NPE when starting Ratis xceiver server

2020-03-10 Thread Hanisha Koneru (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056541#comment-17056541
 ] 

Hanisha Koneru commented on HDDS-3116:
--

There is a lot of circular dependency among OzoneContainer,
XceiverServerRatis, and StateContext. AFAICS it might not be a trivial change to
fix this.

I am not sure whether adding the synchronization will resolve the issue
altogether. I tried reproducing the problem but couldn't. I will try reproducing
it on a docker cluster.
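
For illustration only, here is a minimal sketch of one possible reading of "adding the synchronization": guard the pipeline report until the StateContext has been wired in, so a Ratis group-add notification fired during server start-up cannot dereference a null context. All field and method names below are hypothetical, not the actual XceiverServerRatis code.

{code:java}
// Hypothetical sketch only; names are illustrative, not the real datanode code.
private final Object contextLock = new Object();
private StateContext stateContext; // wired in later by OzoneContainer

void setStateContext(StateContext context) {
  synchronized (contextLock) {
    this.stateContext = context;
  }
}

private void sendPipelineReport(PipelineReport report) {
  synchronized (contextLock) {
    if (stateContext == null) {
      // Start-up race: the group-add notification arrived before the
      // StateContext was set; skip for now, the next heartbeat will report it.
      return;
    }
    stateContext.addPipelineReport(report); // hypothetical call
  }
}
{code}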

> Datanode sometimes fails to start with NPE when starting Ratis xceiver server
> -
>
> Key: HDDS-3116
> URL: https://issues.apache.org/jira/browse/HDDS-3116
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.5.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Blocker
>  Labels: pull-request-available
> Attachments: full_logs.txt
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While working on a network Topology test (HDDS-3084) which does the following:
> 1. Start a cluster with 6 DNs and 2 racks.
> 2. Create a volume, bucket and a single key.
> 3. Stop one rack of hosts using "docker-compose down"
> 4. Read the data from the single key
> 5. Start the 3 down hosts
> 6. Stop the other 3 hosts
> 7. Attempt to read the key again.
> At step 5 I sometimes see this stack trace in one of the DNs and it fails to 
> fully come up:
> {code}
> 2020-03-02 13:01:31,887 [Datanode State Machine Thread - 0] INFO 
> ozoneimpl.OzoneContainer: Attempting to start container services.
> 2020-03-02 13:01:31,887 [Datanode State Machine Thread - 0] INFO 
> ozoneimpl.OzoneContainer: Background container scanner has been disabled.
> 2020-03-02 13:01:31,887 [Datanode State Machine Thread - 0] INFO 
> ratis.XceiverServerRatis: Starting XceiverServerRatis 
> 8c1178dd-c44d-49d1-b899-cc3e40ae8f23 at port 9858
> 2020-03-02 13:01:31,887 [Datanode State Machine Thread - 0] WARN 
> statemachine.EndpointStateMachine: Unable to communicate to SCM server at 
> scm:9861 for past 15000 seconds.
> java.io.IOException: java.lang.NullPointerException
>   at org.apache.ratis.util.IOUtils.asIOException(IOUtils.java:54)
>   at org.apache.ratis.util.IOUtils.toIOException(IOUtils.java:61)
>   at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:70)
>   at 
> org.apache.ratis.server.impl.RaftServerProxy.getImpls(RaftServerProxy.java:284)
>   at 
> org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:296)
>   at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.start(XceiverServerRatis.java:418)
>   at 
> org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.start(OzoneContainer.java:232)
>   at 
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:113)
>   at 
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42)
>   at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>   at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>   at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.sendPipelineReport(XceiverServerRatis.java:757)
>   at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.notifyGroupAdd(XceiverServerRatis.java:739)
>   at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.initialize(ContainerStateMachine.java:218)
>   at 
> org.apache.ratis.server.impl.ServerState.initStatemachine(ServerState.java:160)
>   at org.apache.ratis.server.impl.ServerState.<init>(ServerState.java:112)
>   at 
> org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:112)
>   at 
> org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$2(RaftServerProxy.java:208)
>   at 
> java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
>   ... 3 more
> {code}
> The DN does not recover from this automatically, although I confirmed that a 
> full cluster restart fixed it (docker-compose stop; docker-compose start). I 
> will also try to confirm whether restarting just the stuck DN fixes it.



--
This message 

[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #654: HDDS-3150. Implement getIfExist in Table and use it in CreateKey/File

2020-03-10 Thread GitBox
bharatviswa504 commented on a change in pull request #654: HDDS-3150. Implement 
getIfExist in Table and use it in CreateKey/File
URL: https://github.com/apache/hadoop-ozone/pull/654#discussion_r390670650
 
 

 ##
 File path: 
hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBTable.java
 ##
 @@ -155,6 +155,32 @@ public boolean isExist(byte[] key) throws IOException {
 }
   }
 
+  @Override
+  public byte[] getIfExist(byte[] key) throws IOException {
+try {
+  // RocksDB#keyMayExist
+  // If the key definitely does not exist in the database, then this
+  // method returns false, else true.
+  rdbMetrics.incNumDBKeyGetIfExistChecks();
+  StringBuilder outValue = new StringBuilder();
+  boolean keyMayExist = db.keyMayExist(handle, key, outValue);
+  if (keyMayExist) {
+// Not using out value from string builder, as that is causing
+// IllegalArgumentException during protobuf parsing.
 
 Review comment:
   ```
   @Override
   public byte[] get(byte[] key) throws IOException {
 try {
   // RocksDB#keyMayExist
   // If the key definitely does not exist in the database, then this
   // method returns false, else true.
   rdbMetrics.incNumDBKeyIfExistChecks();
   StringBuilder outValue = new StringBuilder();
   boolean keyMayExist = db.keyMayExist(handle, key, outValue);
   if (keyMayExist) {
 byte[] val;
 if (outValue.length() > 0) {
   val = outValue.toString().getBytes(UTF_8);
 } else {
   val = db.get(handle, key);
 }
 if (val != null) {
   rdbMetrics.incNumDBKeyIfExistMisses();
 }
 return val;
   }
   return null;
 } catch (RocksDBException e) {
   throw toIOException(
   "Error in accessing DB. ", e);
 }
   }
   ```
   
   I tried the above code and got an IllegalArgumentException during parsing.
So, for now, I changed it not to use outValue.
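
   A likely explanation (my reading, not verified against the RocksDB JNI internals): `keyMayExist` hands the value back through a `StringBuilder`, i.e. as a Java String, and arbitrary protobuf bytes do not survive a bytes -> String -> bytes round trip, which would explain the parse failure. A minimal self-contained sketch of that effect:

   ```java
   import java.nio.charset.StandardCharsets;
   import java.util.Arrays;

   public class Utf8RoundTrip {
     public static void main(String[] args) {
       // Bytes like these are common in protobuf wire format but are not valid UTF-8.
       byte[] original = {0x0a, (byte) 0x9a, 0x01, (byte) 0xc2, 0x00, 0x7f};
       // What effectively happens when the value comes back via the StringBuilder:
       String asString = new String(original, StandardCharsets.UTF_8);
       byte[] roundTripped = asString.getBytes(StandardCharsets.UTF_8);
       // Prints false: invalid sequences were replaced, so the bytes differ.
       System.out.println(Arrays.equals(original, roundTripped));
     }
   }
   ```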


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #654: HDDS-3150. Implement getIfExist in Table and use it in CreateKey/File

2020-03-10 Thread GitBox
bharatviswa504 commented on a change in pull request #654: HDDS-3150. Implement 
getIfExist in Table and use it in CreateKey/File
URL: https://github.com/apache/hadoop-ozone/pull/654#discussion_r390670650
 
 

 ##
 File path: 
hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBTable.java
 ##
 @@ -155,6 +155,32 @@ public boolean isExist(byte[] key) throws IOException {
 }
   }
 
+  @Override
+  public byte[] getIfExist(byte[] key) throws IOException {
+try {
+  // RocksDB#keyMayExist
+  // If the key definitely does not exist in the database, then this
+  // method returns false, else true.
+  rdbMetrics.incNumDBKeyGetIfExistChecks();
+  StringBuilder outValue = new StringBuilder();
+  boolean keyMayExist = db.keyMayExist(handle, key, outValue);
+  if (keyMayExist) {
+// Not using out value from string builder, as that is causing
+// IllegalArgumentException during protobuf parsing.
 
 Review comment:
   ```
   @Override
   public byte[] get(byte[] key) throws IOException {
 try {
   // RocksDB#keyMayExist
   // If the key definitely does not exist in the database, then this
   // method returns false, else true.
   rdbMetrics.incNumDBKeyIfExistChecks();
   StringBuilder outValue = new StringBuilder();
   boolean keyMayExist = db.keyMayExist(handle, key, outValue);
   if (keyMayExist) {
 byte[] val;
 if (outValue.length() > 0) {
   val = outValue.toString().getBytes(UTF_8);
 } else {
   val = db.get(handle, key);
 }
 if (val != null) {
   rdbMetrics.incNumDBKeyIfExistMisses();
 }
 return val;
   }
   return null;
 } catch (RocksDBException e) {
   throw toIOException(
   "Error in accessing DB. ", e);
 }
   }
   ```
   
   I tried the above code and got an IllegalArgumentException during parsing.
So, for now, I changed it not to use outValue.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #654: HDDS-3150. Implement getIfExist in Table and use it in CreateKey/File

2020-03-10 Thread GitBox
bharatviswa504 commented on a change in pull request #654: HDDS-3150. Implement 
getIfExist in Table and use it in CreateKey/File
URL: https://github.com/apache/hadoop-ozone/pull/654#discussion_r390670650
 
 

 ##
 File path: 
hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBTable.java
 ##
 @@ -155,6 +155,32 @@ public boolean isExist(byte[] key) throws IOException {
 }
   }
 
+  @Override
+  public byte[] getIfExist(byte[] key) throws IOException {
+try {
+  // RocksDB#keyMayExist
+  // If the key definitely does not exist in the database, then this
+  // method returns false, else true.
+  rdbMetrics.incNumDBKeyGetIfExistChecks();
+  StringBuilder outValue = new StringBuilder();
+  boolean keyMayExist = db.keyMayExist(handle, key, outValue);
+  if (keyMayExist) {
+// Not using out value from string builder, as that is causing
+// IllegalArgumentException during protobuf parsing.
 
 Review comment:
   ```
   @Override
   public byte[] get(byte[] key) throws IOException {
 try {
   // RocksDB#keyMayExist
   // If the key definitely does not exist in the database, then this
   // method returns false, else true.
   rdbMetrics.incNumDBKeyIfExistChecks();
   StringBuilder outValue = new StringBuilder();
   boolean keyMayExist = db.keyMayExist(handle, key, outValue);
   if (keyMayExist) {
 byte[] val;
 if (outValue.length() > 0) {
   val = outValue.toString().getBytes(UTF_8);
 } else {
   val = db.get(handle, key);
 }
 if (val != null) {
   rdbMetrics.incNumDBKeyIfExistMisses();
 }
 return val;
   }
   return null;
 } catch (RocksDBException e) {
   throw toIOException(
   "Error in accessing DB. ", e);
 }
   }
   ```
   
   I tried the above code and got an IllegalArgumentException during parsing.
So, for now, I changed it not to use outValue.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #654: HDDS-3150. Implement getIfExist in Table and use it in CreateKey/File

2020-03-10 Thread GitBox
bharatviswa504 commented on a change in pull request #654: HDDS-3150. Implement 
getIfExist in Table and use it in CreateKey/File
URL: https://github.com/apache/hadoop-ozone/pull/654#discussion_r390669860
 
 

 ##
 File path: 
hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBTable.java
 ##
 @@ -155,6 +155,32 @@ public boolean isExist(byte[] key) throws IOException {
 }
   }
 
+  @Override
+  public byte[] getIfExist(byte[] key) throws IOException {
+try {
+  // RocksDB#keyMayExist
+  // If the key definitely does not exist in the database, then this
+  // method returns false, else true.
+  rdbMetrics.incNumDBKeyGetIfExistChecks();
+  StringBuilder outValue = new StringBuilder();
+  boolean keyMayExist = db.keyMayExist(handle, key, outValue);
+  if (keyMayExist) {
+// Not using out value from string builder, as that is causing
+// IllegalArgumentException during protobuf parsing.
 
 Review comment:
   Yes, using that is causing the issue.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3100) Fix TestDeadNodeHandler.

2020-03-10 Thread Aravindan Vijayan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aravindan Vijayan updated HDDS-3100:

Status: Patch Available  (was: Open)

> Fix TestDeadNodeHandler.
> 
>
> Key: HDDS-3100
> URL: https://issues.apache.org/jira/browse/HDDS-3100
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Affects Versions: 0.5.0
>Reporter: Aravindan Vijayan
>Assignee: Aravindan Vijayan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #588: HDDS-2886. parse and dump datanode segment file to pritable text

2020-03-10 Thread GitBox
bharatviswa504 commented on a change in pull request #588: HDDS-2886. parse and 
dump datanode segment file to pritable text
URL: https://github.com/apache/hadoop-ozone/pull/588#discussion_r390665564
 
 

 ##
 File path: 
hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/segmentparser/GenericParser.java
 ##
 @@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.ozone.segmentparser;
+
+import org.apache.hadoop.hdds.cli.HddsVersionProvider;
+import picocli.CommandLine;
+
+import java.util.concurrent.Callable;
+
+/**
+ * Command line utility to parse and dump any generic ratis segment file.
+ */
+@CommandLine.Command(
+name = "generic",
+description = "dump generic ratis segment file",
+mixinStandardHelpOptions = true,
+versionProvider = HddsVersionProvider.class)
+public class GenericParser extends BaseLogParser implements Callable {
 
 Review comment:
   Can we rename this as GenericRatisLogParser?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #588: HDDS-2886. parse and dump datanode segment file to pritable text

2020-03-10 Thread GitBox
bharatviswa504 commented on a change in pull request #588: HDDS-2886. parse and 
dump datanode segment file to pritable text
URL: https://github.com/apache/hadoop-ozone/pull/588#discussion_r390665439
 
 

 ##
 File path: 
hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/segmentparser/DatanodeParser.java
 ##
 @@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.ozone.segmentparser;
+
+import org.apache.hadoop.hdds.cli.HddsVersionProvider;
+import org.apache.hadoop.ozone.container.common.transport.server
+.ratis.ContainerStateMachine;
+import org.apache.ratis.proto.RaftProtos.StateMachineLogEntryProto;
+import org.apache.ratis.protocol.RaftGroupId;
+import org.apache.ratis.thirdparty.com.google.protobuf.ByteString;
+import picocli.CommandLine;
+
+import java.util.concurrent.Callable;
+
+/**
+ * Command line utility to parse and dump a datanode ratis segment file.
+ */
+@CommandLine.Command(
+name = "datanode",
+description = "dump datanode segment file",
+mixinStandardHelpOptions = true,
+versionProvider = HddsVersionProvider.class)
+public class DatanodeParser extends BaseLogParser implements Callable {
 
 Review comment:
   Can we rename this class to DatanodeRatisLogParser?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] avijayanhwx commented on a change in pull request #654: HDDS-3150. Implement getIfExist in Table and use it in CreateKey/File

2020-03-10 Thread GitBox
avijayanhwx commented on a change in pull request #654: HDDS-3150. Implement 
getIfExist in Table and use it in CreateKey/File
URL: https://github.com/apache/hadoop-ozone/pull/654#discussion_r390664487
 
 

 ##
 File path: 
hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBTable.java
 ##
 @@ -155,6 +155,32 @@ public boolean isExist(byte[] key) throws IOException {
 }
   }
 
+  @Override
+  public byte[] getIfExist(byte[] key) throws IOException {
+try {
+  // RocksDB#keyMayExist
+  // If the key definitely does not exist in the database, then this
+  // method returns false, else true.
+  rdbMetrics.incNumDBKeyGetIfExistChecks();
+  StringBuilder outValue = new StringBuilder();
+  boolean keyMayExist = db.keyMayExist(handle, key, outValue);
+  if (keyMayExist) {
+// Not using out value from string builder, as that is causing
+// IllegalArgumentException during protobuf parsing.
 
 Review comment:
   Does this mean we cannot use the outValue that we pass in? If 'keyMayExist' 
returns true, and the value is indeed present in the block cache, I believe the 
outValue will have the data.  


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3120) Freon work with OM HA

2020-03-10 Thread Bharat Viswanadham (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDDS-3120:
-
Fix Version/s: 0.6.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Freon work with OM HA
> -
>
> Key: HDDS-3120
> URL: https://issues.apache.org/jira/browse/HDDS-3120
> Project: Hadoop Distributed Data Store
>  Issue Type: New Feature
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Make Freon commands work with OM HA



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] bharatviswa504 commented on issue #649: HDDS-3120. Freon work with OM HA.

2020-03-10 Thread GitBox
bharatviswa504 commented on issue #649: HDDS-3120. Freon work with OM HA.
URL: https://github.com/apache/hadoop-ozone/pull/649#issuecomment-597342135
 
 
   Thank You @adoroszlai for the review.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] bharatviswa504 merged pull request #649: HDDS-3120. Freon work with OM HA.

2020-03-10 Thread GitBox
bharatviswa504 merged pull request #649: HDDS-3120. Freon work with OM HA.
URL: https://github.com/apache/hadoop-ozone/pull/649
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3154) Intermittent failure in Test2WayCommitInRatis

2020-03-10 Thread Attila Doroszlai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Doroszlai updated HDDS-3154:
---
Attachment: org.apache.hadoop.ozone.client.rpc.Test2WayCommitInRatis.txt

org.apache.hadoop.ozone.client.rpc.Test2WayCommitInRatis-output.txt

> Intermittent failure in Test2WayCommitInRatis
> -
>
> Key: HDDS-3154
> URL: https://issues.apache.org/jira/browse/HDDS-3154
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Attila Doroszlai
>Priority: Major
> Attachments: 
> org.apache.hadoop.ozone.client.rpc.Test2WayCommitInRatis-output.txt, 
> org.apache.hadoop.ozone.client.rpc.Test2WayCommitInRatis.txt
>
>
> Test2WayCommitInRatis may fail due to {{TimeoutIOException: Request #8 
> timeout 3s}} from Ratis while closing the container.  [~shashikant], can you 
> please take a look? 
>  Logs with RaftClient set to debug level attached.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-3154) Intermittent failure in Test2WayCommitInRatis

2020-03-10 Thread Attila Doroszlai (Jira)
Attila Doroszlai created HDDS-3154:
--

 Summary: Intermittent failure in Test2WayCommitInRatis
 Key: HDDS-3154
 URL: https://issues.apache.org/jira/browse/HDDS-3154
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Attila Doroszlai
 Attachments: 
org.apache.hadoop.ozone.client.rpc.Test2WayCommitInRatis-output.txt, 
org.apache.hadoop.ozone.client.rpc.Test2WayCommitInRatis.txt

Test2WayCommitInRatis may fail due to {{TimeoutIOException: Request #8 timeout 
3s}} from Ratis while closing the container.  [~shashikant], can you please 
take a look? 
 Logs with RaftClient set to debug level attached.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #649: HDDS-3120. Freon work with OM HA.

2020-03-10 Thread GitBox
bharatviswa504 commented on a change in pull request #649: HDDS-3120. Freon 
work with OM HA.
URL: https://github.com/apache/hadoop-ozone/pull/649#discussion_r390608371
 
 

 ##
 File path: 
hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/BaseFreonGenerator.java
 ##
 @@ -374,53 +369,37 @@ public String generateObjectName(long counter) {
   /**
* Create missing target volume/bucket.
*/
-  public void ensureVolumeAndBucketExist(OzoneConfiguration ozoneConfiguration,
+  public void ensureVolumeAndBucketExist(OzoneClient rpcClient,
   String volumeName, String bucketName) throws IOException {
 
-try (OzoneClient rpcClient = OzoneClientFactory
-.getRpcClient(ozoneConfiguration)) {
+OzoneVolume volume;
+ensureVolumeExists(rpcClient, volumeName);
+volume = rpcClient.getObjectStore().getVolume(volumeName);
 
-  OzoneVolume volume = null;
-  try {
-volume = rpcClient.getObjectStore().getVolume(volumeName);
-  } catch (OMException ex) {
-if (ex.getResult() == ResultCodes.VOLUME_NOT_FOUND) {
-  rpcClient.getObjectStore().createVolume(volumeName);
-  volume = rpcClient.getObjectStore().getVolume(volumeName);
-} else {
-  throw ex;
-}
-  }
-
-  try {
-volume.getBucket(bucketName);
-  } catch (OMException ex) {
-if (ex.getResult() == ResultCodes.BUCKET_NOT_FOUND) {
-  volume.createBucket(bucketName);
-} else {
-  throw ex;
-}
+try {
+  volume.getBucket(bucketName);
+} catch (OMException ex) {
+  if (ex.getResult() == ResultCodes.BUCKET_NOT_FOUND) {
+volume.createBucket(bucketName);
+  } else {
+throw ex;
   }
 }
+
   }
 
   /**
* Create missing target volume.
*/
   public void ensureVolumeExists(
-  OzoneConfiguration ozoneConfiguration,
+  OzoneClient rpcClient,
   String volumeName) throws IOException {
-try (OzoneClient rpcClient = OzoneClientFactory
-.getRpcClient(ozoneConfiguration)) {
-
-  try {
-rpcClient.getObjectStore().getVolume(volumeName);
-  } catch (OMException ex) {
-if (ex.getResult() == ResultCodes.VOLUME_NOT_FOUND) {
-  rpcClient.getObjectStore().createVolume(volumeName);
-}
+try {
+  rpcClient.getObjectStore().getVolume(volumeName);
+} catch (OMException ex) {
+  if (ex.getResult() == ResultCodes.VOLUME_NOT_FOUND) {
+rpcClient.getObjectStore().createVolume(volumeName);
 
 Review comment:
   Nice catch. In our case that would fail anyway in the next steps, e.g. during
getVolume/createBucket, but it is better to throw the exception from here.
Addressed in the latest commit.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] adoroszlai commented on a change in pull request #649: HDDS-3120. Freon work with OM HA.

2020-03-10 Thread GitBox
adoroszlai commented on a change in pull request #649: HDDS-3120. Freon work 
with OM HA.
URL: https://github.com/apache/hadoop-ozone/pull/649#discussion_r390599201
 
 

 ##
 File path: 
hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/BaseFreonGenerator.java
 ##
 @@ -374,53 +369,37 @@ public String generateObjectName(long counter) {
   /**
* Create missing target volume/bucket.
*/
-  public void ensureVolumeAndBucketExist(OzoneConfiguration ozoneConfiguration,
+  public void ensureVolumeAndBucketExist(OzoneClient rpcClient,
   String volumeName, String bucketName) throws IOException {
 
-try (OzoneClient rpcClient = OzoneClientFactory
-.getRpcClient(ozoneConfiguration)) {
+OzoneVolume volume;
+ensureVolumeExists(rpcClient, volumeName);
+volume = rpcClient.getObjectStore().getVolume(volumeName);
 
-  OzoneVolume volume = null;
-  try {
-volume = rpcClient.getObjectStore().getVolume(volumeName);
-  } catch (OMException ex) {
-if (ex.getResult() == ResultCodes.VOLUME_NOT_FOUND) {
-  rpcClient.getObjectStore().createVolume(volumeName);
-  volume = rpcClient.getObjectStore().getVolume(volumeName);
-} else {
-  throw ex;
-}
-  }
-
-  try {
-volume.getBucket(bucketName);
-  } catch (OMException ex) {
-if (ex.getResult() == ResultCodes.BUCKET_NOT_FOUND) {
-  volume.createBucket(bucketName);
-} else {
-  throw ex;
-}
+try {
+  volume.getBucket(bucketName);
+} catch (OMException ex) {
+  if (ex.getResult() == ResultCodes.BUCKET_NOT_FOUND) {
+volume.createBucket(bucketName);
+  } else {
+throw ex;
   }
 }
+
   }
 
   /**
* Create missing target volume.
*/
   public void ensureVolumeExists(
-  OzoneConfiguration ozoneConfiguration,
+  OzoneClient rpcClient,
   String volumeName) throws IOException {
-try (OzoneClient rpcClient = OzoneClientFactory
-.getRpcClient(ozoneConfiguration)) {
-
-  try {
-rpcClient.getObjectStore().getVolume(volumeName);
-  } catch (OMException ex) {
-if (ex.getResult() == ResultCodes.VOLUME_NOT_FOUND) {
-  rpcClient.getObjectStore().createVolume(volumeName);
-}
+try {
+  rpcClient.getObjectStore().getVolume(volumeName);
+} catch (OMException ex) {
+  if (ex.getResult() == ResultCodes.VOLUME_NOT_FOUND) {
+rpcClient.getObjectStore().createVolume(volumeName);
 
 Review comment:
   Should `throw ex` in `else` branch, otherwise volume creation fails silently 
and will run into NPE elsewhere.
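
   For reference, a sketch of how `ensureVolumeExists` might look with the suggested `else` branch (same calls as in the diff above, plus the re-throw):

   ```java
   public void ensureVolumeExists(OzoneClient rpcClient, String volumeName)
       throws IOException {
     try {
       rpcClient.getObjectStore().getVolume(volumeName);
     } catch (OMException ex) {
       if (ex.getResult() == ResultCodes.VOLUME_NOT_FOUND) {
         // Volume is missing: create it instead of failing.
         rpcClient.getObjectStore().createVolume(volumeName);
       } else {
         // Propagate unexpected errors instead of swallowing them silently.
         throw ex;
       }
     }
   }
   ```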


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] avijayanhwx commented on issue #655: HDDS-3100. Fix TestDeadNodeHandler.

2020-03-10 Thread GitBox
avijayanhwx commented on issue #655: HDDS-3100. Fix TestDeadNodeHandler.
URL: https://github.com/apache/hadoop-ozone/pull/655#issuecomment-597303328
 
 
   Thank you for the review @adoroszlai. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2989) Intermittent timeout in TestBlockManager

2020-03-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2989:
-
Labels: pull-request-available  (was: )

> Intermittent timeout in TestBlockManager
> 
>
> Key: HDDS-2989
> URL: https://issues.apache.org/jira/browse/HDDS-2989
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
>
> {code:title=https://github.com/apache/hadoop-ozone/runs/430663688}
> 2020-02-06T21:44:53.5319531Z [ERROR] Tests run: 9, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 5.344 s <<< FAILURE! - in 
> org.apache.hadoop.hdds.scm.block.TestBlockManager
> 2020-02-06T21:44:53.5319796Z [ERROR] 
> testMultipleBlockAllocation(org.apache.hadoop.hdds.scm.block.TestBlockManager)
>   Time elapsed: 1.167 s  <<< ERROR!
> 2020-02-06T21:44:53.5319942Z java.util.concurrent.TimeoutException: 
> 2020-02-06T21:44:53.5320496Z Timed out waiting for condition. Thread 
> diagnostics:
> 2020-02-06T21:44:53.5320839Z Timestamp: 2020-02-06 09:44:52,261
> 2020-02-06T21:44:53.5320901Z 
> 2020-02-06T21:44:53.5321178Z "Thread-26"  prio=5 tid=46 runnable
> 2020-02-06T21:44:53.5321292Z java.lang.Thread.State: RUNNABLE
> 2020-02-06T21:44:53.5321391Z at java.lang.Thread.dumpThreads(Native 
> Method)
> 2020-02-06T21:44:53.5326891Z at 
> java.lang.Thread.getAllStackTraces(Thread.java:1610)
> 2020-02-06T21:44:53.5327144Z at 
> org.apache.hadoop.test.TimedOutTestsListener.buildThreadDump(TimedOutTestsListener.java:87)
> 2020-02-06T21:44:53.5327309Z at 
> org.apache.hadoop.test.TimedOutTestsListener.buildThreadDiagnosticString(TimedOutTestsListener.java:73)
> 2020-02-06T21:44:53.5327465Z at 
> org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:389)
> 2020-02-06T21:44:53.5327618Z at 
> org.apache.hadoop.hdds.scm.block.TestBlockManager.testMultipleBlockAllocation(TestBlockManager.java:280)
> 2020-02-06T21:44:53.5388042Z at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2020-02-06T21:44:53.5388702Z at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2020-02-06T21:44:53.5388905Z at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2020-02-06T21:44:53.5389045Z at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2020-02-06T21:44:53.5389195Z at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> 2020-02-06T21:44:53.5389331Z at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 2020-02-06T21:44:53.5389662Z at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> 2020-02-06T21:44:53.5389776Z at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 2020-02-06T21:44:53.5389916Z at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> 2020-02-06T21:44:53.5390040Z "Signal Dispatcher" daemon prio=9 tid=4 runnable
> 2020-02-06T21:44:53.5390156Z java.lang.Thread.State: RUNNABLE
> 2020-02-06T21:44:53.5390783Z 
> "EventQueue-CloseContainerForCloseContainerEventHandler"  prio=5 tid=32 in 
> Object.wait()
> 2020-02-06T21:44:53.5390916Z java.lang.Thread.State: WAITING (on object 
> monitor)
> 2020-02-06T21:44:53.5391019Z at sun.misc.Unsafe.park(Native Method)
> 2020-02-06T21:44:53.5391149Z at 
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> 2020-02-06T21:44:53.5391299Z at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> 2020-02-06T21:44:53.5391448Z at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> 2020-02-06T21:44:53.5391587Z at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
> 2020-02-06T21:44:53.5391721Z at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
> 2020-02-06T21:44:53.5391844Z at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 2020-02-06T21:44:53.5391971Z at java.lang.Thread.run(Thread.java:748)
> 2020-02-06T21:44:53.5392100Z "IPC Server idle connection scanner for port 
> 43801" daemon prio=5 tid=24 in Object.wait()
> 2020-02-06T21:44:53.5392227Z java.lang.Thread.State: WAITING (on object 
> monitor)
> 2020-02-06T21:44:53.5392347Z at java.lang.Object.wait(Native Method)
> 2020-02-06T21:44:53.5392463Z at java.lang.Object.wait(Object.java:502)
> 

[GitHub] [hadoop-ozone] adoroszlai opened a new pull request #659: HDDS-2989. Intermittent timeout in TestBlockManager

2020-03-10 Thread GitBox
adoroszlai opened a new pull request #659: HDDS-2989. Intermittent timeout in 
TestBlockManager
URL: https://github.com/apache/hadoop-ozone/pull/659
 
 
   ## What changes were proposed in this pull request?
   
   `TestBlockManager` intermittently times out waiting for exit from safe mode. 
 This happens due to a race condition between two safe-mode status events 
handled in different threads (but by the same handler object): one from SCM, 
another from the test code.
   
   Temporary debug log (in "passing" order):
   
   ```
   (SafeModeHandler.java:onMessage(103)) - SafeModeHandler@2bde2598 handling 
safe mode status event in thread 26: true
   (SafeModeHandler.java:onMessage(103)) - SafeModeHandler@2bde2598 handling 
safe mode status event in thread 28: false
   ```
   
   If the order is reversed, SCM may stay in safe mode as far as 
`BlockManagerImpl` sees it.  Worse, it may return to safe mode while 
`BlockManagerImpl` is trying to perform some operation, e.g.:
   
   ```
   SCMException: SafeModePrecheck failed for allocateBlock
   ...
 at 
org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:160)
 at 
org.apache.hadoop.hdds.scm.block.TestBlockManager.testAllocateBlock(TestBlockManager.java:150)
   ```
   
   The proposed fix is to disable safe-mode status emission (i.e. ignore the 
event from SCM) and let the test set safe mode explicitly in 
`BlockManagerImpl`.  This should be fine since this is a unit test, not an 
integration test.
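
   To illustrate the kind of race involved, here is an abstract, runnable sketch (class and variable names are made up, this is not the actual SCM event-queue code): two threads delivering opposite safe-mode status values to the same handler leave the final state dependent on scheduling.

   ```java
   import java.util.concurrent.CountDownLatch;
   import java.util.concurrent.atomic.AtomicBoolean;

   public class SafeModeRaceSketch {
     // Stand-in for the handler's view of safe mode (hypothetical).
     static final AtomicBoolean inSafeMode = new AtomicBoolean(true);

     public static void main(String[] args) throws InterruptedException {
       CountDownLatch start = new CountDownLatch(1);
       Thread fromScm = new Thread(() -> { await(start); inSafeMode.set(true); });   // event from SCM
       Thread fromTest = new Thread(() -> { await(start); inSafeMode.set(false); }); // event from test
       fromScm.start();
       fromTest.start();
       start.countDown();  // release both "handler threads" at once
       fromScm.join();
       fromTest.join();
       // Sometimes true, sometimes false: whichever event is handled last wins,
       // which is why the test occasionally keeps waiting for safe-mode exit.
       System.out.println("inSafeMode = " + inSafeMode.get());
     }

     private static void await(CountDownLatch latch) {
       try {
         latch.await();
       } catch (InterruptedException e) {
         Thread.currentThread().interrupt();
       }
     }
   }
   ```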
   
   https://issues.apache.org/jira/browse/HDDS-2989
   
   ## How was this patch tested?
   
   Ran TestBlockManager 10x:
   https://github.com/adoroszlai/hadoop-ozone/runs/497791137
   
   then 50x:
   https://github.com/adoroszlai/hadoop-ozone/runs/497839450
   
   and regular full CI:
   https://github.com/adoroszlai/hadoop-ozone/runs/498781616


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-3153) Create REST API to serve Recon Dashboard and integrate with UI in Recon.

2020-03-10 Thread Vivek Ratnavel Subramanian (Jira)
Vivek Ratnavel Subramanian created HDDS-3153:


 Summary: Create REST API to serve Recon Dashboard and integrate 
with UI in Recon.
 Key: HDDS-3153
 URL: https://issues.apache.org/jira/browse/HDDS-3153
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
  Components: Ozone Recon
Affects Versions: 0.5.0
Reporter: Vivek Ratnavel Subramanian
Assignee: Vivek Ratnavel Subramanian
 Attachments: Screen Shot 2020-03-10 at 12.10.41 PM.png

Add a REST API to serve information required for recon dashboard

!Screen Shot 2020-03-10 at 12.10.41 PM.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] adoroszlai commented on a change in pull request #655: HDDS-3100. Fix TestDeadNodeHandler.

2020-03-10 Thread GitBox
adoroszlai commented on a change in pull request #655: HDDS-3100. Fix 
TestDeadNodeHandler.
URL: https://github.com/apache/hadoop-ozone/pull/655#discussion_r390547825
 
 

 ##
 File path: 
hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/node/TestDeadNodeHandler.java
 ##
 @@ -89,16 +89,16 @@ public void setup() throws IOException, 
AuthenticationException {
 storageDir = GenericTestUtils.getTempPath(
 TestDeadNodeHandler.class.getSimpleName() + UUID.randomUUID());
 conf.set(HddsConfigKeys.OZONE_METADATA_DIRS, storageDir);
-conf.setInt(OZONE_DATANODE_PIPELINE_LIMIT, 0);
 eventQueue = new EventQueue();
+System.out.println("eventQueue = "  + eventQueue.toString());
 
 Review comment:
   Leftover debug.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] avijayanhwx commented on a change in pull request #648: HDDS-3117. Recon throws InterruptedException while getting new snapshot from OM.

2020-03-10 Thread GitBox
avijayanhwx commented on a change in pull request #648: HDDS-3117. Recon throws 
InterruptedException while getting new snapshot from OM.
URL: https://github.com/apache/hadoop-ozone/pull/648#discussion_r390532746
 
 

 ##
 File path: hadoop-ozone/dist/src/main/smoketest/recon/recon-api.robot
 ##
 @@ -25,16 +25,26 @@ ${ENDPOINT_URL}   http://recon:9888
 ${API_ENDPOINT_URL}   http://recon:9888/api/v1
 
 *** Test Cases ***
-Recon REST API
+Recon OM APIs
+Run Keyword if  '${SECURITY_ENABLED}' == 'true' Kinit test user
 testuser testuser.keytab
+Execute ozone freon rk 
--numOfVolumes 1 --numOfBuckets 1 --numOfKeys 10 --keySize 1025
+Sleep   90s
 
 Review comment:
   Thanks for the suggestions. I have fixed this.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-3133) Add export objectIds in Ozone as FileIds to allow LLAP to cache the files

2020-03-10 Thread Siddharth Wagle (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056202#comment-17056202
 ] 

Siddharth Wagle commented on HDDS-3133:
---

[~msingh] org.apache.hadoop.hdfs.protocol.HdfsFileStatus#getFileId is defined 
in HdfsFileStatus; I looked at where it is called:
https://github.com/apache/hive/blob/1e15791987098a177625b16b468e96021fb6dd29/shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java#L767

Ozone cannot implement HdfsFileStatus, so will this also need a Hive change? 
cc: [~avijayan], since he worked on this area in Ozone.
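
For illustration, a rough sketch of the pattern in question (a hypothetical helper, not the actual Hive or Ozone code): Hive only gets a real file id when the FileStatus it sees is an HdfsFileStatus, otherwise it falls back to a synthetic id, which is why exposing Ozone's objectId through OzoneFileStatus (or a Hive-side shim) matters for LLAP caching.

{code:java}
// Hypothetical illustration only; not the actual Hive or Ozone implementation.
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.hdfs.protocol.HdfsFileStatus;

final class FileIdSketch {
  private FileIdSketch() { }

  /** Pick a cache key for LLAP-style caching from a FileStatus. */
  static long cacheFileId(FileStatus status) {
    if (status instanceof HdfsFileStatus) {
      // HDFS exposes a stable inode id.
      return ((HdfsFileStatus) status).getFileId();
    }
    // Fallback for other file systems (e.g. o3fs today): synthesize an id
    // from the path, a much weaker cache key than a real file id.
    return status.getPath().toString().hashCode();
  }
}
{code}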

> Add export objectIds in Ozone as FileIds to allow LLAP to cache the files
> -
>
> Key: HDDS-3133
> URL: https://issues.apache.org/jira/browse/HDDS-3133
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Filesystem, Ozone Manager
>Reporter: Mukul Kumar Singh
>Assignee: Siddharth Wagle
>Priority: Major
>
> Hive LLAP makes use of the fileids to cache the files data. Ozone's objectIds 
> need to be exported as fileIds to allow the caching to happen effectively.
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/HdfsUtils.java#L65



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] adoroszlai commented on issue #655: HDDS-3100. Fix TestDeadNodeHandler.

2020-03-10 Thread GitBox
adoroszlai commented on issue #655: HDDS-3100. Fix TestDeadNodeHandler.
URL: https://github.com/apache/hadoop-ozone/pull/655#issuecomment-597208683
 
 
   > Unit test failure (TestBlockManager) is unrelated.
   
   Being fixed in HDDS-2989.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] avijayanhwx commented on issue #655: HDDS-3100. Fix TestDeadNodeHandler.

2020-03-10 Thread GitBox
avijayanhwx commented on issue #655: HDDS-3100. Fix TestDeadNodeHandler.
URL: https://github.com/apache/hadoop-ozone/pull/655#issuecomment-597207559
 
 
   Unit test failure (TestBlockManager) is unrelated.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #649: HDDS-3120. Freon work with OM HA.

2020-03-10 Thread GitBox
bharatviswa504 commented on a change in pull request #649: HDDS-3120. Freon 
work with OM HA.
URL: https://github.com/apache/hadoop-ozone/pull/649#discussion_r390463072
 
 

 ##
 File path: 
hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/OzoneClientKeyGenerator.java
 ##
 @@ -86,22 +91,24 @@ public Void call() throws Exception {
 
 OzoneConfiguration ozoneConfiguration = createOzoneConfiguration();
 
-ensureVolumeAndBucketExist(ozoneConfiguration, volumeName, bucketName);
 
 contentGenerator = new ContentGenerator(keySize, bufferSize);
 metadata = new HashMap<>();
 
-try (OzoneClient rpcClient = OzoneClientFactory
-.getRpcClient(ozoneConfiguration)) {
-
-  bucket =
-  rpcClient.getObjectStore().getVolume(volumeName)
-  .getBucket(bucketName);
+OzoneClient rpcClient = null;
+try {
+  rpcClient = createOzoneClient(omServiceID, ozoneConfiguration);
 
 Review comment:
   Initially I was using if/else to create the OzoneClient; later I moved that 
code into a method and used try-with-resources.
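
   For reference, a minimal sketch of the pattern described here, using only the 
factory calls visible in this diff (volume and bucket names are placeholders):

   ```java
   import org.apache.hadoop.hdds.conf.OzoneConfiguration;
   import org.apache.hadoop.ozone.client.OzoneBucket;
   import org.apache.hadoop.ozone.client.OzoneClient;
   import org.apache.hadoop.ozone.client.OzoneClientFactory;

   class ClientSketch {
     // Pick the HA-aware factory method only when an OM service id is given,
     // and let try-with-resources close the client.
     static void writeKeys(String omServiceId, String volume, String bucket)
         throws Exception {
       OzoneConfiguration conf = new OzoneConfiguration();
       try (OzoneClient client = omServiceId != null
           ? OzoneClientFactory.getRpcClient(omServiceId, conf)
           : OzoneClientFactory.getRpcClient(conf)) {
         OzoneBucket b = client.getObjectStore()
             .getVolume(volume).getBucket(bucket);
         // ... generate keys against 'b' ...
       }
     }
   }
   ```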


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #649: HDDS-3120. Freon work with OM HA.

2020-03-10 Thread GitBox
bharatviswa504 commented on a change in pull request #649: HDDS-3120. Freon 
work with OM HA.
URL: https://github.com/apache/hadoop-ozone/pull/649#discussion_r390462290
 
 

 ##
 File path: 
hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/BaseFreonGenerator.java
 ##
 @@ -469,4 +456,13 @@ public AtomicLong getAttemptCounter() {
   public int getThreadNo() {
 return threadNo;
   }
+
+  protected OzoneClient createOzoneClient(String omServiceID,
+  OzoneConfiguration conf) throws Exception {
+if (omServiceID != null) {
+  return OzoneClientFactory.getRpcClient(omServiceID, conf);
+} else {
+  return OzoneClientFactory.getRpcClient(conf);
 
 Review comment:
   Removed getClient and used getRpcClient; keeping this in one helper avoids 
these kinds of mistakes instead of duplicating the same code in multiple 
functions.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] bharatviswa504 commented on issue #649: HDDS-3120. Freon work with OM HA.

2020-03-10 Thread GitBox
bharatviswa504 commented on issue #649: HDDS-3120. Freon work with OM HA.
URL: https://github.com/apache/hadoop-ozone/pull/649#issuecomment-597194276
 
 
   Thank You @adoroszlai for the review.
   Addressed review comments.
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-3088) maxRetries value is too large while trying to reconnect to SCM server

2020-03-10 Thread Nilotpal Nandi (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056097#comment-17056097
 ] 

Nilotpal Nandi commented on HDDS-3088:
--

I think that, due to the large retry value, the CLI client operations also hang 
(they wait for up to 2147483647 attempts) when trying to connect to a dead SCM, 
and that affects the test executions.

 

> maxRetries value is too large while trying to reconnect to SCM server
> -
>
> Key: HDDS-3088
> URL: https://issues.apache.org/jira/browse/HDDS-3088
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Nilotpal Nandi
>Assignee: Nanda kumar
>Priority: Blocker
>
> MaxRetries value is 2147483647 which is too high
> It keeps on retrying to connect to SCM server.
>  
> {noformat}
> 2020-02-27 05:54:43,430 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: quasar-hqknwz-8.quasar-hqknwz.root.hwx.site/172.27.14.1:9861. 
> Already tried 10535 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 
> MILLISECONDS)
> 2020-02-27 05:54:44,431 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: quasar-hqknwz-8.quasar-hqknwz.root.hwx.site/172.27.14.1:9861. 
> Already tried 10536 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 
> MILLISECONDS)
> 2020-02-27 05:54:45,432 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: quasar-hqknwz-8.quasar-hqknwz.root.hwx.site/172.27.14.1:9861. 
> Already tried 10537 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 
> MILLISECONDS)
> 2020-02-27 05:54:46,433 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: quasar-hqknwz-8.quasar-hqknwz.root.hwx.site/172.27.14.1:9861. 
> Already tried 10538 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 
> MILLISECONDS){noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-3088) maxRetries value is too large while trying to reconnect to SCM server

2020-03-10 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056089#comment-17056089
 ] 

Arpit Agarwal commented on HDDS-3088:
-

Then we probably want it to retry forever, just like HDFS. Do you see any harm in 
doing that? I think what is missing is some kind of delay between the retries; 
currently it seems to be retrying in a tight loop.
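
For context, a hedged sketch of how such a policy is built with Hadoop's {{RetryPolicies}} (the bounded values below are illustrative placeholders, not the proposed fix):

{code:java}
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryPolicy;

public class RetryPolicySketch {
  public static void main(String[] args) {
    // What the log above shows: maxRetries == Integer.MAX_VALUE with a 1 s
    // fixed sleep, i.e. effectively retrying forever.
    RetryPolicy current = RetryPolicies.retryUpToMaximumCountWithFixedSleep(
        Integer.MAX_VALUE, 1000, TimeUnit.MILLISECONDS);

    // A bounded alternative with a longer pause between attempts
    // (100 tries x 10 s are placeholder numbers).
    RetryPolicy bounded = RetryPolicies.retryUpToMaximumCountWithFixedSleep(
        100, 10, TimeUnit.SECONDS);

    System.out.println(current + " / " + bounded);
  }
}
{code}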

> maxRetries value is too large while trying to reconnect to SCM server
> -
>
> Key: HDDS-3088
> URL: https://issues.apache.org/jira/browse/HDDS-3088
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Nilotpal Nandi
>Assignee: Nanda kumar
>Priority: Blocker
>
> MaxRetries value is 2147483647 which is too high
> It keeps on retrying to connect to SCM server.
>  
> {noformat}
> 2020-02-27 05:54:43,430 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: quasar-hqknwz-8.quasar-hqknwz.root.hwx.site/172.27.14.1:9861. 
> Already tried 10535 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 
> MILLISECONDS)
> 2020-02-27 05:54:44,431 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: quasar-hqknwz-8.quasar-hqknwz.root.hwx.site/172.27.14.1:9861. 
> Already tried 10536 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 
> MILLISECONDS)
> 2020-02-27 05:54:45,432 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: quasar-hqknwz-8.quasar-hqknwz.root.hwx.site/172.27.14.1:9861. 
> Already tried 10537 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 
> MILLISECONDS)
> 2020-02-27 05:54:46,433 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: quasar-hqknwz-8.quasar-hqknwz.root.hwx.site/172.27.14.1:9861. 
> Already tried 10538 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 
> MILLISECONDS){noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-3088) maxRetries value is too large while trying to reconnect to SCM server

2020-03-10 Thread Nilotpal Nandi (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056071#comment-17056071
 ] 

Nilotpal Nandi commented on HDDS-3088:
--

[~arp],

Here the datanode is trying to connect to SCM.

> maxRetries value is too large while trying to reconnect to SCM server
> -
>
> Key: HDDS-3088
> URL: https://issues.apache.org/jira/browse/HDDS-3088
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Nilotpal Nandi
>Assignee: Nanda kumar
>Priority: Blocker
>
> MaxRetries value is 2147483647 which is too high
> It keeps on retrying to connect to SCM server.
>  
> {noformat}
> 2020-02-27 05:54:43,430 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: quasar-hqknwz-8.quasar-hqknwz.root.hwx.site/172.27.14.1:9861. 
> Already tried 10535 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 
> MILLISECONDS)
> 2020-02-27 05:54:44,431 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: quasar-hqknwz-8.quasar-hqknwz.root.hwx.site/172.27.14.1:9861. 
> Already tried 10536 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 
> MILLISECONDS)
> 2020-02-27 05:54:45,432 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: quasar-hqknwz-8.quasar-hqknwz.root.hwx.site/172.27.14.1:9861. 
> Already tried 10537 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 
> MILLISECONDS)
> 2020-02-27 05:54:46,433 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: quasar-hqknwz-8.quasar-hqknwz.root.hwx.site/172.27.14.1:9861. 
> Already tried 10538 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=2147483647, sleepTime=1000 
> MILLISECONDS){noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3118) Possible deadlock in LockManager

2020-03-10 Thread Bharat Viswanadham (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDDS-3118:
-
Component/s: Ozone Manager

> Possible deadlock in LockManager
> 
>
> Key: HDDS-3118
> URL: https://issues.apache.org/jira/browse/HDDS-3118
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Reporter: Attila Doroszlai
>Assignee: Bharat Viswanadham
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.5.0
>
> Attachments: repro.log, repro.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {{LockManager}} has a possible deadlock.
> # Number of locks is limited by using a {{GenericObjectPool}}.  If N locks 
> are already acquired, new requestors need to wait.  This wait in 
> {{getLockForLocking}} happens in a callback executed from 
> {{ConcurrentHashMap#compute}} while holding a lock on a map entry.
> # While releasing a lock, {{decrementActiveLockCount}} implicitly requires a 
> lock on an entry in {{ConcurrentHashMap}}.
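
To make the cycle concrete, here is a minimal, self-contained sketch of the pattern described above (a {{Semaphore}} stands in for the {{GenericObjectPool}}; this is not the actual LockManager code). Running it hangs, which is the point:

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

public class ComputeDeadlockSketch {
  // Stands in for the exhausted GenericObjectPool: one lock object in total.
  private static final Semaphore POOL = new Semaphore(1);
  private static final ConcurrentHashMap<String, Integer> MAP =
      new ConcurrentHashMap<>();

  public static void main(String[] args) throws Exception {
    POOL.acquire(); // the only pooled lock object is already in use

    Thread acquirer = new Thread(() ->
        // Like getLockForLocking: waits for a pooled object *inside* compute,
        // i.e. while holding the lock on the map entry for "key".
        MAP.compute("key", (k, v) -> {
          POOL.acquireUninterruptibly();
          return 1;
        }));
    acquirer.start();
    Thread.sleep(200); // let the acquirer block inside compute

    // Like decrementActiveLockCount: returning the pooled object also needs
    // the entry lock, so this blocks behind the acquirer, which in turn is
    // waiting for the permit we are trying to return -> deadlock.
    MAP.compute("key", (k, v) -> {
      POOL.release();
      return v;
    });
    System.out.println("never reached");
  }
}
{code}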



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] elek commented on a change in pull request #525: HDDS-2798. beyond/Containers.md translation

2020-03-10 Thread GitBox
elek commented on a change in pull request #525: HDDS-2798. 
beyond/Containers.md translation
URL: https://github.com/apache/hadoop-ozone/pull/525#discussion_r390390241
 
 

 ##
 File path: hadoop-hdds/docs/content/beyond/Containers.zh.md
 ##
 @@ -0,0 +1,212 @@
+---
+title: "Ozone 中的容器技术"
+summary: Ozone 广泛地使用容器来进行测试,本页介绍 Ozone 中容器的使用及其最佳实践。
+weight: 2
+---
+
+
+Ozone 的开发中大量地使用了 Docker,包括以下三种主要的应用场景:
+
+* __开发__:
+ * 我们使用 docker 来启动本地伪集群(docker 可以提供统一的环境,但是不需要创建镜像)。
+* __测试__:
+ * 我们从开发分支创建 docker 镜像,然后在 kubernetes 或其它容器编排系统上测试 ozone。
+ * 我们为每个发行版提供了 _apache/ozone_ 镜像,以方便用户体验 Ozone。
+ 这些镜像 __不__ 应当在 __生产__ 中使用。
+
+
+当在生产中使用容器方式部署 ozone 时,我们强烈建议你创建自己的镜像。请把所有自带的容器镜像和 k8s 
资源文件当作示例指南,参考它们进行定制。
+
+
+* __生产__:
+ * 我们提供了如何创建用于生产的 docker 镜像的文档。
+
+下面我们来详细地介绍一下各种应用场景:
+
+## 开发
+
+Ozone 安装包中包含了 docker-compose 的示例目录,用于方便地在本地机器启动 Ozone 集群。
+
+使用官方提供的发行包:
+
+```bash
+cd compose/ozone
+docker-compose up -d
+```
+
+本地构建方式:
+
+```bash
+cd  hadoop-ozone/dist/target/ozone-*/compose
+docker-compose up -d
+```
+
+这些 compose 环境文件是重要的工具,可以用来随时启动各种类型的 Ozone 集群。
+
+为了确保 compose 文件是最新的,我们提供了验收测试套件,套件会启动集群并检查其基本行为是否正常。
+
+验收测试也包含在发行包中,你可以在 `smoketest` 目录下找到各个测试的定义。
+
+你可以在任意 compose 目录进行测试,比如:
+
+```bash
+cd compose/ozone
+./test.sh
+```
+
+### 实现细节
+
+`compose` 测试都基于 apache/hadoop-runner 镜像,这个镜像本身并不包含任何 Ozone 的 jar 
包或二进制文件,它只是提供其了启动 ozone 的辅助脚本。
+
+hadoop-runner 提供了一个随处运行 Ozone 的固定环境,Ozone 分发包通过目录挂载包含在其中。
+
+(docker-compose 示例片段)
+
+```
+ scm:
+  image: apache/hadoop-runner:jdk11
+  volumes:
+ - ../..:/opt/hadoop
+  ports:
+ - 9876:9876
+
+```
+
+容器应该通过环境变量来进行配置,由于每个容器都应当设置相同的环境变量,我们在单独的文件中维护了一个环境变量列表:
+
+```
+ scm:
+  image: apache/hadoop-runner:jdk11
+  #...
+  env_file:
+  - ./docker-config
+```
+
+docker-config 文件中包含了所需环境变量的列表:
+
+```
+OZONE-SITE.XML_ozone.om.address=om
+OZONE-SITE.XML_ozone.om.http-address=om:9874
+OZONE-SITE.XML_ozone.scm.names=scm
+#...
+```
+
+你可以看到我们所使用的命名规范,根据这些环境变量的名字,`hadoop-runner` 
基础镜像中的[脚本](https://github.com/apache/hadoop/tree/docker-hadoop-runner-latest/scripts)
 会生成合适的 hadoop XML 配置文件(在我们这种情况下就是 `ozone-site.xml`)。
+
+`hadoop-runner` 
镜像的[入口点](https://github.com/apache/hadoop/blob/docker-hadoop-runner-latest/scripts/starter
+.sh)包含了一个辅助脚本,这个辅助脚本可以根据环境变量触发上述的配置文件生成以及其它动作(比如初始化 SCM 和 OM 的存储、下载必要的 keytab 
等)。
+
+## 测试 
+
+`docker-compose` 的方式应当只用于本地测试,不适用于多节点集群。要在多节点集群上使用容器,我们需要像 Kubernetes 
这样的容器编排系统。
+
+Kubernetes 示例文件在 `kubernetes` 文件夹中。
+
+*请注意*:所有提供的镜像都使用 `hadoop-runner` 
作为基础镜像,这个镜像中包含了所有测试环境所需的测试工具。对于生产环境,我们推荐用户使用自己的基础镜像创建可靠的镜像。
+
+### 发行包测试
+
+可以通过部署任意的示例集群来测试发行包:
+
+```bash
+cd kubernetes/examples/ozone
+kubectl apply -f
+```
+
+注意,此时会从 Docker Hub 下载最新的镜像。
+
+### 开发构建测试
+
+为了测试开发中的构建,你需要创建自己的镜像并上传到自己的 docker 仓库中:
+
+
+```bash
+mvn clean install -DskipTests -Pdocker-build,docker-push -Ddocker.image=myregistry:9000/name/ozone
+```
+
+所有生成的 kubernetes 资源文件都会使用这个镜像 (`image:` keys are adjusted during the build)
+
+```bash
+cd kubernetes/examples/ozone
+kubectl apply -f
+```
+
+## 生产
+
+
+我们强烈推荐在生产集群使用自己的镜像,并根据实际的需求调整基础镜像、文件掩码、安全设置和用户设置。
+
+
+你可以使用我们开发中所用的镜像作为示例:
+
+ * [基础镜像] 
(https://github.com/apache/hadoop/blob/docker-hadoop-runner-jdk11/Dockerfile)
+ * [完整镜像] 
(https://github.com/apache/hadoop/blob/trunk/hadoop-ozone/dist/src/main/docker/Dockerfile)
+
+ Dockerfile 中大部分内容都是可选的辅助功能,但如果要使用我们提供的 kubernetes 
示例资源文件,你可能需要[这里](https://github.com/apache/hadoop/tree/docker-hadoop-runner-jdk11/scripts)的脚本。
+
+  * 两个 python 脚本将环境变量转化为实际的 hadoop XML 配置文件
+  * start.sh 根据环境变量执行 python 脚本(以及其它初始化工作)
+
+## 容器
+
+Ozone 相关的容器镜像和 Dockerfile 位置:
+
+| # | 容器 | 仓库 | 基础镜像 | 分支 | 标签 | 说明 |
+|---|------|------|---------|------|------|------|
+| 1 | apache/ozone | https://github.com/apache/hadoop-docker-ozone | ozone-... | hadoop-runner | 0.3.0,0.4.0,0.4.1 | 每个 Ozone 发行版都对应一个新标签。 |
+| 2 | apache/hadoop-runner | https://github.com/apache/hadoop | docker-hadoop-runner | centos | jdk11,jdk8,latest | 这是用于测试 Hadoop Ozone 的基础镜像,包含了一系列可以让我们更加方便地运行 Ozone 的工具。 |
+
 
 Review comment:
   Sure, feel free to remove it. I will create a patch as the previous lines 
are also outdated...


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3152) Reduce number of chunkwriter threads in integration tests

2020-03-10 Thread Attila Doroszlai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Doroszlai updated HDDS-3152:
---
Status: Patch Available  (was: In Progress)

> Reduce number of chunkwriter threads in integration tests
> -
>
> Key: HDDS-3152
> URL: https://issues.apache.org/jira/browse/HDDS-3152
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: test
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Integration tests run multiple datanodes in the same JVM.  Each datanode 
> comes with 60 chunk writer threads by default (may be decreased in 
> HDDS-3053).  This makes thread dumps (eg. produced by 
> {{GenericTestUtils.waitFor}} on timeout) really hard to navigate, as there 
> may be 300+ such threads.
> Since integration tests are generally run with a single disk which is even 
> shared among the datanodes, a few threads per datanode should be enough.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3095) Intermittent failure in TestFailureHandlingByClient#testDatanodeExclusionWithMajorityCommit

2020-03-10 Thread Attila Doroszlai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Doroszlai updated HDDS-3095:
---
Status: Patch Available  (was: In Progress)

> Intermittent failure in 
> TestFailureHandlingByClient#testDatanodeExclusionWithMajorityCommit
> ---
>
> Key: HDDS-3095
> URL: https://issues.apache.org/jira/browse/HDDS-3095
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code:title=https://github.com/apache/hadoop-ozone/pull/614/checks?check_run_id=472938597}
> [ERROR] Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 
> 284.887 s <<< FAILURE! - in 
> org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient
> [ERROR] 
> testDatanodeExclusionWithMajorityCommit(org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient)
>   Time elapsed: 66.589 s  <<< FAILURE!
> java.lang.AssertionError
> ...
>at 
> org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient.testDatanodeExclusionWithMajorityCommit(TestFailureHandlingByClient.java:336)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3095) Intermittent failure in TestFailureHandlingByClient#testDatanodeExclusionWithMajorityCommit

2020-03-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3095:
-
Labels: pull-request-available  (was: )

> Intermittent failure in 
> TestFailureHandlingByClient#testDatanodeExclusionWithMajorityCommit
> ---
>
> Key: HDDS-3095
> URL: https://issues.apache.org/jira/browse/HDDS-3095
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
>
> {code:title=https://github.com/apache/hadoop-ozone/pull/614/checks?check_run_id=472938597}
> [ERROR] Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 
> 284.887 s <<< FAILURE! - in 
> org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient
> [ERROR] 
> testDatanodeExclusionWithMajorityCommit(org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient)
>   Time elapsed: 66.589 s  <<< FAILURE!
> java.lang.AssertionError
> ...
>at 
> org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient.testDatanodeExclusionWithMajorityCommit(TestFailureHandlingByClient.java:336)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] adoroszlai opened a new pull request #658: HDDS-3095. Intermittent failure in TestFailureHandlingByClient#testDatanodeExclusionWithMajorityCommit

2020-03-10 Thread GitBox
adoroszlai opened a new pull request #658: HDDS-3095. Intermittent failure in 
TestFailureHandlingByClient#testDatanodeExclusionWithMajorityCommit
URL: https://github.com/apache/hadoop-ozone/pull/658
 
 
   ## What changes were proposed in this pull request?
   
   The intermittent failure in `TestFailureHandlingByClient` happens when the 
just-stopped datanode is not excluded during the subsequent write operation.
   
   This PR proposes to make `MiniOzoneCluster` wait for datanode to stop, as it 
already does during "restart datanode".
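
   As a rough, hedged sketch of the "wait for datanode to stop" idea (the caller 
supplies the actual "is it down yet?" check, which is not spelled out here):

   ```java
   import java.util.concurrent.TimeoutException;
   import java.util.function.Supplier;

   import org.apache.hadoop.test.GenericTestUtils;

   final class WaitForStopSketch {
     // Poll a caller-supplied predicate instead of assuming shutdown is instant.
     static void waitForDatanodeStop(Supplier<Boolean> datanodeIsDown)
         throws TimeoutException, InterruptedException {
       GenericTestUtils.waitFor(
           () -> datanodeIsDown.get(),
           100,       // poll every 100 ms
           30_000);   // give up after 30 s
     }
   }
   ```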
   
   https://issues.apache.org/jira/browse/HDDS-3095
   
   ## How was this patch tested?
   
   Ran `TestFailureHandlingByClient` 20x successfully:
   https://github.com/adoroszlai/hadoop-ozone/runs/497741382
   
   and regular full CI:
   https://github.com/adoroszlai/hadoop-ozone/runs/497755796
   where the only failure is in Test2WayCommitInRatis and is (supposedly) unrelated.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] adoroszlai opened a new pull request #657: HDDS-3152. Reduce number of chunkwriter threads in integration tests

2020-03-10 Thread GitBox
adoroszlai opened a new pull request #657: HDDS-3152. Reduce number of 
chunkwriter threads in integration tests
URL: https://github.com/apache/hadoop-ozone/pull/657
 
 
   ## What changes were proposed in this pull request?
   
   Integration tests run multiple datanodes in the same JVM.  Each datanode 
comes with 60 chunk writer threads by default (may be decreased in 
[HDDS-3053](https://issues.apache.org/jira/browse/HDDS-3053)).  This makes 
thread dumps (eg. produced by `GenericTestUtils.waitFor` on timeout) really 
hard to navigate, as there may be 300+ such threads.
   
   Since integration tests are generally run with a single disk which is even 
shared among the datanodes, a few threads per datanode should be enough.
   
   https://issues.apache.org/jira/browse/HDDS-3152
   
   ## How was this patch tested?
   
   Regular CI:
   https://github.com/adoroszlai/hadoop-ozone/runs/497866229


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3152) Reduce number of chunkwriter threads in integration tests

2020-03-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3152:
-
Labels: pull-request-available  (was: )

> Reduce number of chunkwriter threads in integration tests
> -
>
> Key: HDDS-3152
> URL: https://issues.apache.org/jira/browse/HDDS-3152
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: test
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Minor
>  Labels: pull-request-available
>
> Integration tests run multiple datanodes in the same JVM.  Each datanode 
> comes with 60 chunk writer threads by default (may be decreased in 
> HDDS-3053).  This makes thread dumps (eg. produced by 
> {{GenericTestUtils.waitFor}} on timeout) really hard to navigate, as there 
> may be 300+ such threads.
> Since integration tests are generally run with a single disk which is even 
> shared among the datanodes, a few threads per datanode should be enough.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] elek edited a comment on issue #389: HDDS-2534. scmcli container delete not working

2020-03-10 Thread GitBox
elek edited a comment on issue #389: HDDS-2534. scmcli container delete not 
working
URL: https://github.com/apache/hadoop-ozone/pull/389#issuecomment-597084180
 
 
   /pending Questions/suggestions from @xiaoyuyao are not yet addressed in the 
last commit.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] elek commented on issue #389: HDDS-2534. scmcli container delete not working

2020-03-10 Thread GitBox
elek commented on issue #389: HDDS-2534. scmcli container delete not working
URL: https://github.com/apache/hadoop-ozone/pull/389#issuecomment-597084180
 
 
   /pending Questions/suggestions from @xiaoyuyao are not yet addressed in #432 
12f3f8ac94cf8808757bb7673e4208d8b0fede09


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-2989) Intermittent timeout in TestBlockManager

2020-03-10 Thread Attila Doroszlai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Doroszlai reassigned HDDS-2989:
--

Assignee: Attila Doroszlai

> Intermittent timeout in TestBlockManager
> 
>
> Key: HDDS-2989
> URL: https://issues.apache.org/jira/browse/HDDS-2989
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>
> {code:title=https://github.com/apache/hadoop-ozone/runs/430663688}
> 2020-02-06T21:44:53.5319531Z [ERROR] Tests run: 9, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 5.344 s <<< FAILURE! - in 
> org.apache.hadoop.hdds.scm.block.TestBlockManager
> 2020-02-06T21:44:53.5319796Z [ERROR] 
> testMultipleBlockAllocation(org.apache.hadoop.hdds.scm.block.TestBlockManager)
>   Time elapsed: 1.167 s  <<< ERROR!
> 2020-02-06T21:44:53.5319942Z java.util.concurrent.TimeoutException: 
> 2020-02-06T21:44:53.5320496Z Timed out waiting for condition. Thread 
> diagnostics:
> 2020-02-06T21:44:53.5320839Z Timestamp: 2020-02-06 09:44:52,261
> 2020-02-06T21:44:53.5320901Z 
> 2020-02-06T21:44:53.5321178Z "Thread-26"  prio=5 tid=46 runnable
> 2020-02-06T21:44:53.5321292Z java.lang.Thread.State: RUNNABLE
> 2020-02-06T21:44:53.5321391Z at java.lang.Thread.dumpThreads(Native 
> Method)
> 2020-02-06T21:44:53.5326891Z at 
> java.lang.Thread.getAllStackTraces(Thread.java:1610)
> 2020-02-06T21:44:53.5327144Z at 
> org.apache.hadoop.test.TimedOutTestsListener.buildThreadDump(TimedOutTestsListener.java:87)
> 2020-02-06T21:44:53.5327309Z at 
> org.apache.hadoop.test.TimedOutTestsListener.buildThreadDiagnosticString(TimedOutTestsListener.java:73)
> 2020-02-06T21:44:53.5327465Z at 
> org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:389)
> 2020-02-06T21:44:53.5327618Z at 
> org.apache.hadoop.hdds.scm.block.TestBlockManager.testMultipleBlockAllocation(TestBlockManager.java:280)
> 2020-02-06T21:44:53.5388042Z at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2020-02-06T21:44:53.5388702Z at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2020-02-06T21:44:53.5388905Z at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2020-02-06T21:44:53.5389045Z at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2020-02-06T21:44:53.5389195Z at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> 2020-02-06T21:44:53.5389331Z at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 2020-02-06T21:44:53.5389662Z at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> 2020-02-06T21:44:53.5389776Z at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 2020-02-06T21:44:53.5389916Z at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> 2020-02-06T21:44:53.5390040Z "Signal Dispatcher" daemon prio=9 tid=4 runnable
> 2020-02-06T21:44:53.5390156Z java.lang.Thread.State: RUNNABLE
> 2020-02-06T21:44:53.5390783Z 
> "EventQueue-CloseContainerForCloseContainerEventHandler"  prio=5 tid=32 in 
> Object.wait()
> 2020-02-06T21:44:53.5390916Z java.lang.Thread.State: WAITING (on object 
> monitor)
> 2020-02-06T21:44:53.5391019Z at sun.misc.Unsafe.park(Native Method)
> 2020-02-06T21:44:53.5391149Z at 
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> 2020-02-06T21:44:53.5391299Z at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> 2020-02-06T21:44:53.5391448Z at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> 2020-02-06T21:44:53.5391587Z at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
> 2020-02-06T21:44:53.5391721Z at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
> 2020-02-06T21:44:53.5391844Z at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 2020-02-06T21:44:53.5391971Z at java.lang.Thread.run(Thread.java:748)
> 2020-02-06T21:44:53.5392100Z "IPC Server idle connection scanner for port 
> 43801" daemon prio=5 tid=24 in Object.wait()
> 2020-02-06T21:44:53.5392227Z java.lang.Thread.State: WAITING (on object 
> monitor)
> 2020-02-06T21:44:53.5392347Z at java.lang.Object.wait(Native Method)
> 2020-02-06T21:44:53.5392463Z at java.lang.Object.wait(Object.java:502)
> 2020-02-06T21:44:53.5392567Z at 
> 

[jira] [Assigned] (HDDS-3095) Intermittent failure in TestFailureHandlingByClient#testDatanodeExclusionWithMajorityCommit

2020-03-10 Thread Attila Doroszlai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Doroszlai reassigned HDDS-3095:
--

Assignee: Attila Doroszlai

> Intermittent failure in 
> TestFailureHandlingByClient#testDatanodeExclusionWithMajorityCommit
> ---
>
> Key: HDDS-3095
> URL: https://issues.apache.org/jira/browse/HDDS-3095
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>
> {code:title=https://github.com/apache/hadoop-ozone/pull/614/checks?check_run_id=472938597}
> [ERROR] Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 
> 284.887 s <<< FAILURE! - in 
> org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient
> [ERROR] 
> testDatanodeExclusionWithMajorityCommit(org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient)
>   Time elapsed: 66.589 s  <<< FAILURE!
> java.lang.AssertionError
> ...
>at 
> org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient.testDatanodeExclusionWithMajorityCommit(TestFailureHandlingByClient.java:336)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] elek commented on issue #578: HDDS-3053. Decrease the number of the chunk writer threads

2020-03-10 Thread GitBox
elek commented on issue #578: HDDS-3053. Decrease the number of the chunk 
writer threads
URL: https://github.com/apache/hadoop-ozone/pull/578#issuecomment-597072663
 
 
   > did u check the pending request queue in the leader?
   
   No I didn't. Why is it interesting? 
   
   > how many mappers were used for the test?
   
   92 (see the link for this and all the other parameters).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-3152) Reduce number of chunkwriter threads in integration tests

2020-03-10 Thread Attila Doroszlai (Jira)
Attila Doroszlai created HDDS-3152:
--

 Summary: Reduce number of chunkwriter threads in integration tests
 Key: HDDS-3152
 URL: https://issues.apache.org/jira/browse/HDDS-3152
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: test
Reporter: Attila Doroszlai
Assignee: Attila Doroszlai


Integration tests run multiple datanodes in the same JVM.  Each datanode comes 
with 60 chunk writer threads by default (may be decreased in HDDS-3053).  This 
makes thread dumps (eg. produced by {{GenericTestUtils.waitFor}} on timeout) 
really hard to navigate, as there may be 300+ such threads.

Since integration tests are generally run with a single disk which is even 
shared among the datanodes, a few threads per datanode should be enough.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-2610) Fix the ObjectStore#listVolumes failure when argument is null

2020-03-10 Thread Attila Doroszlai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Doroszlai resolved HDDS-2610.

Fix Version/s: 0.6.0
   Resolution: Done

> Fix the ObjectStore#listVolumes failure when argument is null
> -
>
> Key: HDDS-2610
> URL: https://issues.apache.org/jira/browse/HDDS-2610
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: YiSheng Lien
>Assignee: YiSheng Lien
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> As described in 
> [VolumeManager#listVolumes|https://github.com/apache/hadoop-ozone/blob/a731eeaa9ed0d1faecda3665b599145316300101/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/VolumeManager.java#L84-L101],
> we should list all volumes when the userName is set to null.
> But currently the underlying method throws an OMException instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2717) Handle chunk increments in datanode

2020-03-10 Thread Attila Doroszlai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Doroszlai updated HDDS-2717:
---
Status: Patch Available  (was: In Progress)

> Handle chunk increments in datanode
> ---
>
> Key: HDDS-2717
> URL: https://issues.apache.org/jira/browse/HDDS-2717
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Let datanode handle incremental additions to chunks (data with non-zero 
> offset).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2610) Fix the ObjectStore#listVolumes failure when argument is null

2020-03-10 Thread Attila Doroszlai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Doroszlai updated HDDS-2610:
---
Labels:   (was: pull-request-available)

> Fix the ObjectStore#listVolumes failure when argument is null
> -
>
> Key: HDDS-2610
> URL: https://issues.apache.org/jira/browse/HDDS-2610
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: YiSheng Lien
>Assignee: YiSheng Lien
>Priority: Major
> Fix For: 0.6.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> As described in 
> [VolumeManager#listVolumes|https://github.com/apache/hadoop-ozone/blob/a731eeaa9ed0d1faecda3665b599145316300101/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/VolumeManager.java#L84-L101],
> we should list all volumes when the userName is set to null.
> But currently the underlying method throws an OMException instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3143) Rename silently ignored tests

2020-03-10 Thread Attila Doroszlai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Doroszlai updated HDDS-3143:
---
Fix Version/s: 0.6.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Rename silently ignored tests
> -
>
> Key: HDDS-3143
> URL: https://issues.apache.org/jira/browse/HDDS-3143
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Surefire plugin is configured to run {{Test*}} classes, but there are two 
> test classes named {{*Test}}:
> {code}
> $ find */*/src/test/java -name '*Test.java' | xargs grep -l '@Test'
> hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/HddsServerUtilTest.java
> hadoop-ozone/insight/src/test/java/org/apache/hadoop/ozone/insight/LogSubcommandTest.java
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] elek commented on issue #607: HDDS-3002. NFS mountd support for Ozone

2020-03-10 Thread GitBox
elek commented on issue #607: HDDS-3002. NFS mountd support for Ozone
URL: https://github.com/apache/hadoop-ozone/pull/607#issuecomment-597047992
 
 
   /pending "I will post the design doc..."


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] elek commented on issue #399: HDDS-2424. Add the recover-trash command server side handling.

2020-03-10 Thread GitBox
elek commented on issue #399: HDDS-2424. Add the recover-trash command server 
side handling.
URL: https://github.com/apache/hadoop-ozone/pull/399#issuecomment-597047650
 
 
   /pending Comments from @bharatviswa504 are not addressed, yet...


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] elek commented on issue #618: HDDS-2911. Fix lastUsed and stateEnterTime value in container info is not human friendly

2020-03-10 Thread GitBox
elek commented on issue #618: HDDS-2911. Fix lastUsed and stateEnterTime value 
in container info is not human friendly
URL: https://github.com/apache/hadoop-ozone/pull/618#issuecomment-597047183
 
 
   > In this case, can we display it as string in CLI ouput, and keep the long 
value internally.
   
   +1 It seems to be the more flexible option. Keep the long value for protobuf 
(we can even keep the nanoseconds) but print it out in a human-readable form...
   
   We can also change the Java type (not the protobuf type) to a proper Java 8 
time object (like `Instant`). That is more meaningful and might be printed 
properly by the current default JSON serializer...
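
   A small, hedged sketch of that option (keep epoch millis on the wire, convert 
only for display; names here are illustrative):

   ```java
   import java.time.Instant;
   import java.time.format.DateTimeFormatter;

   class ContainerTimeFormatSketch {
     // Wire/protobuf value stays a long; render it only at the CLI edge.
     static String toHumanReadable(long epochMillis) {
       return DateTimeFormatter.ISO_INSTANT.format(
           Instant.ofEpochMilli(epochMillis));
     }

     public static void main(String[] args) {
       long stateEnterTime = 1583823600000L; // example epoch millis
       System.out.println(toHumanReadable(stateEnterTime));
       // prints 2020-03-10T07:00:00Z
     }
   }
   ```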


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3142) Create isolated enviornment for OM to test it without SCM

2020-03-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3142:
-
Labels: pull-request-available  (was: )

> Create isolated enviornment for OM to test it without SCM
> -
>
> Key: HDDS-3142
> URL: https://issues.apache.org/jira/browse/HDDS-3142
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Marton Elek
>Assignee: Marton Elek
>Priority: Major
>  Labels: pull-request-available
>
> The OmKeyGenerator class from Freon can generate keys (open key + commit key), 
> but that test exercises both OM and SCM performance. It would be useful to 
> have a way to test only OM performance by faking the response from SCM.
> This can be done easily with the same approach we used in HDDS-3023: implement 
> a simple utility class and, with byteman, replace the client calls with the 
> fake method.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] elek opened a new pull request #656: HDDS-3142. Create isolated enviornment for OM to test it without SCM.

2020-03-10 Thread GitBox
elek opened a new pull request #656: HDDS-3142. Create isolated enviornment for 
OM to test it without SCM.
URL: https://github.com/apache/hadoop-ozone/pull/656
 
 
   ## What changes were proposed in this pull request?
   
   The `OmKeyGenerator` class from Freon can generate keys (open key + commit key), 
but that test exercises both OM and SCM performance. It would be useful to have 
a way to test only OM performance by faking the response from SCM.
   
   This can be done easily with the same approach we used in HDDS-3023: implement 
a simple utility class and, with byteman, replace the client calls with the fake 
method.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-3142
   
   ## How was this patch tested?
   
1. Download [byteman](https://byteman.jboss.org/)
2. Start a pure OM (`ozone om --init` + `ozone om`) with the following JVM 
parameters: (change the path) 
   
   ```
   
-javaagent:/home/elek/prog/byteman/lib/byteman.jar=script:/home/elek/projects/ozone/dev-support/byteman/mock-scm.btm,boot:/home/elek/prog/byteman/lib/byteman.jar
   -Dorg.jboss.byteman.transform.all
   ``` 
   
3. Start a simple freon test: `ozone freon omkg`
   
   Expected result: It should be possible to init and start OM without SCM and 
test it with the key generator.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] adoroszlai merged pull request #261: HDDS-2610. Fix the ObjectStore#listVolumes failure when argument is null

2020-03-10 Thread GitBox
adoroszlai merged pull request #261: HDDS-2610. Fix the ObjectStore#listVolumes 
failure when argument is null
URL: https://github.com/apache/hadoop-ozone/pull/261
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] adoroszlai commented on issue #261: HDDS-2610. Fix the ObjectStore#listVolumes failure when argument is null

2020-03-10 Thread GitBox
adoroszlai commented on issue #261: HDDS-2610. Fix the ObjectStore#listVolumes 
failure when argument is null
URL: https://github.com/apache/hadoop-ozone/pull/261#issuecomment-597012687
 
 
   Filed [HDDS-3151](https://issues.apache.org/jira/browse/HDDS-3151) for the 
integration test failure, which we've seen earlier without this change, too.
   
   The acceptance test failure where SCM does not come out of safe mode has also 
been observed elsewhere.
   
   Given we have a clean run on the PR source branch, and it's only 2 commits 
behind master, I think it's safe to merge.
   
   Thanks @cxorm for the contribution, @bharatviswa504 and @arp7 for the review.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3151) Intermittent timeout in TestCloseContainerHandlingByClient#testMultiBlockWrites3

2020-03-10 Thread Attila Doroszlai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Doroszlai updated HDDS-3151:
---
Attachment: 
org.apache.hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient.txt

org.apache.hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient-output.txt

> Intermittent timeout in 
> TestCloseContainerHandlingByClient#testMultiBlockWrites3
> 
>
> Key: HDDS-3151
> URL: https://issues.apache.org/jira/browse/HDDS-3151
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Reporter: Attila Doroszlai
>Priority: Major
> Attachments: 
> org.apache.hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient-output.txt,
>  org.apache.hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient.txt
>
>
> {code:title=https://github.com/apache/hadoop-ozone/runs/495906854}
> Tests run: 8, Failures: 0, Errors: 1, Skipped: 1, Time elapsed: 210.963 s <<< 
> FAILURE! - in 
> org.apache.hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient
> testMultiBlockWrites3(org.apache.hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient)
>   Time elapsed: 108.777 s  <<< ERROR!
> java.util.concurrent.TimeoutException:
> ...
>   at 
> org.apache.hadoop.ozone.container.TestHelper.waitForContainerClose(TestHelper.java:251)
>   at 
> org.apache.hadoop.ozone.container.TestHelper.waitForContainerClose(TestHelper.java:151)
>   at 
> org.apache.hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient.waitForContainerClose(TestCloseContainerHandlingByClient.java:342)
>   at 
> org.apache.hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient.testMultiBlockWrites3(TestCloseContainerHandlingByClient.java:310)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-3151) Intermittent timeout in TestCloseContainerHandlingByClient#testMultiBlockWrites3

2020-03-10 Thread Attila Doroszlai (Jira)
Attila Doroszlai created HDDS-3151:
--

 Summary: Intermittent timeout in 
TestCloseContainerHandlingByClient#testMultiBlockWrites3
 Key: HDDS-3151
 URL: https://issues.apache.org/jira/browse/HDDS-3151
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: test
Reporter: Attila Doroszlai


{code:title=https://github.com/apache/hadoop-ozone/runs/495906854}
Tests run: 8, Failures: 0, Errors: 1, Skipped: 1, Time elapsed: 210.963 s <<< 
FAILURE! - in 
org.apache.hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient
testMultiBlockWrites3(org.apache.hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient)
  Time elapsed: 108.777 s  <<< ERROR!
java.util.concurrent.TimeoutException:
...
  at 
org.apache.hadoop.ozone.container.TestHelper.waitForContainerClose(TestHelper.java:251)
  at 
org.apache.hadoop.ozone.container.TestHelper.waitForContainerClose(TestHelper.java:151)
  at 
org.apache.hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient.waitForContainerClose(TestCloseContainerHandlingByClient.java:342)
  at 
org.apache.hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient.testMultiBlockWrites3(TestCloseContainerHandlingByClient.java:310)
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] cxorm commented on issue #261: HDDS-2610. Fix the ObjectStore#listVolumes failure when argument is null

2020-03-10 Thread GitBox
cxorm commented on issue #261: HDDS-2610. Fix the ObjectStore#listVolumes 
failure when argument is null
URL: https://github.com/apache/hadoop-ozone/pull/261#issuecomment-596985532
 
 
   Thank you @arp7 for looking at this PR.
   
   The [error 
check](https://github.com/apache/hadoop-ozone/pull/261/checks?check_run_id=495906854)
 is not related to the patch.
   Here is the [same branch](https://github.com/cxorm/hadoop-ozone/runs/495875746), which passed all checks.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-3146) Intermittent timeout in TestOzoneRpcClient

2020-03-10 Thread Attila Doroszlai (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055677#comment-17055677
 ] 

Attila Doroszlai commented on HDDS-3146:


https://github.com/apache/hadoop-ozone/runs/496450696

> Intermittent timeout in TestOzoneRpcClient
> --
>
> Key: HDDS-3146
> URL: https://issues.apache.org/jira/browse/HDDS-3146
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Reporter: Attila Doroszlai
>Priority: Major
> Attachments: 
> org.apache.hadoop.ozone.client.rpc.TestOzoneRpcClient-output.txt
>
>
> {code:title=https://github.com/apache/hadoop-ozone/runs/495197228}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M1:test (default-test) 
> on project hadoop-ozone-integration-test: There was a timeout or other error 
> in the fork
> ...
> org.apache.hadoop.ozone.client.rpc.TestOzoneRpcClient
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org