[jira] [Updated] (HDDS-695) Introduce new SCM Commands to list and close Pipelines

2018-10-23 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-695:
-
Summary: Introduce new SCM Commands to list and close Pipelines  (was: 
Introduce a new SCM Command to teardown a Pipeline)

> Introduce new SCM Commands to list and close Pipelines
> --
>
> Key: HDDS-695
> URL: https://issues.apache.org/jira/browse/HDDS-695
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Blocker
> Attachments: HDDS-695-ozone-0.3.000.patch, 
> HDDS-695-ozone-0.3.001.patch, HDDS-695-ozone-0.3.002.patch, 
> HDDS-695-ozone-0.3.003.patch
>
>
> We need to have a tear-down pipeline command in SCM so that an administrator 
> can close/destroy a pipeline in the cluster.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-718) Introduce new SCM Commands to list and close Pipelines

2018-10-23 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-718:
-
Target Version/s: 0.4.0  (was: 0.3.0)

> Introduce new SCM Commands to list and close Pipelines
> --
>
> Key: HDDS-718
> URL: https://issues.apache.org/jira/browse/HDDS-718
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Blocker
>
> We need to have a tear-down pipeline command in SCM so that an administrator 
> can close/destroy a pipeline in the cluster.
> HDDS-695 brings in the commands in the ozone-0.3 branch; this Jira is for 
> porting them to trunk.






[jira] [Assigned] (HDDS-723) CloseContainerCommandHandler throwing NullPointerException

2018-10-24 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar reassigned HDDS-723:


Assignee: Nanda kumar

> CloseContainerCommandHandler throwing NullPointerException
> --
>
> Key: HDDS-723
> URL: https://issues.apache.org/jira/browse/HDDS-723
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.3.0
>Reporter: Nilotpal Nandi
>Assignee: Nanda kumar
>Priority: Major
> Attachments: all-node-ozone-logs-1540356965.tar.gz
>
>
> Seeing a NullPointerException while CloseContainerCommandHandler is trying to 
> close a container.
>  
>  
> {noformat}
> 2018-10-24 04:22:04,699 INFO org.apache.ratis.server.storage.RaftLogWorker: 
> 8a61160b-8985-412e-9f25-9e65ceafa824-RaftLogWorker got closed and hit 
> exception
> java.io.IOException: java.lang.InterruptedException
>  at org.apache.ratis.util.IOUtils.asIOException(IOUtils.java:51)
>  at 
> org.apache.ratis.server.storage.RaftLogWorker.flushWrites(RaftLogWorker.java:232)
>  at 
> org.apache.ratis.server.storage.RaftLogWorker.access$600(RaftLogWorker.java:51)
>  at 
> org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:309)
>  at org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:179)
>  at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.InterruptedException
>  at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:347)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>  at 
> org.apache.ratis.server.storage.RaftLogWorker.flushWrites(RaftLogWorker.java:230)
>  ... 4 more
> 2018-10-24 04:22:04,712 INFO org.apache.ratis.server.storage.RaftLogWorker: 
> 8a61160b-8985-412e-9f25-9e65ceafa824-RaftLogWorker close()
> 2018-10-24 04:22:31,293 ERROR 
> org.apache.hadoop.ozone.container.common.statemachine.commandhandler.CloseContainerCommandHandler:
>  Can't close container 18
> java.lang.NullPointerException
>  at 
> org.apache.hadoop.ozone.container.common.statemachine.commandhandler.CloseContainerCommandHandler.handle(CloseContainerCommandHandler.java:78)
>  at 
> org.apache.hadoop.ozone.container.common.statemachine.commandhandler.CommandDispatcher.handle(CommandDispatcher.java:93)
>  at 
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.lambda$initCommandHandlerThread$1(DatanodeStateMachine.java:381)
>  at java.lang.Thread.run(Thread.java:745)
> 2018-10-24 04:22:31,293 ERROR 
> org.apache.hadoop.ozone.container.common.statemachine.commandhandler.CloseContainerCommandHandler:
>  Can't close container 10
> java.lang.NullPointerException
>  at 
> org.apache.hadoop.ozone.container.common.statemachine.commandhandler.CloseContainerCommandHandler.handle(CloseContainerCommandHandler.java:78)
>  at 
> org.apache.hadoop.ozone.container.common.statemachine.commandhandler.CommandDispatcher.handle(CommandDispatcher.java:93)
>  at 
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.lambda$initCommandHandlerThread$1(DatanodeStateMachine.java:381)
>  at java.lang.Thread.run(Thread.java:745)
> 2018-10-24 04:22:31,293 ERROR 
> org.apache.hadoop.ozone.container.common.statemachine.commandhandler.CloseContainerCommandHandler:
>  Can't close container 14
> java.lang.NullPointerException
>  at 
> org.apache.hadoop.ozone.container.common.statemachine.commandhandler.CloseContainerCommandHandler.handle(CloseContainerCommandHandler.java:78)
>  at 
> org.apache.hadoop.ozone.container.common.statemachine.commandhandler.CommandDispatcher.handle(CommandDispatcher.java:93)
>  at 
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.lambda$initCommandHandlerThread$1(DatanodeStateMachine.java:381)
>  at java.lang.Thread.run(Thread.java:745){noformat}
>  






[jira] [Commented] (HDDS-692) Use the ProgressBar class in the RandomKeyGenerator freon test

2018-10-23 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661114#comment-16661114
 ] 

Nanda kumar commented on HDDS-692:
--

[~horzsolt2006], thanks for working on this.

It is not a good idea to give the actual task to the {{ProgressBar}} thread.
It should instead work like this:
 * Instantiate the ProgressBar class with a {{PrintStream}}, a {{MaxValue}} of 
type Long and a {{Supplier}} function for the current value.
 * {{ProgressBar#start}}: this should start the ProgressBar thread.
 * {{ProgressBar#shutdown}}: this should stop the ProgressBar thread.

Apart from the {{shutdown}} method, which waits for the progress bar to complete, 
we should also have a {{terminate}} method that can be used in case of an 
exception in the actual job. Upon calling {{terminate}}, the {{ProgressBar}} 
thread should terminate immediately.
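For illustration, the proposed API could be sketched roughly as follows. All class and method bodies here are assumptions for the sake of a runnable example, not the actual Ozone/freon code: the key point is that the caller runs the job, while the ProgressBar thread only polls a Supplier for the current value.

```java
import java.io.PrintStream;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

public class ProgressBarSketch {

  // Sketch of the proposed API: the ProgressBar never runs the job itself;
  // it only polls a Supplier<Long> for the job's current progress value.
  static class ProgressBar {
    private final PrintStream stream;
    private final long maxValue;
    private final Supplier<Long> currentValue;
    private final Thread thread;
    private volatile boolean running = true;

    ProgressBar(PrintStream stream, Long maxValue, Supplier<Long> currentValue) {
      this.stream = stream;
      this.maxValue = maxValue;
      this.currentValue = currentValue;
      this.thread = new Thread(() -> {
        while (running && this.currentValue.get() < this.maxValue) {
          this.stream.print("\rprogress: " + this.currentValue.get()
              + "/" + this.maxValue);
          try {
            Thread.sleep(10);
          } catch (InterruptedException e) {
            return; // terminate() interrupts this thread
          }
        }
      });
    }

    /** Non-blocking: starts the progress bar in its own thread. */
    void start() { thread.start(); }

    /** Blocking: waits until the bar observes completion (or termination). */
    void shutdown() throws InterruptedException { thread.join(); }

    /** Immediate stop, for when the actual job fails with an exception. */
    void terminate() { running = false; thread.interrupt(); }
  }

  static long runDemo() throws InterruptedException {
    AtomicLong done = new AtomicLong();             // progress counter owned by the job
    ProgressBar bar = new ProgressBar(System.out, 5L, done::get);
    bar.start();
    for (int i = 0; i < 5; i++) {                   // the actual job, run by the
      done.incrementAndGet();                       // caller, not by the ProgressBar
    }
    bar.shutdown();                                 // returns once the bar sees 5/5
    return done.get();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("completed " + runDemo() + " tasks");
  }
}
```

With this shape, a failure in the job simply calls {{terminate()}} in the catch block instead of {{shutdown()}}, and the bar thread exits without waiting for completion.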

> Use the ProgressBar class in the RandomKeyGenerator freon test
> --
>
> Key: HDDS-692
> URL: https://issues.apache.org/jira/browse/HDDS-692
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Elek, Marton
>Assignee: Zsolt Horvath
>Priority: Major
> Attachments: HDDS-692.001.patch
>
>
> HDDS-443 provides a reusable progress bar to make it easier to add more freon 
> tests, but the existing RandomKeyGenerator test 
> (hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/RandomKeyGenerator.java)
>  still doesn't use it. 
> It would be good to switch to use the new progress bar there.






[jira] [Updated] (HDDS-694) Plugin new Pipeline management code in SCM

2018-10-26 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-694:
-
   Resolution: Fixed
Fix Version/s: 0.4.0
   Status: Resolved  (was: Patch Available)

> Plugin new Pipeline management code in SCM
> --
>
> Key: HDDS-694
> URL: https://issues.apache.org/jira/browse/HDDS-694
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-694.001.patch, HDDS-694.002.patch, 
> HDDS-694.003.patch
>
>
> This Jira aims to plug in the new pipeline management code in SCM. It also 
> removes the old pipeline-related classes.






[jira] [Commented] (HDDS-694) Plugin new Pipeline management code in SCM

2018-10-26 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665135#comment-16665135
 ] 

Nanda kumar commented on HDDS-694:
--

[~ljain], thanks for the contribution and thanks to [~anu] for review. 
Committed this to trunk.

> Plugin new Pipeline management code in SCM
> --
>
> Key: HDDS-694
> URL: https://issues.apache.org/jira/browse/HDDS-694
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: HDDS-694.001.patch, HDDS-694.002.patch, 
> HDDS-694.003.patch
>
>
> This Jira aims to plug in the new pipeline management code in SCM. It also 
> removes the old pipeline-related classes.






[jira] [Commented] (HDDS-692) Use the ProgressBar class in the RandomKeyGenerator freon test

2018-10-28 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1520#comment-1520
 ] 

Nanda kumar commented on HDDS-692:
--

[~horzsolt2006], sorry for the delay in response.

 
{quote}I'm not sure if I understand you wrt giving the actual task to the 
Progressbar thread.
{quote}
RandomKeyGenerator line 253 - {{progressBar.start(task)}}: here {{task}} is the 
actual runnable which starts/submits the job to the ExecutorService, so we are 
passing the actual job to ProgressBar.
In the case of {{RandomKeyGenerator}} we use an ExecutorService to run the tasks 
in parallel, but if someone uses ProgressBar without an ExecutorService, 
ProgressBar itself will end up running the job.

 
{quote}In its public void start(Runnable task) the task parameter used as a 
functional interface, it doesn't actually start a thread..
{quote}
Actually, the {{public void start(Runnable task)}} method is the one that runs 
the job. It doesn't create a new Thread; it runs the job in the calling thread.
{code:java}
  public void start(Runnable task) {

    startTime = System.nanoTime();

    try {

      progressBar.start();
      task.run();   // <-- this runs the job, in the calling thread

    } catch (Exception e) {
      exception = true;
    }
  }
{code}
 

We should not pass a {{Runnable}} as an argument to the {{ProgressBar}} class. 
ProgressBar should take a {{Supplier}} that returns a Long value.

This is how ProgressBar APIs should look.
{code:java}
public class ProgressBar {

  /**
   * Constructs the ProgressBar instance.
   *
   * @param stream The stream to print
   * @param maxValue The max value
   * @param currentValue current value supplier
   */
  public ProgressBar(PrintStream stream, Long maxValue,
      Supplier<Long> currentValue) {
...
// Create new progress bar task (runnable)
...
  }

  /**
   * Starts the ProgressBar in a new Thread.
   * This is a non blocking call.
   */
  public void start() {
...
// Start the progress bar task
...
  }

  /**
   * Graceful shutdown, waits for the progress bar to complete.
   * This is a blocking call.
   */
  public void shutdown() {
...
// Wait for the progress bar task to complete
...
  }

  /**
   * Terminates the progress bar.
   * This doesn't wait for the progress bar to complete.
   */
  public void terminate() {
...
// Terminate the progress bar task
...
  }
}
{code}
{quote}Sorry for my newbie questions, I'm just getting familiar with the code 
now.
{quote}
No issues :) If I have confused you more, we can get on a call to discuss this.

> Use the ProgressBar class in the RandomKeyGenerator freon test
> --
>
> Key: HDDS-692
> URL: https://issues.apache.org/jira/browse/HDDS-692
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Elek, Marton
>Assignee: Zsolt Horvath
>Priority: Major
> Attachments: HDDS-692.001.patch
>
>
> HDDS-443 provides a reusable progress bar to make it easier to add more freon 
> tests, but the existing RandomKeyGenerator test 
> (hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/RandomKeyGenerator.java)
>  still doesn't use it. 
> It would be good to switch to use the new progress bar there.






[jira] [Commented] (HDDS-754) VolumeInfo#getScmUsed throws NPE

2018-10-29 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667652#comment-16667652
 ] 

Nanda kumar commented on HDDS-754:
--

This looks similar to HDDS-354.

> VolumeInfo#getScmUsed throws NPE
> 
>
> Key: HDDS-754
> URL: https://issues.apache.org/jira/browse/HDDS-754
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Mukul Kumar Singh
>Priority: Blocker
>
> The failure can be seen at the following jenkins run
> https://builds.apache.org/job/PreCommit-HDDS-Build/1540/testReport/org.apache.hadoop.hdds.scm.pipeline/TestNodeFailure/testPipelineFail/
> {code}
> 2018-10-29 13:44:11,984 WARN  concurrent.ExecutorHelper 
> (ExecutorHelper.java:logThrowableFromAfterExecute(50)) - Execution exception 
> when running task in Datanode ReportManager Thread - 3
> 2018-10-29 13:44:11,984 WARN  concurrent.ExecutorHelper 
> (ExecutorHelper.java:logThrowableFromAfterExecute(63)) - Caught exception in 
> thread Datanode ReportManager Thread - 3: 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.ozone.container.common.volume.VolumeInfo.getScmUsed(VolumeInfo.java:107)
>   at 
> org.apache.hadoop.ozone.container.common.volume.VolumeSet.getNodeReport(VolumeSet.java:379)
>   at 
> org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.getNodeReport(OzoneContainer.java:225)
>   at 
> org.apache.hadoop.ozone.container.common.report.NodeReportPublisher.getReport(NodeReportPublisher.java:64)
>   at 
> org.apache.hadoop.ozone.container.common.report.NodeReportPublisher.getReport(NodeReportPublisher.java:39)
>   at 
> org.apache.hadoop.ozone.container.common.report.ReportPublisher.publishReport(ReportPublisher.java:86)
>   at 
> org.apache.hadoop.ozone.container.common.report.ReportPublisher.run(ReportPublisher.java:73)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}






[jira] [Created] (HDDS-755) ContainerInfo and ContainerReplica protobuf changes

2018-10-29 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-755:


 Summary: ContainerInfo and ContainerReplica protobuf changes
 Key: HDDS-755
 URL: https://issues.apache.org/jira/browse/HDDS-755
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: Ozone Datanode, SCM
Reporter: Nanda kumar
Assignee: Nanda kumar


We have different classes that maintain container-related information; we can 
consolidate them to make the code easier to read.

Proposal:
In SCM: used in communication between SCM and Client, and also for storing in 
the db
* ContainerInfoProto
* ContainerInfo
 
In Datanode: Used in communication between Datanode and SCM
* ContainerReplicaProto
* ContainerReplica
 
In Datanode: Used in communication between Datanode and Client
* ContainerDataProto
* ContainerData







[jira] [Updated] (HDDS-755) ContainerInfo and ContainerReplica protobuf changes

2018-10-29 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-755:
-
Status: Patch Available  (was: Open)

> ContainerInfo and ContainerReplica protobuf changes
> ---
>
> Key: HDDS-755
> URL: https://issues.apache.org/jira/browse/HDDS-755
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode, SCM
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Attachments: HDDS-755.000.patch
>
>
> We have different classes that maintain container related information, we can 
> consolidate them so that it is easy to read the code.
> Proposal:
> In SCM: will be used in communication between SCM and Client, also used for 
> storing in db
> * ContainerInfoProto
> * ContainerInfo
>  
> In Datanode: Used in communication between Datanode and SCM
> * ContainerReplicaProto
> * ContainerReplica
>  
> In Datanode: Used in communication between Datanode and Client
> * ContainerDataProto
> * ContainerData






[jira] [Updated] (HDDS-755) ContainerInfo and ContainerReplica protobuf changes

2018-10-29 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-755:
-
Attachment: HDDS-755.000.patch

> ContainerInfo and ContainerReplica protobuf changes
> ---
>
> Key: HDDS-755
> URL: https://issues.apache.org/jira/browse/HDDS-755
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode, SCM
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Attachments: HDDS-755.000.patch
>
>
> We have different classes that maintain container related information, we can 
> consolidate them so that it is easy to read the code.
> Proposal:
> In SCM: will be used in communication between SCM and Client, also used for 
> storing in db
> * ContainerInfoProto
> * ContainerInfo
>  
> In Datanode: Used in communication between Datanode and SCM
> * ContainerReplicaProto
> * ContainerReplica
>  
> In Datanode: Used in communication between Datanode and Client
> * ContainerDataProto
> * ContainerData






[jira] [Updated] (HDDS-692) Use the ProgressBar class in the RandomKeyGenerator freon test

2018-10-28 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-692:
-
Component/s: Tools

> Use the ProgressBar class in the RandomKeyGenerator freon test
> --
>
> Key: HDDS-692
> URL: https://issues.apache.org/jira/browse/HDDS-692
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Tools
>Reporter: Elek, Marton
>Assignee: Zsolt Horvath
>Priority: Major
> Attachments: HDDS-692.001.patch
>
>
> HDDS-443 provides a reusable progress bar to make it easier to add more freon 
> tests, but the existing RandomKeyGenerator test 
> (hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/RandomKeyGenerator.java)
>  still doesn't use it. 
> It would be good to switch to use the new progress bar there.






[jira] [Commented] (HDDS-775) Batch updates to container db to minimize number of updates.

2018-11-01 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16671500#comment-16671500
 ] 

Nanda kumar commented on HDDS-775:
--

Thanks [~msingh] for the patch. +1, looks good to me.

[~linyiqun], thanks for the review. Please find my responses below.
bq. Looks like this change can also be used for trunk
In trunk the whole container report processing is getting refactored as part of 
HDDS-737. I will upload the patch over there shortly.

bq. writeBatch operation should under lock protection. And lock operation 
should be moved outside  loop.
I agree that this would make the code look cleaner, but having the writeBatch 
outside of the lock will not cause any correctness issue:
* {{batch}} is a method-local variable, so there won't be any corruption even 
when multiple threads are accessing the handler.
* Since {{writeBatch}} is a RocksDB operation, we can rely on it for the 
correctness of the batch write.
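The batching pattern under discussion can be illustrated with a small self-contained sketch. Here an in-memory map stands in for the RocksDB container db, and all class and method names are illustrative assumptions, not the actual SCM code:

```java
import java.util.HashMap;
import java.util.Map;

// Batching pattern sketch: collect puts into a method-local batch while
// processing reports (per-report work happens under the lock), then apply
// the whole batch with a single write. A HashMap stands in for RocksDB.
public class BatchedContainerDb {
  private final Map<Long, String> db = new HashMap<>();
  private final Object lock = new Object();

  public void processReports(Map<Long, String> reports) {
    // Method-local batch: each calling thread gets its own instance,
    // so there is no corruption even under concurrent report processing.
    Map<Long, String> batch = new HashMap<>();
    for (Map.Entry<Long, String> report : reports.entrySet()) {
      synchronized (lock) {
        // per-report state updates would happen here, under the lock
        batch.put(report.getKey(), report.getValue());
      }
    }
    // One batched write instead of a put per report. With RocksDB this
    // would be a writeBatch, which is atomic on its own, so it does not
    // need to sit inside the lock for correctness.
    synchronized (db) {
      db.putAll(batch);
    }
  }

  public String get(long containerId) {
    synchronized (db) {
      return db.get(containerId);
    }
  }
}
```

The design trade-off is exactly the one in the comment above: moving the batched write inside the lock would read more uniformly, but since the batch is thread-confined and the final write is atomic, correctness does not require it.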

> Batch updates to container db to minimize number of updates.
> 
>
> Key: HDDS-775
> URL: https://issues.apache.org/jira/browse/HDDS-775
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Attachments: HDDS-775-ozone-0.3.001.patch
>
>
> Currently while processing container reports, each report results in a put 
> operation to the db. This can be optimized by replacing put with a batch 
> operation.






[jira] [Commented] (HDDS-775) Batch updates to container db to minimize number of updates.

2018-11-01 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16671531#comment-16671531
 ] 

Nanda kumar commented on HDDS-775:
--

Thanks [~linyiqun] for the quick response. I will commit it shortly.

> Batch updates to container db to minimize number of updates.
> 
>
> Key: HDDS-775
> URL: https://issues.apache.org/jira/browse/HDDS-775
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Attachments: HDDS-775-ozone-0.3.001.patch
>
>
> Currently while processing container reports, each report results in a put 
> operation to the db. This can be optimized by replacing put with a batch 
> operation.






[jira] [Commented] (HDDS-775) Batch updates to container db to minimize number of updates.

2018-11-01 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16671551#comment-16671551
 ] 

Nanda kumar commented on HDDS-775:
--

Thanks [~msingh] for the contribution and [~linyiqun] for the review. Committed 
it to ozone-0.3 branch.

> Batch updates to container db to minimize number of updates.
> 
>
> Key: HDDS-775
> URL: https://issues.apache.org/jira/browse/HDDS-775
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Fix For: 0.3.0
>
> Attachments: HDDS-775-ozone-0.3.001.patch
>
>
> Currently while processing container reports, each report results in a put 
> operation to the db. This can be optimized by replacing put with a batch 
> operation.






[jira] [Commented] (HDDS-775) Batch updates to container db to minimize number of updates.

2018-11-01 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16671533#comment-16671533
 ] 

Nanda kumar commented on HDDS-775:
--

The findbugs warning is not related to this patch, and the asflicense warnings 
are fixed in HDDS-777. I will fix the checkstyle issue while committing.

> Batch updates to container db to minimize number of updates.
> 
>
> Key: HDDS-775
> URL: https://issues.apache.org/jira/browse/HDDS-775
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Attachments: HDDS-775-ozone-0.3.001.patch
>
>
> Currently while processing container reports, each report results in a put 
> operation to the db. This can be optimized by replacing put with a batch 
> operation.






[jira] [Updated] (HDDS-775) Batch updates to container db to minimize number of updates.

2018-11-01 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-775:
-
   Resolution: Fixed
Fix Version/s: 0.3.0
   Status: Resolved  (was: Patch Available)

> Batch updates to container db to minimize number of updates.
> 
>
> Key: HDDS-775
> URL: https://issues.apache.org/jira/browse/HDDS-775
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Fix For: 0.3.0
>
> Attachments: HDDS-775-ozone-0.3.001.patch
>
>
> Currently while processing container reports, each report results in a put 
> operation to the db. This can be optimized by replacing put with a batch 
> operation.






[jira] [Updated] (HDDS-755) ContainerInfo and ContainerReplica protobuf changes

2018-10-30 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-755:
-
   Resolution: Fixed
Fix Version/s: 0.4.0
   Status: Resolved  (was: Patch Available)

> ContainerInfo and ContainerReplica protobuf changes
> ---
>
> Key: HDDS-755
> URL: https://issues.apache.org/jira/browse/HDDS-755
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode, SCM
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-755.000.patch, HDDS-755.001.patch
>
>
> We have different classes that maintain container related information, we can 
> consolidate them so that it is easy to read the code.
> Proposal:
> In SCM: will be used in communication between SCM and Client, also used for 
> storing in db
> * ContainerInfoProto
> * ContainerInfo
>  
> In Datanode: Used in communication between Datanode and SCM
> * ContainerReplicaProto
> * ContainerReplica
>  
> In Datanode: Used in communication between Datanode and Client
> * ContainerDataProto
> * ContainerData






[jira] [Commented] (HDDS-755) ContainerInfo and ContainerReplica protobuf changes

2018-10-30 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16669611#comment-16669611
 ] 

Nanda kumar commented on HDDS-755:
--

Thanks [~linyiqun] and [~msingh] for the review. I have committed this to trunk.

> ContainerInfo and ContainerReplica protobuf changes
> ---
>
> Key: HDDS-755
> URL: https://issues.apache.org/jira/browse/HDDS-755
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode, SCM
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-755.000.patch, HDDS-755.001.patch
>
>
> We have different classes that maintain container related information, we can 
> consolidate them so that it is easy to read the code.
> Proposal:
> In SCM: will be used in communication between SCM and Client, also used for 
> storing in db
> * ContainerInfoProto
> * ContainerInfo
>  
> In Datanode: Used in communication between Datanode and SCM
> * ContainerReplicaProto
> * ContainerReplica
>  
> In Datanode: Used in communication between Datanode and Client
> * ContainerDataProto
> * ContainerData






[jira] [Commented] (HDDS-755) ContainerInfo and ContainerReplica protobuf changes

2018-10-30 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16669606#comment-16669606
 ] 

Nanda kumar commented on HDDS-755:
--

[~linyiqun], will take care of the checkstyle issues while committing.

> ContainerInfo and ContainerReplica protobuf changes
> ---
>
> Key: HDDS-755
> URL: https://issues.apache.org/jira/browse/HDDS-755
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode, SCM
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Attachments: HDDS-755.000.patch, HDDS-755.001.patch
>
>
> We have different classes that maintain container related information, we can 
> consolidate them so that it is easy to read the code.
> Proposal:
> In SCM: will be used in communication between SCM and Client, also used for 
> storing in db
> * ContainerInfoProto
> * ContainerInfo
>  
> In Datanode: Used in communication between Datanode and SCM
> * ContainerReplicaProto
> * ContainerReplica
>  
> In Datanode: Used in communication between Datanode and Client
> * ContainerDataProto
> * ContainerData






[jira] [Updated] (HDDS-755) ContainerInfo and ContainerReplica protobuf changes

2018-10-30 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-755:
-
Attachment: HDDS-755.001.patch

> ContainerInfo and ContainerReplica protobuf changes
> ---
>
> Key: HDDS-755
> URL: https://issues.apache.org/jira/browse/HDDS-755
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode, SCM
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Attachments: HDDS-755.000.patch, HDDS-755.001.patch
>
>
> We have different classes that maintain container-related information; we can 
> consolidate them to make the code easier to read.
> Proposal:
> In SCM: used for communication between SCM and Client, and also for storing 
> in the db
> * ContainerInfoProto
> * ContainerInfo
>  
> In Datanode: Used in communication between Datanode and SCM
> * ContainerReplicaProto
> * ContainerReplica
>  
> In Datanode: Used in communication between Datanode and Client
> * ContainerDataProto
> * ContainerData






[jira] [Commented] (HDDS-755) ContainerInfo and ContainerReplica protobuf changes

2018-10-30 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16668571#comment-16668571
 ] 

Nanda kumar commented on HDDS-755:
--

[~linyiqun], thanks for the review.

 
{quote}Compared with the original logic, we introduce the new state 
QUASI_CLOSED; is this an intended change?
{quote}
Yes, the QUASI_CLOSED state will be used when there is no pipeline and we want 
to close the container. I plan to file follow-up jiras which will use this 
state.
{quote}Can we reuse the State definition like before, and not define the same 
State in both ContainerReplicaProto and ContainerDataProto?
{quote}
The reason for duplicating this is that protobuf doesn't allow the same 
constant to be used across different enums in the same proto file. We already 
have {{OPEN}}, {{CLOSING}}, {{CLOSED}}, etc. in {{LifeCycleState}}, so we 
cannot have another enum in Hdds.proto with these values.
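As a minimal illustration of that restriction (a hypothetical proto file, not 
the actual Hdds.proto): protobuf enum value names are registered in the 
enclosing scope, C++-style, so two enums in the same scope cannot both define 
a value named {{OPEN}}.

```protobuf
syntax = "proto2";

enum LifeCycleState {
  OPEN = 1;    // "OPEN" is registered at file scope, not inside the enum
  CLOSED = 2;
}

// This second enum would fail to compile:
// "OPEN" is already defined in the same scope.
// enum ReplicaState {
//   OPEN = 1;
//   CLOSED = 2;
// }
```

This is why the duplicated states must live in separate proto files (or 
nested scopes) rather than alongside {{LifeCycleState}}.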

There is also a plan to simplify the Container and Pipeline states in SCM. 
This will bring changes to the {{LifeCycleState}} and {{LifeCycleEvent}} enums 
in {{Hdds.proto}}; HDDS-735 and follow-up jiras will bring those changes.

 
{quote}I mean we won't throw an error for the default case after this 
change. Maybe we should add the state check.
{quote}
Actually, if there is no corresponding value in the enum, calling {{valueOf}} 
will throw {{java.lang.IllegalArgumentException: No enum constant ...}}. I 
agree that an exception with a custom message makes more sense. Changed it 
back to the older format.
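A minimal sketch of that behavior (hypothetical enum and wrapper, not the 
actual HDDS code): {{Enum.valueOf}} already throws 
{{IllegalArgumentException}} for unknown names, and wrapping it lets us attach 
a clearer, domain-specific message.

```java
public class ValueOfDemo {
    // Hypothetical stand-in for a proto-generated state enum.
    enum ReplicaState { OPEN, CLOSING, QUASI_CLOSED, CLOSED }

    static ReplicaState parse(String name) {
        try {
            // valueOf throws IllegalArgumentException ("No enum constant ...")
            // when the name does not match any constant.
            return ReplicaState.valueOf(name);
        } catch (IllegalArgumentException e) {
            // Re-throw with a clearer, domain-specific message.
            throw new IllegalArgumentException(
                "Invalid container replica state: " + name, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("CLOSED"));
        try {
            parse("UNKNOWN");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```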

 

Also fixed related test failures in patch v001.

> ContainerInfo and ContainerReplica protobuf changes
> ---
>
> Key: HDDS-755
> URL: https://issues.apache.org/jira/browse/HDDS-755
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode, SCM
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Attachments: HDDS-755.000.patch, HDDS-755.001.patch
>
>
> We have different classes that maintain container-related information; we can 
> consolidate them to make the code easier to read.
> Proposal:
> In SCM: used for communication between SCM and Client, and also for storing 
> in the db
> * ContainerInfoProto
> * ContainerInfo
>  
> In Datanode: Used in communication between Datanode and SCM
> * ContainerReplicaProto
> * ContainerReplica
>  
> In Datanode: Used in communication between Datanode and Client
> * ContainerDataProto
> * ContainerData






[jira] [Commented] (HDDS-762) Fix unit test failure for TestContainerSQLCli & TestCSMMetrics

2018-10-30 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16668588#comment-16668588
 ] 

Nanda kumar commented on HDDS-762:
--

Thanks for the patch [~msingh]. +1, pending Jenkins.

> Fix unit test failure for TestContainerSQLCli & TestCSMMetrics
> --
>
> Key: HDDS-762
> URL: https://issues.apache.org/jira/browse/HDDS-762
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Attachments: HDDS-762.001.patch
>
>
> TestContainerSQLCli & TestCSMMetrics are currently failing consistently 
> because of a mismatch in the metrics register name. 






[jira] [Commented] (HDDS-694) Plugin new Pipeline management code in SCM

2018-10-25 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16663778#comment-16663778
 ] 

Nanda kumar commented on HDDS-694:
--

[~ljain], thanks for updating the patch. +1, pending Jenkins.

> Plugin new Pipeline management code in SCM
> --
>
> Key: HDDS-694
> URL: https://issues.apache.org/jira/browse/HDDS-694
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: HDDS-694.001.patch, HDDS-694.002.patch, 
> HDDS-694.003.patch
>
>
> This Jira aims to plug in the new pipeline management code in SCM. It also 
> removes the old pipeline-related classes.






[jira] [Created] (HDDS-737) Introduce Incremental Container Report

2018-10-25 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-737:


 Summary: Introduce Incremental Container Report
 Key: HDDS-737
 URL: https://issues.apache.org/jira/browse/HDDS-737
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: Ozone Datanode, SCM
Reporter: Nanda kumar
Assignee: Nanda kumar


We will use the Incremental Container Report (ICR) to immediately inform SCM 
whenever there is a container state change in a datanode. This makes sure 
that SCM is updated as soon as the state of a container changes and does not 
have to wait for the full container report.

*When do we send ICR?*
* When a container replica state changes from open/closing to closed
* When a container replica state changes from open/closing to quasi closed
* When a container replica state changes from quasi closed to closed
* When a container replica is deleted in datanode
* When a container replica is copied from another datanode
* When a container replica is discovered to be corrupted







[jira] [Created] (HDDS-738) Removing REST protocol support from OzoneClient

2018-10-25 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-738:


 Summary: Removing REST protocol support from OzoneClient
 Key: HDDS-738
 URL: https://issues.apache.org/jira/browse/HDDS-738
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: Ozone Client
Reporter: Nanda kumar


Since we have a functional {{S3Gateway}} for Ozone, which works over the REST 
protocol, having REST protocol support in OzoneClient feels redundant, and it 
will take a lot of effort to keep it up to date.
As the S3Gateway is in a functional state now, I propose to remove REST 
protocol support from OzoneClient.

Once we remove REST support from OzoneClient, the following will be the 
interface to access Ozone cluster
 * OzoneClient (RPC Protocol)
 * OzoneFS (RPC Protocol)
 * S3Gateway (REST Protocol)






[jira] [Commented] (HDDS-728) Datanodes are going to dead state after some interval

2018-10-25 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16663763#comment-16663763
 ] 

Nanda kumar commented on HDDS-728:
--

[~msingh], thanks for working on this. Overall the patch looks good to me; a 
few minor comments:

In XceiverServerRatis we don't need to maintain {{stateMachineMap}}; 
RaftServerProxy already has a map for this, and the entry is removed from that 
map whenever we do a group remove.

In MiniOzoneClusterImpl, do we need this change? We can always wait for the 
datanode to get ready whenever we do a datanode restart.

> Datanodes are going to dead state after some interval
> -
>
> Key: HDDS-728
> URL: https://issues.apache.org/jira/browse/HDDS-728
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Filesystem
>Affects Versions: 0.3.0
>Reporter: Soumitra Sulav
>Assignee: Mukul Kumar Singh
>Priority: Major
> Attachments: HDDS-728.001.patch, 
> hadoop-root-datanode-ctr-e138-1518143905142-541600-02-02.hwx.site.log, 
> hadoop-root-datanode-ctr-e138-1518143905142-541600-02-03.hwx.site.log, 
> hadoop-root-om-ctr-e138-1518143905142-541600-02-02.hwx.site.log, 
> hadoop-root-scm-ctr-e138-1518143905142-541600-02-02.hwx.site.log, 
> om-audit-ctr-e138-1518143905142-541600-02-02.hwx.site.log
>
>
> Set up a 5-datanode Ozone cluster with HDP on top of it.
> After restarting all HDP services a few times, we encountered the issue below, 
> which is causing the HDP services to fail.
> The same exception was observed in an old setup, but I thought it could have 
> been an issue with that setup; now the same issue appears in the new setup as 
> well.
> {code:java}
> 2018-10-24 10:42:03,308 WARN 
> org.apache.ratis.grpc.server.GrpcServerProtocolService: 
> 2974da2b-e765-43f9-8d30-45fe40dcb9ab: Failed requestVote 
> 1672d28e-800f-4318-895b-1648976acff6->2974da2b-e765-43f9-8d30-45fe40dcb9ab#0
> org.apache.ratis.protocol.GroupMismatchException: 
> 2974da2b-e765-43f9-8d30-45fe40dcb9ab: group-CE87A994686F not found.
> at 
> org.apache.ratis.server.impl.RaftServerProxy$ImplMap.get(RaftServerProxy.java:114)
> at 
> org.apache.ratis.server.impl.RaftServerProxy.getImplFuture(RaftServerProxy.java:252)
> at 
> org.apache.ratis.server.impl.RaftServerProxy.getImpl(RaftServerProxy.java:261)
> at 
> org.apache.ratis.server.impl.RaftServerProxy.getImpl(RaftServerProxy.java:256)
> at 
> org.apache.ratis.server.impl.RaftServerProxy.requestVote(RaftServerProxy.java:411)
> at 
> org.apache.ratis.grpc.server.GrpcServerProtocolService.requestVote(GrpcServerProtocolService.java:54)
> at 
> org.apache.ratis.proto.grpc.RaftServerProtocolServiceGrpc$MethodHandlers.invoke(RaftServerProtocolServiceGrpc.java:319)
> at 
> org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:707)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 2018-10-24 10:42:03,342 WARN 
> org.apache.ratis.grpc.server.GrpcServerProtocolService: 
> 2974da2b-e765-43f9-8d30-45fe40dcb9ab: Failed requestVote 
> 7839294e-5657-447f-b320-6b390fffb963->2974da2b-e765-43f9-8d30-45fe40dcb9ab#0
> org.apache.ratis.protocol.GroupMismatchException: 
> 2974da2b-e765-43f9-8d30-45fe40dcb9ab: group-CE87A994686F not found.
> at 
> org.apache.ratis.server.impl.RaftServerProxy$ImplMap.get(RaftServerProxy.java:114)
> at 
> org.apache.ratis.server.impl.RaftServerProxy.getImplFuture(RaftServerProxy.java:252)
> at 
> org.apache.ratis.server.impl.RaftServerProxy.getImpl(RaftServerProxy.java:261)
> at 
> org.apache.ratis.server.impl.RaftServerProxy.getImpl(RaftServerProxy.java:256)
> at 
> org.apache.ratis.server.impl.RaftServerProxy.requestVote(RaftServerProxy.java:411)
> at 
> org.apache.ratis.grpc.server.GrpcServerProtocolService.requestVote(GrpcServerProtocolService.java:54)
> at 
> org.apache.ratis.proto.grpc.RaftServerProtocolServiceGrpc$MethodHandlers.invoke(RaftServerProtocolServiceGrpc.java:319)
> at 
> org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171)
> at 
> 

[jira] [Commented] (HDDS-744) Fix ASF license warning in PipelineNotFoundException class

2018-10-27 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1036#comment-1036
 ] 

Nanda kumar commented on HDDS-744:
--

Thank you for your contribution [~ljain]. Committed it to trunk.

> Fix ASF license warning in PipelineNotFoundException class
> --
>
> Key: HDDS-744
> URL: https://issues.apache.org/jira/browse/HDDS-744
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Trivial
> Attachments: HDDS-744.001.patch
>
>







[jira] [Updated] (HDDS-744) Fix ASF license warning in PipelineNotFoundException class

2018-10-27 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-744:
-
   Resolution: Fixed
Fix Version/s: 0.4.0
   Status: Resolved  (was: Patch Available)

> Fix ASF license warning in PipelineNotFoundException class
> --
>
> Key: HDDS-744
> URL: https://issues.apache.org/jira/browse/HDDS-744
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Trivial
> Fix For: 0.4.0
>
> Attachments: HDDS-744.001.patch
>
>







[jira] [Commented] (HDDS-692) Use the ProgressBar class in the RandomKeyGenerator freon test

2018-10-27 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16665960#comment-16665960
 ] 

Nanda kumar commented on HDDS-692:
--

[~horzsolt2006], sorry for the delay. I got stuck with other tasks; I will try 
to respond to your questions by tomorrow.

> Use the ProgressBar class in the RandomKeyGenerator freon test
> --
>
> Key: HDDS-692
> URL: https://issues.apache.org/jira/browse/HDDS-692
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Elek, Marton
>Assignee: Zsolt Horvath
>Priority: Major
> Attachments: HDDS-692.001.patch
>
>
> HDDS-443 provides a reusable progress bar to make it easier to add more freon 
> tests, but the existing RandomKeyGenerator test 
> (hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/RandomKeyGenerator.java)
>  still doesn't use it. 
> It would be good to switch to use the new progress bar there.






[jira] [Commented] (HDDS-744) Fix ASF license warning in PipelineNotFoundException class

2018-10-27 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1035#comment-1035
 ] 

Nanda kumar commented on HDDS-744:
--

[~ljain], thanks for taking care of this. I will commit it shortly.

> Fix ASF license warning in PipelineNotFoundException class
> --
>
> Key: HDDS-744
> URL: https://issues.apache.org/jira/browse/HDDS-744
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Trivial
> Attachments: HDDS-744.001.patch
>
>







[jira] [Created] (HDDS-801) Quasi close the container when close is not executed via Ratis

2018-11-04 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-801:


 Summary: Quasi close the container when close is not executed via 
Ratis
 Key: HDDS-801
 URL: https://issues.apache.org/jira/browse/HDDS-801
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: Ozone Datanode
Affects Versions: 0.3.0
Reporter: Nanda kumar
Assignee: Nanda kumar


When a datanode receives a CloseContainerCommand and the replication type is 
not RATIS, we should QUASI close the container. After quasi-closing the 
container, an ICR has to be sent to SCM.
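A hedged sketch of that rule (hypothetical names; the real handler lives in 
the HDDS datanode code): a non-Ratis close cannot coordinate with the other 
replicas, so the replica only moves to QUASI_CLOSED, and the caller then 
queues an ICR for SCM.

```java
public class CloseCommandDemo {
    enum ReplicationType { RATIS, STAND_ALONE }
    enum ReplicaState { OPEN, CLOSED, QUASI_CLOSED }

    // Returns the resulting replica state for a close command; the caller is
    // then expected to send an ICR so SCM learns about the transition.
    static ReplicaState handleClose(ReplicationType type) {
        return type == ReplicationType.RATIS
            ? ReplicaState.CLOSED        // close replicated through Ratis
            : ReplicaState.QUASI_CLOSED; // no pipeline: quasi-close only
    }
}
```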






[jira] [Commented] (HDDS-737) Introduce Incremental Container Report

2018-11-04 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16674422#comment-16674422
 ] 

Nanda kumar commented on HDDS-737:
--

Thanks [~linyiqun] for taking a look at the patch.
{quote}could you give a summary note of the change
{quote}
Sure.
 * The main change in this patch is to introduce a way to send an ICR 
immediately. This is done by introducing a {{triggerHeartbeat}} method in 
{{DatanodeStateMachine}}; it immediately triggers a heartbeat to SCM which 
also includes the reports that are ready to send. So, to send an ICR 
immediately, all we have to do is
1) Add the ICR (Container Report) to {{StateContext}}.
2) Call the {{triggerHeartbeat}} method.

 
 * Since we have ICR in place, we don't need to send command status for 
{{CloseContainerCommand}}. (We eventually want to remove the command status for 
all the commands)
This patch removes the command status logic for {{CloseContainerCommand}} and 
also removes the command watcher (CloseContainerWatcher) for 
{{CloseContainerCommand}} in SCM.

 
 * Added IncrementalContainerReportHandler in SCM. (It is not complete yet, 
added TODO. Needs follow up jiras)

 
 * Processing of container report was previously done by {{NodeManager}} and 
{{ContainerManager}}. This logic is moved to {{ContainerReportHandler}} (There 
is a TODO which needs follow up jira)

 
 * A few more refactorings, like removing the Node2Container class and moving 
that data structure to {{NodeStateMap}}.

I wanted to cover all the scenarios in this jira itself, but the patch has 
already grown huge and it would become very difficult to review. I will start 
filing follow-up jiras.
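The two-step ICR send path described above can be sketched as follows 
(simplified stand-ins for the real {{StateContext}} and 
{{DatanodeStateMachine}} classes; the method shapes here are assumptions):

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class IcrDemo {
    // Simplified stand-in: holds reports until the next heartbeat ships them.
    static class StateContext {
        private final Queue<String> reports = new ArrayDeque<>();
        void addReport(String report) { reports.add(report); }  // step 1
        int pendingReports() { return reports.size(); }
    }

    // Simplified stand-in: triggerHeartbeat wakes the heartbeat thread so the
    // queued reports go to SCM immediately instead of on the next interval.
    static class DatanodeStateMachine {
        final StateContext context = new StateContext();
        private boolean heartbeatTriggered = false;
        void triggerHeartbeat() { heartbeatTriggered = true; }  // step 2
        boolean isHeartbeatTriggered() { return heartbeatTriggered; }
    }
}
```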

 
{quote}Current change has addressed all the points (When do we send ICR) 
mentioned in JIRA's description?
{quote}
Only one scenario is handled in this jira
 * When a container replica state changes from open/closing to closed

We still don't have code to QUASI_CLOSE a container in the datanode; we should 
trigger an ICR when we do that (HDDS-801).
The same applies to the deleted, copied, and corrupted cases.

I will start filing jiras so that we can keep track of it.

Thanks a lot for spending your time reviewing the patch.

> Introduce Incremental Container Report
> --
>
> Key: HDDS-737
> URL: https://issues.apache.org/jira/browse/HDDS-737
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode, SCM
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Attachments: HDDS-737.000.patch
>
>
> We will use the Incremental Container Report (ICR) to immediately inform SCM 
> whenever there is a container state change in a datanode. This makes sure 
> that SCM is updated as soon as the state of a container changes and does not 
> have to wait for the full container report.
> *When do we send ICR?*
> * When a container replica state changes from open/closing to closed
> * When a container replica state changes from open/closing to quasi closed
> * When a container replica state changes from quasi closed to closed
> * When a container replica is deleted in datanode
> * When a container replica is copied from another datanode
> * When a container replica is discovered to be corrupted






[jira] [Updated] (HDDS-801) Quasi close the container when close is not executed via Ratis

2018-11-04 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-801:
-
Attachment: HDDS-801.000.patch

> Quasi close the container when close is not executed via Ratis
> --
>
> Key: HDDS-801
> URL: https://issues.apache.org/jira/browse/HDDS-801
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Affects Versions: 0.3.0
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Attachments: HDDS-801.000.patch
>
>
> When a datanode receives a CloseContainerCommand and the replication type is 
> not RATIS, we should QUASI close the container. After quasi-closing the 
> container, an ICR has to be sent to SCM.






[jira] [Commented] (HDDS-117) Wrapper for set/get Standalone, Ratis and Rest Ports in DatanodeDetails.

2018-11-01 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16672060#comment-16672060
 ] 

Nanda kumar commented on HDDS-117:
--

[~haridas124], I have made you a contributor to HDDS project. From now on you 
should be able to assign HDDS jiras to yourself.
Welcome to Ozone!

> Wrapper for set/get Standalone, Ratis and Rest Ports in DatanodeDetails.
> 
>
> Key: HDDS-117
> URL: https://issues.apache.org/jira/browse/HDDS-117
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Reporter: Nanda kumar
>Priority: Major
>  Labels: newbie
>
> It will be very helpful to have a wrapper for set/get of the Standalone, 
> Ratis, and Rest ports in DatanodeDetails.
> Search and replace direct usage of DatanodeDetails#newPort in the current code. 






[jira] [Assigned] (HDDS-117) Wrapper for set/get Standalone, Ratis and Rest Ports in DatanodeDetails.

2018-11-01 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar reassigned HDDS-117:


Assignee: Haridas Kandath

> Wrapper for set/get Standalone, Ratis and Rest Ports in DatanodeDetails.
> 
>
> Key: HDDS-117
> URL: https://issues.apache.org/jira/browse/HDDS-117
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Reporter: Nanda kumar
>Assignee: Haridas Kandath
>Priority: Major
>  Labels: newbie
>
> It will be very helpful to have a wrapper for set/get of the Standalone, 
> Ratis, and Rest ports in DatanodeDetails.
> Search and replace direct usage of DatanodeDetails#newPort in the current code. 






[jira] [Updated] (HDDS-738) Removing REST protocol support from OzoneClient

2018-10-26 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-738:
-
Target Version/s: 0.5.0  (was: 0.4.0)

> Removing REST protocol support from OzoneClient
> ---
>
> Key: HDDS-738
> URL: https://issues.apache.org/jira/browse/HDDS-738
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Client
>Reporter: Nanda kumar
>Assignee: chencan
>Priority: Major
>
> Since we have a functional {{S3Gateway}} for Ozone, which works over the REST 
> protocol, having REST protocol support in OzoneClient feels redundant, and it 
> will take a lot of effort to keep it up to date.
> As the S3Gateway is in a functional state now, I propose to remove REST 
> protocol support from OzoneClient.
> Once we remove REST support from OzoneClient, the following will be the 
> interface to access Ozone cluster
>  * OzoneClient (RPC Protocol)
>  * OzoneFS (RPC Protocol)
>  * S3Gateway (REST Protocol)






[jira] [Assigned] (HDDS-738) Removing REST protocol support from OzoneClient

2018-10-26 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar reassigned HDDS-738:


Assignee: (was: chencan)

> Removing REST protocol support from OzoneClient
> ---
>
> Key: HDDS-738
> URL: https://issues.apache.org/jira/browse/HDDS-738
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Client
>Reporter: Nanda kumar
>Priority: Major
>
> Since we have a functional {{S3Gateway}} for Ozone, which works over the REST 
> protocol, having REST protocol support in OzoneClient feels redundant, and it 
> will take a lot of effort to keep it up to date.
> As the S3Gateway is in a functional state now, I propose to remove REST 
> protocol support from OzoneClient.
> Once we remove REST support from OzoneClient, the following will be the 
> interface to access Ozone cluster
>  * OzoneClient (RPC Protocol)
>  * OzoneFS (RPC Protocol)
>  * S3Gateway (REST Protocol)






[jira] [Commented] (HDDS-738) Removing REST protocol support from OzoneClient

2018-10-26 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16664995#comment-16664995
 ] 

Nanda kumar commented on HDDS-738:
--

This jira is for discussion and it will act as an umbrella jira for removing 
REST protocol support from OzoneClient.

> Removing REST protocol support from OzoneClient
> ---
>
> Key: HDDS-738
> URL: https://issues.apache.org/jira/browse/HDDS-738
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Client
>Reporter: Nanda kumar
>Assignee: chencan
>Priority: Major
>
> Since we have a functional {{S3Gateway}} for Ozone, which works over the REST 
> protocol, having REST protocol support in OzoneClient feels redundant, and it 
> will take a lot of effort to keep it up to date.
> As the S3Gateway is in a functional state now, I propose to remove REST 
> protocol support from OzoneClient.
> Once we remove REST support from OzoneClient, the following will be the 
> interface to access Ozone cluster
>  * OzoneClient (RPC Protocol)
>  * OzoneFS (RPC Protocol)
>  * S3Gateway (REST Protocol)






[jira] [Assigned] (HDDS-618) Separate DN registration from Heartbeat

2018-10-25 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar reassigned HDDS-618:


Assignee: (was: Nanda kumar)

> Separate DN registration from Heartbeat
> ---
>
> Key: HDDS-618
> URL: https://issues.apache.org/jira/browse/HDDS-618
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Hanisha Koneru
>Priority: Major
>
> Currently, if SCM has to send a ReRegister command to a DN, it can only do so 
> through the heartbeat response. Due to this, DN reregistration can take up to 
> two heartbeat intervals.
> We should decouple registration requests from the heartbeat, so that a DN can 
> reregister as soon as SCM detects that the node is not registered.






[jira] [Commented] (HDDS-728) Datanodes should use different ContainerStateMachine for each pipeline.

2018-10-29 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16667095#comment-16667095
 ] 

Nanda kumar commented on HDDS-728:
--

Thanks [~msingh] for the patches. +1 on [^HDDS-728.012.patch] and 
[^HDDS-728-ozone-0.3.005.patch], pending Jenkins.

Tested them locally.

> Datanodes should use different ContainerStateMachine for each pipeline.
> ---
>
> Key: HDDS-728
> URL: https://issues.apache.org/jira/browse/HDDS-728
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Filesystem
>Affects Versions: 0.3.0
>Reporter: Soumitra Sulav
>Assignee: Mukul Kumar Singh
>Priority: Major
> Attachments: HDDS-728-ozone-0.3.005.patch, HDDS-728.001.patch, 
> HDDS-728.002.patch, HDDS-728.003.patch, HDDS-728.004.patch, 
> HDDS-728.005.patch, HDDS-728.006.patch, HDDS-728.007.patch, 
> HDDS-728.008.patch, HDDS-728.009.patch, HDDS-728.010.patch, 
> HDDS-728.011.patch, HDDS-728.012.patch, 
> hadoop-root-datanode-ctr-e138-1518143905142-541600-02-02.hwx.site.log, 
> hadoop-root-datanode-ctr-e138-1518143905142-541600-02-03.hwx.site.log, 
> hadoop-root-datanode-ctr-e138-1518143905142-541600-02-08.hwx.site.log, 
> hadoop-root-datanode-ctr-e138-1518143905142-541600-02-09.hwx.site.log, 
> hadoop-root-datanode-ctr-e138-1518143905142-541600-02-10.hwx.site.log, 
> hadoop-root-datanode-ctr-e138-1518143905142-552728-01-04.hwx.site.log, 
> hadoop-root-datanode-ctr-e138-1518143905142-552728-01-05.hwx.site.log, 
> hadoop-root-datanode-ctr-e138-1518143905142-552728-01-06.hwx.site.log, 
> hadoop-root-datanode-ctr-e138-1518143905142-552728-01-07.hwx.site.log, 
> hadoop-root-datanode-ctr-e138-1518143905142-552728-01-08.hwx.site.log, 
> hadoop-root-om-ctr-e138-1518143905142-541600-02-02.hwx.site.log, 
> hadoop-root-scm-ctr-e138-1518143905142-541600-02-02.hwx.site.log, 
> om-audit-ctr-e138-1518143905142-541600-02-02.hwx.site.log
>
>
> Set up a 5-datanode Ozone cluster with HDP on top of it.
> After restarting all HDP services a few times, we encountered the issue below, 
> which is causing the HDP services to fail.
> The same exception was observed in an old setup; I initially thought it was an 
> issue with that setup, but now the same issue has appeared in the new setup as well.
> {code:java}
> 2018-10-24 10:42:03,308 WARN 
> org.apache.ratis.grpc.server.GrpcServerProtocolService: 
> 2974da2b-e765-43f9-8d30-45fe40dcb9ab: Failed requestVote 
> 1672d28e-800f-4318-895b-1648976acff6->2974da2b-e765-43f9-8d30-45fe40dcb9ab#0
> org.apache.ratis.protocol.GroupMismatchException: 
> 2974da2b-e765-43f9-8d30-45fe40dcb9ab: group-CE87A994686F not found.
> at 
> org.apache.ratis.server.impl.RaftServerProxy$ImplMap.get(RaftServerProxy.java:114)
> at 
> org.apache.ratis.server.impl.RaftServerProxy.getImplFuture(RaftServerProxy.java:252)
> at 
> org.apache.ratis.server.impl.RaftServerProxy.getImpl(RaftServerProxy.java:261)
> at 
> org.apache.ratis.server.impl.RaftServerProxy.getImpl(RaftServerProxy.java:256)
> at 
> org.apache.ratis.server.impl.RaftServerProxy.requestVote(RaftServerProxy.java:411)
> at 
> org.apache.ratis.grpc.server.GrpcServerProtocolService.requestVote(GrpcServerProtocolService.java:54)
> at 
> org.apache.ratis.proto.grpc.RaftServerProtocolServiceGrpc$MethodHandlers.invoke(RaftServerProtocolServiceGrpc.java:319)
> at 
> org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:707)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 2018-10-24 10:42:03,342 WARN 
> org.apache.ratis.grpc.server.GrpcServerProtocolService: 
> 2974da2b-e765-43f9-8d30-45fe40dcb9ab: Failed requestVote 
> 7839294e-5657-447f-b320-6b390fffb963->2974da2b-e765-43f9-8d30-45fe40dcb9ab#0
> org.apache.ratis.protocol.GroupMismatchException: 
> 2974da2b-e765-43f9-8d30-45fe40dcb9ab: group-CE87A994686F not found.
> at 
> org.apache.ratis.server.impl.RaftServerProxy$ImplMap.get(RaftServerProxy.java:114)
> at 
> org.apache.ratis.server.impl.RaftServerProxy.getImplFuture(RaftServerProxy.java:252)
> at 
> 

[jira] [Updated] (HDDS-728) Datanodes should use different ContainerStateMachine for each pipeline.

2018-10-29 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-728:
-
   Resolution: Fixed
Fix Version/s: 0.4.0
   0.3.0
   Status: Resolved  (was: Patch Available)

> Datanodes should use different ContainerStateMachine for each pipeline.
> ---
>
> Key: HDDS-728
> URL: https://issues.apache.org/jira/browse/HDDS-728
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Filesystem
>Affects Versions: 0.3.0
>Reporter: Soumitra Sulav
>Assignee: Mukul Kumar Singh
>Priority: Major
> Fix For: 0.3.0, 0.4.0
>
> Attachments: HDDS-728-ozone-0.3.005.patch, HDDS-728.001.patch, 
> HDDS-728.002.patch, HDDS-728.003.patch, HDDS-728.004.patch, 
> HDDS-728.005.patch, HDDS-728.006.patch, HDDS-728.007.patch, 
> HDDS-728.008.patch, HDDS-728.009.patch, HDDS-728.010.patch, 
> HDDS-728.011.patch, HDDS-728.012.patch, 
> hadoop-root-datanode-ctr-e138-1518143905142-541600-02-02.hwx.site.log, 
> hadoop-root-datanode-ctr-e138-1518143905142-541600-02-03.hwx.site.log, 
> hadoop-root-datanode-ctr-e138-1518143905142-541600-02-08.hwx.site.log, 
> hadoop-root-datanode-ctr-e138-1518143905142-541600-02-09.hwx.site.log, 
> hadoop-root-datanode-ctr-e138-1518143905142-541600-02-10.hwx.site.log, 
> hadoop-root-datanode-ctr-e138-1518143905142-552728-01-04.hwx.site.log, 
> hadoop-root-datanode-ctr-e138-1518143905142-552728-01-05.hwx.site.log, 
> hadoop-root-datanode-ctr-e138-1518143905142-552728-01-06.hwx.site.log, 
> hadoop-root-datanode-ctr-e138-1518143905142-552728-01-07.hwx.site.log, 
> hadoop-root-datanode-ctr-e138-1518143905142-552728-01-08.hwx.site.log, 
> hadoop-root-om-ctr-e138-1518143905142-541600-02-02.hwx.site.log, 
> hadoop-root-scm-ctr-e138-1518143905142-541600-02-02.hwx.site.log, 
> om-audit-ctr-e138-1518143905142-541600-02-02.hwx.site.log
>
>
> Set up a 5-datanode Ozone cluster with HDP on top of it.
> After restarting all HDP services a few times, we encountered the issue below, 
> which is causing the HDP services to fail.
> The same exception was observed in an old setup; I initially thought it was an 
> issue with that setup, but now the same issue has appeared in the new setup as well.
> {code:java}
> 2018-10-24 10:42:03,308 WARN 
> org.apache.ratis.grpc.server.GrpcServerProtocolService: 
> 2974da2b-e765-43f9-8d30-45fe40dcb9ab: Failed requestVote 
> 1672d28e-800f-4318-895b-1648976acff6->2974da2b-e765-43f9-8d30-45fe40dcb9ab#0
> org.apache.ratis.protocol.GroupMismatchException: 
> 2974da2b-e765-43f9-8d30-45fe40dcb9ab: group-CE87A994686F not found.
> at 
> org.apache.ratis.server.impl.RaftServerProxy$ImplMap.get(RaftServerProxy.java:114)
> at 
> org.apache.ratis.server.impl.RaftServerProxy.getImplFuture(RaftServerProxy.java:252)
> at 
> org.apache.ratis.server.impl.RaftServerProxy.getImpl(RaftServerProxy.java:261)
> at 
> org.apache.ratis.server.impl.RaftServerProxy.getImpl(RaftServerProxy.java:256)
> at 
> org.apache.ratis.server.impl.RaftServerProxy.requestVote(RaftServerProxy.java:411)
> at 
> org.apache.ratis.grpc.server.GrpcServerProtocolService.requestVote(GrpcServerProtocolService.java:54)
> at 
> org.apache.ratis.proto.grpc.RaftServerProtocolServiceGrpc$MethodHandlers.invoke(RaftServerProtocolServiceGrpc.java:319)
> at 
> org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:707)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 2018-10-24 10:42:03,342 WARN 
> org.apache.ratis.grpc.server.GrpcServerProtocolService: 
> 2974da2b-e765-43f9-8d30-45fe40dcb9ab: Failed requestVote 
> 7839294e-5657-447f-b320-6b390fffb963->2974da2b-e765-43f9-8d30-45fe40dcb9ab#0
> org.apache.ratis.protocol.GroupMismatchException: 
> 2974da2b-e765-43f9-8d30-45fe40dcb9ab: group-CE87A994686F not found.
> at 
> org.apache.ratis.server.impl.RaftServerProxy$ImplMap.get(RaftServerProxy.java:114)
> at 
> org.apache.ratis.server.impl.RaftServerProxy.getImplFuture(RaftServerProxy.java:252)
> at 
> 

[jira] [Commented] (HDDS-728) Datanodes should use different ContainerStateMachine for each pipeline.

2018-10-29 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667295#comment-16667295
 ] 

Nanda kumar commented on HDDS-728:
--

[~msingh], thanks for the contribution. Thanks to [~ssulav] for reporting and 
testing it and thanks to [~shashikant] and [~anu] for the review.
I committed it to trunk and ozone-0.3 branch.

> Datanodes should use different ContainerStateMachine for each pipeline.
> ---
>
> Key: HDDS-728
> URL: https://issues.apache.org/jira/browse/HDDS-728
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Filesystem
>Affects Versions: 0.3.0
>Reporter: Soumitra Sulav
>Assignee: Mukul Kumar Singh
>Priority: Major
> Attachments: HDDS-728-ozone-0.3.005.patch, HDDS-728.001.patch, 
> HDDS-728.002.patch, HDDS-728.003.patch, HDDS-728.004.patch, 
> HDDS-728.005.patch, HDDS-728.006.patch, HDDS-728.007.patch, 
> HDDS-728.008.patch, HDDS-728.009.patch, HDDS-728.010.patch, 
> HDDS-728.011.patch, HDDS-728.012.patch, 
> hadoop-root-datanode-ctr-e138-1518143905142-541600-02-02.hwx.site.log, 
> hadoop-root-datanode-ctr-e138-1518143905142-541600-02-03.hwx.site.log, 
> hadoop-root-datanode-ctr-e138-1518143905142-541600-02-08.hwx.site.log, 
> hadoop-root-datanode-ctr-e138-1518143905142-541600-02-09.hwx.site.log, 
> hadoop-root-datanode-ctr-e138-1518143905142-541600-02-10.hwx.site.log, 
> hadoop-root-datanode-ctr-e138-1518143905142-552728-01-04.hwx.site.log, 
> hadoop-root-datanode-ctr-e138-1518143905142-552728-01-05.hwx.site.log, 
> hadoop-root-datanode-ctr-e138-1518143905142-552728-01-06.hwx.site.log, 
> hadoop-root-datanode-ctr-e138-1518143905142-552728-01-07.hwx.site.log, 
> hadoop-root-datanode-ctr-e138-1518143905142-552728-01-08.hwx.site.log, 
> hadoop-root-om-ctr-e138-1518143905142-541600-02-02.hwx.site.log, 
> hadoop-root-scm-ctr-e138-1518143905142-541600-02-02.hwx.site.log, 
> om-audit-ctr-e138-1518143905142-541600-02-02.hwx.site.log
>
>
> Set up a 5-datanode Ozone cluster with HDP on top of it.
> After restarting all HDP services a few times, we encountered the issue below, 
> which is causing the HDP services to fail.
> The same exception was observed in an old setup; I initially thought it was an 
> issue with that setup, but now the same issue has appeared in the new setup as well.
> {code:java}
> 2018-10-24 10:42:03,308 WARN 
> org.apache.ratis.grpc.server.GrpcServerProtocolService: 
> 2974da2b-e765-43f9-8d30-45fe40dcb9ab: Failed requestVote 
> 1672d28e-800f-4318-895b-1648976acff6->2974da2b-e765-43f9-8d30-45fe40dcb9ab#0
> org.apache.ratis.protocol.GroupMismatchException: 
> 2974da2b-e765-43f9-8d30-45fe40dcb9ab: group-CE87A994686F not found.
> at 
> org.apache.ratis.server.impl.RaftServerProxy$ImplMap.get(RaftServerProxy.java:114)
> at 
> org.apache.ratis.server.impl.RaftServerProxy.getImplFuture(RaftServerProxy.java:252)
> at 
> org.apache.ratis.server.impl.RaftServerProxy.getImpl(RaftServerProxy.java:261)
> at 
> org.apache.ratis.server.impl.RaftServerProxy.getImpl(RaftServerProxy.java:256)
> at 
> org.apache.ratis.server.impl.RaftServerProxy.requestVote(RaftServerProxy.java:411)
> at 
> org.apache.ratis.grpc.server.GrpcServerProtocolService.requestVote(GrpcServerProtocolService.java:54)
> at 
> org.apache.ratis.proto.grpc.RaftServerProtocolServiceGrpc$MethodHandlers.invoke(RaftServerProtocolServiceGrpc.java:319)
> at 
> org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:707)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 2018-10-24 10:42:03,342 WARN 
> org.apache.ratis.grpc.server.GrpcServerProtocolService: 
> 2974da2b-e765-43f9-8d30-45fe40dcb9ab: Failed requestVote 
> 7839294e-5657-447f-b320-6b390fffb963->2974da2b-e765-43f9-8d30-45fe40dcb9ab#0
> org.apache.ratis.protocol.GroupMismatchException: 
> 2974da2b-e765-43f9-8d30-45fe40dcb9ab: group-CE87A994686F not found.
> at 
> org.apache.ratis.server.impl.RaftServerProxy$ImplMap.get(RaftServerProxy.java:114)
> at 
> 

[jira] [Created] (HDDS-733) Create container if not exist, as part of chunk write

2018-10-25 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-733:


 Summary: Create container if not exist, as part of chunk write
 Key: HDDS-733
 URL: https://issues.apache.org/jira/browse/HDDS-733
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: Ozone Datanode
Reporter: Nanda kumar


The current implementation requires a container to be created in the datanode 
before the chunk write starts. This can be optimized by creating the 
container on the first chunk write: during a chunk write, if the container is 
missing, we can go ahead and create it.
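A minimal, self-contained sketch of the create-on-first-write idea (all names here are hypothetical, not the actual Ozone datanode API): modeling the container set as a concurrent map makes the create-if-missing step idempotent and race-free, which is what lets multiple writers hit the same container safely.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: containers keyed by id; computeIfAbsent performs an
// idempotent create, so only the first chunk write actually creates one.
public class ChunkWriteSketch {

  static class Container {
    final long id;
    Container(long id) { this.id = id; }
  }

  private final Map<Long, Container> containers = new ConcurrentHashMap<>();

  /** Writes a chunk, creating the container first if it does not exist yet. */
  public Container writeChunk(long containerId, byte[] chunk) {
    // Idempotent create: concurrent callers all observe the same container.
    Container c = containers.computeIfAbsent(containerId, Container::new);
    // ... append the chunk to the container (omitted) ...
    return c;
  }

  public int containerCount() {
    return containers.size();
  }

  public static void main(String[] args) {
    ChunkWriteSketch dn = new ChunkWriteSketch();
    dn.writeChunk(1L, new byte[]{1});
    dn.writeChunk(1L, new byte[]{2}); // second write reuses the container
    System.out.println(dn.containerCount()); // prints 1
  }
}
```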






[jira] [Created] (HDDS-735) Remove ALLOCATED and CREATING state from ContainerStateManager

2018-10-25 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-735:


 Summary: Remove ALLOCATED and CREATING state from 
ContainerStateManager
 Key: HDDS-735
 URL: https://issues.apache.org/jira/browse/HDDS-735
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: SCM
Reporter: Nanda kumar


After HDDS-733 and HDDS-734, we don't need the ALLOCATED and CREATING states for 
containers in SCM. The container will move to the OPEN state as soon as it is 
allocated in SCM. Since container creation happens as part of the first 
chunk write and the container creation operation in the datanode is idempotent, 
we don't have to worry about handing out the same container to multiple clients 
as soon as it is allocated.






[jira] [Commented] (HDDS-694) Plugin new Pipeline management code in SCM

2018-10-25 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663442#comment-16663442
 ] 

Nanda kumar commented on HDDS-694:
--

[~ljain], the patch no longer applies. Can you rebase it on top of the 
latest changes?

Also, can you please fix the checkstyle issues? A couple of very minor comments,

Pipeline.java
 Line:107 We can avoid creating a new ArrayList object by not calling 
{{getNodes}}.
 {{getNodes().get(0)}} can be replaced with 
{{nodeStatus.keySet().iterator().next()}}
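A self-contained illustration of this suggestion, using a plain map rather than the actual Pipeline class: copying the key set into a list just to read the first element allocates a throwaway ArrayList, while the key-set iterator reads the first key directly.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FirstKeyExample {
  public static void main(String[] args) {
    Map<String, Integer> nodeStatus = new LinkedHashMap<>();
    nodeStatus.put("dn1", 0);
    nodeStatus.put("dn2", 0);

    // Pattern being replaced: materialize a new list, then take element 0.
    List<String> copy = new ArrayList<>(nodeStatus.keySet());
    String viaList = copy.get(0);

    // Suggested pattern: no intermediate list is allocated.
    String viaIterator = nodeStatus.keySet().iterator().next();

    System.out.println(viaList.equals(viaIterator)); // prints true
  }
}
```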

RatisPipelineProvider.java
 Line:129 The state should be {{OPEN}}

> Plugin new Pipeline management code in SCM
> --
>
> Key: HDDS-694
> URL: https://issues.apache.org/jira/browse/HDDS-694
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: HDDS-694.001.patch, HDDS-694.002.patch
>
>
> This Jira aims to plugin new pipeline management code in SCM. It removes the 
> old pipeline related classes as well.






[jira] [Created] (HDDS-734) Remove create container logic from OzoneClient

2018-10-25 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-734:


 Summary: Remove create container logic from OzoneClient
 Key: HDDS-734
 URL: https://issues.apache.org/jira/browse/HDDS-734
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: Ozone Client
Reporter: Nanda kumar


After HDDS-733, the container will be created as part of the first chunk write, 
so we don't need explicit container-creation code in {{OzoneClient}} anymore.






[jira] [Commented] (HDDS-733) Create container if not exist, as part of chunk write

2018-11-05 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16675599#comment-16675599
 ] 

Nanda kumar commented on HDDS-733:
--

[~ljain], thanks for working on this. The patch looks good to me; a couple of 
minor comments:

In SCMChillModeManager#ContainerChillModeRule we should exclude containers 
in OPEN state when adding to containerMap, since containers in OPEN state 
might not have been created in the datanode yet.
ChillModeManager should instead track the cluster's pipelines for containers in 
OPEN state.

 

Unused imports in SCMContainerManager and TestDeadNodeHandler.

 

As [~linyiqun] pointed out, we can add a test case to make sure that we don't 
create containers for ReadChunk.
{quote}Send ReadChunk request before WriteChunk request and verify the 
StorageContainerException of CONTAINER_NOT_FOUND.
{quote}
 

Looks like the following tests are failing with this patch; can you take a look?
* org.apache.hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient
* org.apache.hadoop.ozone.freon.TestFreonWithDatanodeRestart

> Create container if not exist, as part of chunk write
> -
>
> Key: HDDS-733
> URL: https://issues.apache.org/jira/browse/HDDS-733
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Reporter: Nanda kumar
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: HDDS-733.001.patch, HDDS-733.002.patch
>
>
> The current implementation requires a container to be created in the datanode 
> before the chunk write starts. This can be optimized by creating the 
> container on the first chunk write.
>  During a chunk write, if the container is missing, we can go ahead and create 
> the container.
> Along with this change, the ALLOCATED and CREATING container states can be 
> removed, as they were only used to track which containers had been successfully 
> created. There is also a shouldCreateContainer flag which the client uses to 
> know whether it needs to create the container. This flag can be removed.






[jira] [Commented] (HDDS-737) Introduce Incremental Container Report

2018-11-05 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16675615#comment-16675615
 ] 

Nanda kumar commented on HDDS-737:
--

[~linyiqun], thanks for the review.

bq.  I prefer to add additional try-catch for thread.sleep and get 
InterruptedException.
Good idea. Will address it in the next patch.

bq. Can we have a new unit test for the incremental container report?
I have added {{TestCloseContainerCommandHandler}} in HDDS-801 which tests 
whether ICR is properly triggered.
HDDS-801 also introduces the {{OzoneContainer#updateContainerState}} call, which 
triggers ICR; there is a bit of restructuring in HDDS-801 of the way we 
trigger ICR.

> Introduce Incremental Container Report
> --
>
> Key: HDDS-737
> URL: https://issues.apache.org/jira/browse/HDDS-737
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode, SCM
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Attachments: HDDS-737.000.patch
>
>
> We will use the Incremental Container Report (ICR) to immediately inform SCM when 
> there is a state change to a container in the datanode. This will make sure 
> that SCM is updated as soon as the state of a container changes and doesn't 
> have to wait for the full container report.
> *When do we send ICR?*
> * When a container replica state changes from open/closing to closed
> * When a container replica state changes from open/closing to quasi closed
> * When a container replica state changes from quasi closed to closed
> * When a container replica is deleted in datanode
> * When a container replica is copied from another datanode
> * When a container replica is discovered to be corrupted






[jira] [Commented] (HDDS-737) Introduce Incremental Container Report

2018-11-06 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677143#comment-16677143
 ] 

Nanda kumar commented on HDDS-737:
--

Looks like the Jenkins run had a problem; the build failed with the error below:
{code:java}
[ERROR] Please refer to 
/testptch/hadoop/hadoop-hdds/common/target/surefire-reports for the individual 
test results.
[ERROR] Please refer to dump files (if any exist) [date]-jvmRun[N].dump, 
[date].dumpstream and [date]-jvmRun[N].dumpstream.
[ERROR] ExecutionException The forked VM terminated without properly saying 
goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /testptch/hadoop/hadoop-hdds/common && 
/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -Xmx2048m 
-XX:+HeapDumpOnOutOfMemoryError -DminiClusterDedicatedDirs=true -jar 
/testptch/hadoop/hadoop-hdds/common/target/surefire/surefirebooter1364262303668308594.jar
 /testptch/hadoop/hadoop-hdds/common/target/surefire 
2018-11-06T11-45-30_771-jvmRun4 surefire2576372466595873598tmp 
surefire_22763712882772709533tmp
{code}
Re-triggered Jenkins pre-commit build:
 [https://builds.apache.org/job/PreCommit-HDDS-Build/1626/]

> Introduce Incremental Container Report
> --
>
> Key: HDDS-737
> URL: https://issues.apache.org/jira/browse/HDDS-737
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode, SCM
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Attachments: HDDS-737.000.patch, HDDS-737.001.patch
>
>
> We will use the Incremental Container Report (ICR) to immediately inform SCM when 
> there is a state change to a container in the datanode. This will make sure 
> that SCM is updated as soon as the state of a container changes and doesn't 
> have to wait for the full container report.
> *When do we send ICR?*
> * When a container replica state changes from open/closing to closed
> * When a container replica state changes from open/closing to quasi closed
> * When a container replica state changes from quasi closed to closed
> * When a container replica is deleted in datanode
> * When a container replica is copied from another datanode
> * When a container replica is discovered to be corrupted






[jira] [Commented] (HDDS-692) Use the ProgressBar class in the RandomKeyGenerator freon test

2018-11-06 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677190#comment-16677190
 ] 

Nanda kumar commented on HDDS-692:
--

[~horzsolt2006], the ProgressBar code can be refactored as follows:
{code:java}
import java.io.PrintStream;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * Creates and runs a ProgressBar in a new Thread which gets printed on
 * the provided PrintStream.
 */
public class ProgressBar {

  private static final Logger LOG = LoggerFactory.getLogger(ProgressBar.class);
  private static final long REFRESH_INTERVAL = 1000L;

  private final long maxValue;
  private final Supplier<Long> currentValue;
  private final Thread progressBar;

  private volatile boolean running;

  private long startTime;

  /**
   * Creates a new ProgressBar instance which prints the progress on the given
   * PrintStream when started.
   *
   * @param stream to display the progress
   * @param maxValue Maximum value of the progress
   * @param currentValue Supplier that provides the current value
   */
  public ProgressBar(final PrintStream stream, final Long maxValue,
      final Supplier<Long> currentValue) {
    this.maxValue = maxValue;
    this.currentValue = currentValue;
    this.progressBar = new Thread(getProgressBar(stream));
    this.running = false;
  }

  /**
   * Starts the ProgressBar in a new Thread.
   * This is a non-blocking call.
   */
  public synchronized void start() {
    if (!running) {
      running = true;
      startTime = System.nanoTime();
      progressBar.start();
    }
  }

  /**
   * Graceful shutdown, waits for the progress bar to complete.
   * This is a blocking call.
   */
  public synchronized void shutdown() {
    if (running) {
      try {
        progressBar.join();
        running = false;
      } catch (InterruptedException e) {
        LOG.warn("Got interrupted while waiting for the progress bar to " +
            "complete.");
      }
    }
  }

  /**
   * Terminates the progress bar. This doesn't wait for the progress bar
   * to complete.
   */
  public synchronized void terminate() {
    if (running) {
      try {
        running = false;
        progressBar.join();
      } catch (InterruptedException e) {
        LOG.warn("Got interrupted while waiting for the progress bar to " +
            "complete.");
      }
    }
  }

  private Runnable getProgressBar(final PrintStream stream) {
    return () -> {
      stream.println();
      while (running && currentValue.get() < maxValue) {
        print(stream, currentValue.get());
        try {
          Thread.sleep(REFRESH_INTERVAL);
        } catch (InterruptedException e) {
          LOG.warn("ProgressBar was interrupted.");
        }
      }
      print(stream, maxValue);
      stream.println();
      running = false;
    };
  }

  /**
   * Given the current value, prints the progress bar.
   *
   * @param value current progress position
   */
  private void print(final PrintStream stream, final long value) {
    stream.print('\r');
    double percent = 100.0 * value / maxValue;
    StringBuilder sb = new StringBuilder();
    sb.append(" " + String.format("%.2f", percent) + "% |");

    for (int i = 0; i <= percent; i++) {
      sb.append('█');
    }
    for (int j = 0; j < 100 - percent; j++) {
      sb.append(' ');
    }
    sb.append("|  ");
    sb.append(value + "/" + maxValue);
    long timeInSec = TimeUnit.SECONDS.convert(
        System.nanoTime() - startTime, TimeUnit.NANOSECONDS);
    String timeToPrint = String.format("%d:%02d:%02d", timeInSec / 3600,
        (timeInSec % 3600) / 60, timeInSec % 60);
    sb.append(" Time: " + timeToPrint);
    stream.print(sb.toString());
  }
}
{code}
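For context, a minimal, self-contained demo of how such a progress bar would be driven: a worker thread advances a counter that the bar reads through a Supplier<Long>. The class above belongs to the proposed refactoring, so this sketch re-implements only the producer/consumer wiring rather than using ProgressBar itself.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Minimal demo of the ProgressBar pattern: the bar would poll `current`
// while the worker advances the shared counter toward `max`.
public class ProgressDemo {
  public static void main(String[] args) throws InterruptedException {
    final long max = 5;
    final AtomicLong counter = new AtomicLong();
    final Supplier<Long> current = counter::get;

    Thread worker = new Thread(() -> {
      for (long i = 0; i < max; i++) {
        counter.incrementAndGet();
      }
    });
    worker.start();
    worker.join(); // analogous to ProgressBar#shutdown waiting for completion

    System.out.println(current.get() + "/" + max); // prints 5/5
  }
}
```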

> Use the ProgressBar class in the RandomKeyGenerator freon test
> --
>
> Key: HDDS-692
> URL: https://issues.apache.org/jira/browse/HDDS-692
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Tools
>Reporter: Elek, Marton
>Assignee: Zsolt Horvath
>Priority: Major
> Attachments: HDDS-692.001.patch, HDDS-692.002.patch, 
> HDDS-692.003.patch
>
>
> HDDS-443 provides a reusable progress bar to make it easier to add more freon 
> tests, but the existing RandomKeyGenerator test 
> (hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/RandomKeyGenerator.java)
>  still doesn't use it. 
> It would be good to switch to the new progress bar there.






[jira] [Commented] (HDDS-737) Introduce Incremental Container Report

2018-11-08 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679739#comment-16679739
 ] 

Nanda kumar commented on HDDS-737:
--

Thanks [~linyiqun] & [~jnp] for the reviews. Committed it to trunk.

> Introduce Incremental Container Report
> --
>
> Key: HDDS-737
> URL: https://issues.apache.org/jira/browse/HDDS-737
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode, SCM
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-737.000.patch, HDDS-737.001.patch
>
>
> We will use the Incremental Container Report (ICR) to immediately inform SCM when 
> there is a state change to a container in the datanode. This will make sure 
> that SCM is updated as soon as the state of a container changes and doesn't 
> have to wait for the full container report.
> *When do we send ICR?*
> * When a container replica state changes from open/closing to closed
> * When a container replica state changes from open/closing to quasi closed
> * When a container replica state changes from quasi closed to closed
> * When a container replica is deleted in datanode
> * When a container replica is copied from another datanode
> * When a container replica is discovered to be corrupted






[jira] [Commented] (HDDS-812) TestEndPoint#testCheckVersionResponse is failing

2018-11-08 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679704#comment-16679704
 ] 

Nanda kumar commented on HDDS-812:
--

[~hanishakoneru], It seems you have accidentally attached HDDS-797's patch to 
this jira.

> TestEndPoint#testCheckVersionResponse is failing
> 
>
> Key: HDDS-812
> URL: https://issues.apache.org/jira/browse/HDDS-812
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Reporter: Nanda kumar
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: HDDS-797.001.patch
>
>
>  TestEndPoint#testCheckVersionResponse is failing with the below error
> {code:java}
> [ERROR] 
> testCheckVersionResponse(org.apache.hadoop.ozone.container.common.TestEndPoint)
>   Time elapsed: 0.142 s  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
> {code}
> Once we are in the REGISTER state, we don't allow the getVersion call anymore. 
> This is causing the test case to fail.






[jira] [Updated] (HDDS-737) Introduce Incremental Container Report

2018-11-08 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-737:
-
   Resolution: Fixed
Fix Version/s: 0.4.0
   Status: Resolved  (was: Patch Available)

> Introduce Incremental Container Report
> --
>
> Key: HDDS-737
> URL: https://issues.apache.org/jira/browse/HDDS-737
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode, SCM
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-737.000.patch, HDDS-737.001.patch
>
>
> We will use the Incremental Container Report (ICR) to immediately inform SCM when 
> there is a state change to a container in the datanode. This will make sure 
> that SCM is updated as soon as the state of a container changes and doesn't 
> have to wait for the full container report.
> *When do we send ICR?*
> * When a container replica state changes from open/closing to closed
> * When a container replica state changes from open/closing to quasi closed
> * When a container replica state changes from quasi closed to closed
> * When a container replica is deleted in datanode
> * When a container replica is copied from another datanode
> * When a container replica is discovered to be corrupted
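The trigger conditions listed above can be sketched as a predicate over
replica state transitions; the enum values and method names are illustrative,
not the actual Ozone datanode code:

```java
// Sketch of when a datanode would send an ICR based on a replica state
// transition. A replica copied from another datanode or discovered to be
// corrupted would also trigger an ICR, but that is not modeled here.
enum ReplicaState { OPEN, CLOSING, QUASI_CLOSED, CLOSED, DELETED }

class IcrTrigger {
    // True when the given transition should be reported to SCM immediately.
    static boolean shouldSendIcr(ReplicaState from, ReplicaState to) {
        switch (to) {
            case CLOSED:        // open/closing -> closed, quasi closed -> closed
            case QUASI_CLOSED:  // open/closing -> quasi closed
            case DELETED:       // replica deleted on the datanode
                return from != to;
            default:
                return false;
        }
    }
}
```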






[jira] [Commented] (HDDS-823) OzoneRestClient is failing with NPE on getKeyDetails call

2018-11-08 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679663#comment-16679663
 ] 

Nanda kumar commented on HDDS-823:
--

This is happening after HDDS-798. 

> OzoneRestClient is failing with NPE on getKeyDetails call
> -
>
> Key: HDDS-823
> URL: https://issues.apache.org/jira/browse/HDDS-823
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.3.0
>Reporter: Nanda kumar
>Priority: Blocker
>
> {{RestClient#getKeyDetails}} is failing with {{NullPointerException}}, which
> is causing a lot of unit tests and smoke tests to fail.
> Exception trace:
> {code:java}
> Error while calling command 
> (org.apache.hadoop.ozone.web.ozShell.keys.InfoKeyHandler@13713486): 
> java.lang.NullPointerException
>   at picocli.CommandLine.execute(CommandLine.java:926)
>   at picocli.CommandLine.access$700(CommandLine.java:104)
>   at picocli.CommandLine$RunLast.handle(CommandLine.java:1083)
>   at picocli.CommandLine$RunLast.handle(CommandLine.java:1051)
>   at 
> picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:959)
>   at picocli.CommandLine.parseWithHandlers(CommandLine.java:1242)
>   at 
> org.apache.hadoop.ozone.ozShell.TestOzoneShell.execute(TestOzoneShell.java:259)
>   at 
> org.apache.hadoop.ozone.ozShell.TestOzoneShell.testInfoDirKey(TestOzoneShell.java:1013)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.ozone.client.rest.RestClient.getKeyDetails(RestClient.java:817)
>   at 
> org.apache.hadoop.ozone.client.OzoneBucket.getKey(OzoneBucket.java:282)
>   at 
> org.apache.hadoop.ozone.web.ozShell.keys.InfoKeyHandler.call(InfoKeyHandler.java:65)
>   at 
> org.apache.hadoop.ozone.web.ozShell.keys.InfoKeyHandler.call(InfoKeyHandler.java:37)
>   at picocli.CommandLine.execute(CommandLine.java:919)
>   ... 18 more
>   {code}






[jira] [Commented] (HDDS-798) Storage-class is showing incorrectly

2018-11-08 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679669#comment-16679669
 ] 

Nanda kumar commented on HDDS-798:
--

After this change, {{RestClient#getKeyDetails}} is failing with
{{NullPointerException}}, which is causing a few unit tests and smoke tests to
fail.

> Storage-class is showing incorrectly
> 
>
> Key: HDDS-798
> URL: https://issues.apache.org/jira/browse/HDDS-798
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Fix For: 0.3.0, 0.4.0
>
> Attachments: HDDS-798.00.patch
>
>
> After HDDS-712, we support storage-class.
> For list-objects, even if the key's storage-class is set to
> REDUCED_REDUNDANCY, it still shows STANDARD.
> This is because, in the list-object response code, we have hardcoded it as
> below.
> keyMetadata.setStorageClass("STANDARD");
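A possible shape of the fix, sketched with hypothetical class and method
names; the actual mapping from replication settings to storage class in the
Ozone S3 gateway may differ:

```java
// Derive the storage class from the key's stored replication instead of
// hardcoding "STANDARD". The factor==1 -> REDUCED_REDUNDANCY rule below is
// an assumption for illustration only.
class KeyMetadataSketch {
    private String storageClass = "STANDARD";
    void setStorageClass(String sc) { this.storageClass = sc; }
    String getStorageClass() { return storageClass; }
}

class ListObjectResponseSketch {
    static KeyMetadataSketch toMetadata(int replicationFactor) {
        KeyMetadataSketch m = new KeyMetadataSketch();
        m.setStorageClass(replicationFactor == 1
            ? "REDUCED_REDUNDANCY" : "STANDARD");
        return m;
    }
}
```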






[jira] [Updated] (HDDS-801) Quasi close the container when close is not executed via Ratis

2018-11-08 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-801:
-
Status: Patch Available  (was: Open)

> Quasi close the container when close is not executed via Ratis
> --
>
> Key: HDDS-801
> URL: https://issues.apache.org/jira/browse/HDDS-801
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Affects Versions: 0.3.0
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Attachments: HDDS-801.000.patch
>
>
> When a datanode receives a CloseContainerCommand and the replication type is
> not RATIS, we should QUASI close the container. After quasi-closing the
> container, an ICR has to be sent to SCM.
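The rule above can be sketched as follows; the enum values and method names
are hypothetical, and the ICR send itself is not modeled:

```java
// On a CloseContainerCommand: a RATIS pipeline can close the container
// properly, while any other replication type only quasi-closes it. Either
// way, an ICR should be sent to SCM afterwards (not modeled here).
enum ReplicationType { RATIS, STAND_ALONE }
enum ContainerState { CLOSED, QUASI_CLOSED }

class CloseContainerSketch {
    static ContainerState onCloseCommand(ReplicationType type) {
        return type == ReplicationType.RATIS
            ? ContainerState.CLOSED
            : ContainerState.QUASI_CLOSED;
    }
}
```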






[jira] [Created] (HDDS-823) OzoneRestClient is failing with NPE on getKeyDetails call

2018-11-08 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-823:


 Summary: OzoneRestClient is failing with NPE on getKeyDetails call
 Key: HDDS-823
 URL: https://issues.apache.org/jira/browse/HDDS-823
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Client
Affects Versions: 0.3.0
Reporter: Nanda kumar


{{RestClient#getKeyDetails}} is failing with {{NullPointerException}}, which is
causing a lot of unit tests and smoke tests to fail.
Exception trace:
{code:java}
Error while calling command 
(org.apache.hadoop.ozone.web.ozShell.keys.InfoKeyHandler@13713486): 
java.lang.NullPointerException
at picocli.CommandLine.execute(CommandLine.java:926)
at picocli.CommandLine.access$700(CommandLine.java:104)
at picocli.CommandLine$RunLast.handle(CommandLine.java:1083)
at picocli.CommandLine$RunLast.handle(CommandLine.java:1051)
at 
picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:959)
at picocli.CommandLine.parseWithHandlers(CommandLine.java:1242)
at 
org.apache.hadoop.ozone.ozShell.TestOzoneShell.execute(TestOzoneShell.java:259)
at 
org.apache.hadoop.ozone.ozShell.TestOzoneShell.testInfoDirKey(TestOzoneShell.java:1013)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.ozone.client.rest.RestClient.getKeyDetails(RestClient.java:817)
at 
org.apache.hadoop.ozone.client.OzoneBucket.getKey(OzoneBucket.java:282)
at 
org.apache.hadoop.ozone.web.ozShell.keys.InfoKeyHandler.call(InfoKeyHandler.java:65)
at 
org.apache.hadoop.ozone.web.ozShell.keys.InfoKeyHandler.call(InfoKeyHandler.java:37)
at picocli.CommandLine.execute(CommandLine.java:919)
... 18 more
{code}






[jira] [Commented] (HDDS-737) Introduce Incremental Container Report

2018-11-08 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679732#comment-16679732
 ] 

Nanda kumar commented on HDDS-737:
--

Tested it locally; the failures are not related to this patch. I will commit it
shortly.

> Introduce Incremental Container Report
> --
>
> Key: HDDS-737
> URL: https://issues.apache.org/jira/browse/HDDS-737
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode, SCM
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Attachments: HDDS-737.000.patch, HDDS-737.001.patch
>
>
> We will use the Incremental Container Report (ICR) to immediately inform SCM
> when there is a state change to a container on a datanode. This will make sure
> that SCM is updated as soon as the state of a container changes and doesn't
> have to wait for the full container report.
> *When do we send ICR?*
> * When a container replica state changes from open/closing to closed
> * When a container replica state changes from open/closing to quasi closed
> * When a container replica state changes from quasi closed to closed
> * When a container replica is deleted in datanode
> * When a container replica is copied from another datanode
> * When a container replica is discovered to be corrupted






[jira] [Updated] (HDDS-823) OzoneRestClient is failing with NPE on getKeyDetails call

2018-11-08 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-823:
-
Target Version/s: 0.3.0, 0.4.0  (was: 0.3.0)

> OzoneRestClient is failing with NPE on getKeyDetails call
> -
>
> Key: HDDS-823
> URL: https://issues.apache.org/jira/browse/HDDS-823
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.3.0
>Reporter: Nanda kumar
>Priority: Blocker
>
> {{RestClient#getKeyDetails}} is failing with {{NullPointerException}}, which
> is causing a few unit tests and smoke tests to fail.
> Exception trace:
> {code:java}
> Error while calling command 
> (org.apache.hadoop.ozone.web.ozShell.keys.InfoKeyHandler@13713486): 
> java.lang.NullPointerException
>   at picocli.CommandLine.execute(CommandLine.java:926)
>   at picocli.CommandLine.access$700(CommandLine.java:104)
>   at picocli.CommandLine$RunLast.handle(CommandLine.java:1083)
>   at picocli.CommandLine$RunLast.handle(CommandLine.java:1051)
>   at 
> picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:959)
>   at picocli.CommandLine.parseWithHandlers(CommandLine.java:1242)
>   at 
> org.apache.hadoop.ozone.ozShell.TestOzoneShell.execute(TestOzoneShell.java:259)
>   at 
> org.apache.hadoop.ozone.ozShell.TestOzoneShell.testInfoDirKey(TestOzoneShell.java:1013)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.ozone.client.rest.RestClient.getKeyDetails(RestClient.java:817)
>   at 
> org.apache.hadoop.ozone.client.OzoneBucket.getKey(OzoneBucket.java:282)
>   at 
> org.apache.hadoop.ozone.web.ozShell.keys.InfoKeyHandler.call(InfoKeyHandler.java:65)
>   at 
> org.apache.hadoop.ozone.web.ozShell.keys.InfoKeyHandler.call(InfoKeyHandler.java:37)
>   at picocli.CommandLine.execute(CommandLine.java:919)
>   ... 18 more
>   {code}






[jira] [Updated] (HDDS-823) OzoneRestClient is failing with NPE on getKeyDetails call

2018-11-08 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-823:
-
Description: 
{{RestClient#getKeyDetails}} is failing with {{NullPointerException}}, which is
causing a few unit tests and smoke tests to fail.
Exception trace:
{code:java}
Error while calling command 
(org.apache.hadoop.ozone.web.ozShell.keys.InfoKeyHandler@13713486): 
java.lang.NullPointerException
at picocli.CommandLine.execute(CommandLine.java:926)
at picocli.CommandLine.access$700(CommandLine.java:104)
at picocli.CommandLine$RunLast.handle(CommandLine.java:1083)
at picocli.CommandLine$RunLast.handle(CommandLine.java:1051)
at 
picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:959)
at picocli.CommandLine.parseWithHandlers(CommandLine.java:1242)
at 
org.apache.hadoop.ozone.ozShell.TestOzoneShell.execute(TestOzoneShell.java:259)
at 
org.apache.hadoop.ozone.ozShell.TestOzoneShell.testInfoDirKey(TestOzoneShell.java:1013)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.ozone.client.rest.RestClient.getKeyDetails(RestClient.java:817)
at 
org.apache.hadoop.ozone.client.OzoneBucket.getKey(OzoneBucket.java:282)
at 
org.apache.hadoop.ozone.web.ozShell.keys.InfoKeyHandler.call(InfoKeyHandler.java:65)
at 
org.apache.hadoop.ozone.web.ozShell.keys.InfoKeyHandler.call(InfoKeyHandler.java:37)
at picocli.CommandLine.execute(CommandLine.java:919)
... 18 more
{code}

  was:
{{RestClient#getKeyDetails}} is failing with {{NullPointerException}} which is 
causing a lot of unit test and smoke test to fail.
Exception trace:
{code:java}
Error while calling command 
(org.apache.hadoop.ozone.web.ozShell.keys.InfoKeyHandler@13713486): 
java.lang.NullPointerException
at picocli.CommandLine.execute(CommandLine.java:926)
at picocli.CommandLine.access$700(CommandLine.java:104)
at picocli.CommandLine$RunLast.handle(CommandLine.java:1083)
at picocli.CommandLine$RunLast.handle(CommandLine.java:1051)
at 
picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:959)
at picocli.CommandLine.parseWithHandlers(CommandLine.java:1242)
at 
org.apache.hadoop.ozone.ozShell.TestOzoneShell.execute(TestOzoneShell.java:259)
at 
org.apache.hadoop.ozone.ozShell.TestOzoneShell.testInfoDirKey(TestOzoneShell.java:1013)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.ozone.client.rest.RestClient.getKeyDetails(RestClient.java:817)
at 
org.apache.hadoop.ozone.client.OzoneBucket.getKey(OzoneBucket.java:282)
at 
org.apache.hadoop.ozone.web.ozShell.keys.InfoKeyHandler.call(InfoKeyHandler.java:65)
at 
org.apache.hadoop.ozone.web.ozShell.keys.InfoKeyHandler.call(InfoKeyHandler.java:37)
at picocli.CommandLine.execute(CommandLine.java:919)
... 18 more
{code}

[jira] [Updated] (HDDS-801) Quasi close the container when close is not executed via Ratis

2018-11-11 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-801:
-
Status: Open  (was: Patch Available)

> Quasi close the container when close is not executed via Ratis
> --
>
> Key: HDDS-801
> URL: https://issues.apache.org/jira/browse/HDDS-801
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Affects Versions: 0.3.0
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Attachments: HDDS-801.000.patch
>
>
> When a datanode receives a CloseContainerCommand and the replication type is
> not RATIS, we should QUASI close the container. After quasi-closing the
> container, an ICR has to be sent to SCM.






[jira] [Work started] (HDDS-801) Quasi close the container when close is not executed via Ratis

2018-11-11 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDDS-801 started by Nanda kumar.

> Quasi close the container when close is not executed via Ratis
> --
>
> Key: HDDS-801
> URL: https://issues.apache.org/jira/browse/HDDS-801
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Affects Versions: 0.3.0
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Attachments: HDDS-801.000.patch
>
>
> When a datanode receives a CloseContainerCommand and the replication type is
> not RATIS, we should QUASI close the container. After quasi-closing the
> container, an ICR has to be sent to SCM.






[jira] [Updated] (HDDS-576) Move ContainerWithPipeline creation to RPC endpoint

2018-11-11 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-576:
-
Description: With independent Pipeline and Container Managers in SCM, the 
creation of ContainerWithPipeline can be moved to RPC endpoint. This will 
ensure clear separation of the pipeline Manager and Container Manager  (was: 
with independent Pipeline and Container Managers in SCM, the creation of 
ContainerWithPipeline can be moved to RPC endpoint. This will ensure clear 
separation of the pipeline Manager and Container Manager)

> Move ContainerWithPipeline creation to RPC endpoint
> ---
>
> Key: HDDS-576
> URL: https://issues.apache.org/jira/browse/HDDS-576
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Mukul Kumar Singh
>Priority: Major
>
> With independent Pipeline and Container Managers in SCM, the creation of
> ContainerWithPipeline can be moved to the RPC endpoint. This will ensure a
> clear separation of the Pipeline Manager and the Container Manager.






[jira] [Assigned] (HDDS-576) Move ContainerWithPipeline creation to RPC endpoint

2018-11-11 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar reassigned HDDS-576:


Assignee: Nanda kumar

> Move ContainerWithPipeline creation to RPC endpoint
> ---
>
> Key: HDDS-576
> URL: https://issues.apache.org/jira/browse/HDDS-576
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Mukul Kumar Singh
>Assignee: Nanda kumar
>Priority: Major
>
> With independent Pipeline and Container Managers in SCM, the creation of
> ContainerWithPipeline can be moved to the RPC endpoint. This will ensure a
> clear separation of the Pipeline Manager and the Container Manager.






[jira] [Created] (HDDS-827) TestStorageContainerManagerHttpServer should use dynamic port

2018-11-11 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-827:


 Summary: TestStorageContainerManagerHttpServer should use dynamic 
port
 Key: HDDS-827
 URL: https://issues.apache.org/jira/browse/HDDS-827
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: test
Reporter: Nanda kumar


Most of the time {{TestStorageContainerManagerHttpServer}} is failing with 
{code}
java.net.BindException: Port in use: 0.0.0.0:9876
...
Caused by: java.net.BindException: Address already in use
{code}

TestStorageContainerManagerHttpServer should use a free (dynamic) port instead
of trying to bind to the default 9876.
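A minimal sketch of picking a dynamic port in a test, using plain java.net:
bind port 0 and let the OS assign a free ephemeral port.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

class FreePort {
    // Binding port 0 asks the OS for any unused ephemeral port; the
    // returned value can then be fed to the HTTP server configuration.
    static int pick() {
        try (ServerSocket s = new ServerSocket(0)) {
            return s.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```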






[jira] [Updated] (HDDS-827) TestStorageContainerManagerHttpServer should use dynamic port

2018-11-11 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-827:
-
Labels: newbie  (was: )

> TestStorageContainerManagerHttpServer should use dynamic port
> -
>
> Key: HDDS-827
> URL: https://issues.apache.org/jira/browse/HDDS-827
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: test
>Reporter: Nanda kumar
>Priority: Major
>  Labels: newbie
>
> Most of the time {{TestStorageContainerManagerHttpServer}} is failing with 
> {code}
> java.net.BindException: Port in use: 0.0.0.0:9876
> ...
> Caused by: java.net.BindException: Address already in use
> {code}
> TestStorageContainerManagerHttpServer should use a free (dynamic) port
> instead of trying to bind to the default 9876.






[jira] [Commented] (HDDS-576) Move ContainerWithPipeline creation to RPC endpoint

2018-11-13 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685072#comment-16685072
 ] 

Nanda kumar commented on HDDS-576:
--

[~linyiqun], thanks for the review. Created HDDS-833 for updating the javadoc.

> Move ContainerWithPipeline creation to RPC endpoint
> ---
>
> Key: HDDS-576
> URL: https://issues.apache.org/jira/browse/HDDS-576
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Mukul Kumar Singh
>Assignee: Nanda kumar
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-576.000.patch
>
>
> With independent Pipeline and Container Managers in SCM, the creation of
> ContainerWithPipeline can be moved to the RPC endpoint. This will ensure a
> clear separation of the Pipeline Manager and the Container Manager.






[jira] [Created] (HDDS-833) Update javadoc in StorageContainerManager, NodeManager, PipelineManager and ContainerManager

2018-11-13 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-833:


 Summary: Update javadoc in StorageContainerManager, NodeManager, 
PipelineManager and ContainerManager
 Key: HDDS-833
 URL: https://issues.apache.org/jira/browse/HDDS-833
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: SCM
Reporter: Nanda kumar
Assignee: Nanda kumar


The javadoc in following interface/classes has to be updated
* StorageContainerManager
* NodeManager
* NodeStateManager
* PipelineManager
* PipelineStateManager
* ContainerManager
* ContainerStateManager






[jira] [Created] (HDDS-830) Datanode should not start XceiverServerRatis before getting version information from SCM

2018-11-12 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-830:


 Summary: Datanode should not start XceiverServerRatis before 
getting version information from SCM
 Key: HDDS-830
 URL: https://issues.apache.org/jira/browse/HDDS-830
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: Ozone Datanode
Affects Versions: 0.3.0
Reporter: Nanda kumar


If a datanode restarts quickly, before SCM detects it, it will rejoin the Ratis
ring (the existing pipeline). Since SCM didn't detect the restart, the pipeline
is not closed. There is a time gap between when the datanode starts and when it
gets the version information from SCM. During this window, the SCM ID in the
datanode is not set (null). If a client tries to use the pipeline during that
time, the container state machine will throw {{java.lang.NullPointerException:
scmId cannot be null}}. This will cause {{RaftLogWorker}} to terminate,
resulting in a datanode crash.

{code}
2018-11-12 19:45:31,811 ERROR storage.RaftLogWorker 
(ExitUtils.java:terminate(86)) - Terminating with exit status 1: 
407fd181-2ff7-4651-9a47-a0927ede4c51-RaftLogWorker failed.
java.io.IOException: java.lang.NullPointerException: scmId cannot be null
  at org.apache.ratis.util.IOUtils.asIOException(IOUtils.java:54)
  at org.apache.ratis.util.IOUtils.toIOException(IOUtils.java:61)
  at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:83)
  at 
org.apache.ratis.server.storage.RaftLogWorker$StateMachineDataPolicy.getFromFuture(RaftLogWorker.java:76)
  at 
org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:344)
  at org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:216)
  at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException: scmId cannot be null
  at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
  at 
org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.create(KeyValueContainer.java:106)
  at 
org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleCreateContainer(KeyValueHandler.java:242)
  at 
org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:165)
  at 
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.createContainer(HddsDispatcher.java:206)
  at 
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:124)
  at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:274)
  at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.runCommand(ContainerStateMachine.java:280)
  at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$handleWriteChunk$1(ContainerStateMachine.java:301)
  at 
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
  at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  ... 1 more
{code}
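The ordering fix implied by this issue can be sketched with hypothetical
names: hold back the Ratis server until the version response from SCM has
populated scmId.

```java
import java.util.Objects;

// Sketch only: the real datanode startup sequence in Ozone is more
// involved; this just illustrates gating the Ratis server on scmId.
class DatanodeStartupSketch {
    private volatile String scmId;   // null until SCM's version response arrives

    void onVersionResponse(String id) {
        scmId = Objects.requireNonNull(id, "scmId cannot be null");
        startXceiverServerRatis();   // only now accept Ratis traffic
    }

    void startXceiverServerRatis() { /* start the Ratis server here */ }

    boolean readyForClientIo() { return scmId != null; }
}
```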






[jira] [Updated] (HDDS-576) Move ContainerWithPipeline creation to RPC endpoint

2018-11-12 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-576:
-
Status: Patch Available  (was: Open)

> Move ContainerWithPipeline creation to RPC endpoint
> ---
>
> Key: HDDS-576
> URL: https://issues.apache.org/jira/browse/HDDS-576
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Mukul Kumar Singh
>Assignee: Nanda kumar
>Priority: Major
> Attachments: HDDS-576.000.patch
>
>
> With independent Pipeline and Container Managers in SCM, the creation of
> ContainerWithPipeline can be moved to the RPC endpoint. This will ensure a
> clear separation of the Pipeline Manager and the Container Manager.






[jira] [Commented] (HDDS-576) Move ContainerWithPipeline creation to RPC endpoint

2018-11-12 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683928#comment-16683928
 ] 

Nanda kumar commented on HDDS-576:
--

This patch also fixes the test failures.

> Move ContainerWithPipeline creation to RPC endpoint
> ---
>
> Key: HDDS-576
> URL: https://issues.apache.org/jira/browse/HDDS-576
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Mukul Kumar Singh
>Assignee: Nanda kumar
>Priority: Major
> Attachments: HDDS-576.000.patch
>
>
> With independent Pipeline and Container Managers in SCM, the creation of 
> ContainerWithPipeline can be moved to the RPC endpoint. This ensures a clear 
> separation between the Pipeline Manager and the Container Manager.






[jira] [Updated] (HDDS-830) Datanode should not start XceiverServerRatis before getting version information from SCM

2018-11-12 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-830:
-
Issue Type: Bug  (was: Improvement)

> Datanode should not start XceiverServerRatis before getting version 
> information from SCM
> 
>
> Key: HDDS-830
> URL: https://issues.apache.org/jira/browse/HDDS-830
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.3.0
>Reporter: Nanda kumar
>Priority: Major
>
> If a datanode restarts quickly before SCM detects it, it will rejoin the 
> existing Ratis ring (pipeline). Since SCM did not detect the restart, the 
> pipeline is not closed. There is a time gap between the datanode starting 
> and it receiving the version information from SCM; during this window the 
> SCM ID on the datanode is not set (null). If a client tries to use the 
> pipeline during this window, the container state machine throws 
> {{java.lang.NullPointerException: scmId cannot be null}}, causing 
> {{RaftLogWorker}} to terminate and the datanode to crash.
> {code}
> 2018-11-12 19:45:31,811 ERROR storage.RaftLogWorker (ExitUtils.java:terminate(86)) - Terminating with exit status 1: 407fd181-2ff7-4651-9a47-a0927ede4c51-RaftLogWorker failed.
> java.io.IOException: java.lang.NullPointerException: scmId cannot be null
>   at org.apache.ratis.util.IOUtils.asIOException(IOUtils.java:54)
>   at org.apache.ratis.util.IOUtils.toIOException(IOUtils.java:61)
>   at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:83)
>   at org.apache.ratis.server.storage.RaftLogWorker$StateMachineDataPolicy.getFromFuture(RaftLogWorker.java:76)
>   at org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:344)
>   at org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:216)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException: scmId cannot be null
>   at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
>   at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.create(KeyValueContainer.java:106)
>   at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleCreateContainer(KeyValueHandler.java:242)
>   at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:165)
>   at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.createContainer(HddsDispatcher.java:206)
>   at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:124)
>   at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:274)
>   at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.runCommand(ContainerStateMachine.java:280)
>   at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$handleWriteChunk$1(ContainerStateMachine.java:301)
>   at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   ... 1 more
> {code}
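The fix implied by the summary — not starting XceiverServerRatis until the version information has arrived — can be sketched roughly as follows. This is a hypothetical illustration (class and method names are invented), not the actual Ozone code:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: hold back the Ratis server start until the SCM ID is
// known, so container creation can never observe a null scmId.
public class ScmIdGate {
    private final CountDownLatch versionReceived = new CountDownLatch(1);
    private volatile String scmId;

    // Invoked when the datanode's version task receives a response from SCM.
    public void onVersionResponse(String scmId) {
        this.scmId = scmId;
        versionReceived.countDown();
    }

    // Invoked before starting XceiverServerRatis; blocks until SCM responds.
    public String awaitScmId(long timeout, TimeUnit unit)
            throws InterruptedException {
        if (!versionReceived.await(timeout, unit)) {
            throw new IllegalStateException(
                "SCM version information not received in time");
        }
        return scmId;
    }
}
```

With a gate like this in the startup path, the window in which the SCM ID is null simply cannot be reached by client requests.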






[jira] [Updated] (HDDS-576) Move ContainerWithPipeline creation to RPC endpoint

2018-11-12 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-576:
-
Attachment: HDDS-576.000.patch

> Move ContainerWithPipeline creation to RPC endpoint
> ---
>
> Key: HDDS-576
> URL: https://issues.apache.org/jira/browse/HDDS-576
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Mukul Kumar Singh
>Assignee: Nanda kumar
>Priority: Major
> Attachments: HDDS-576.000.patch
>
>
> With independent Pipeline and Container Managers in SCM, the creation of 
> ContainerWithPipeline can be moved to the RPC endpoint. This ensures a clear 
> separation between the Pipeline Manager and the Container Manager.






[jira] [Commented] (HDDS-827) TestStorageContainerManagerHttpServer should use dynamic port

2018-11-13 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685347#comment-16685347
 ] 

Nanda kumar commented on HDDS-827:
--

+1, will commit this shortly.

> TestStorageContainerManagerHttpServer should use dynamic port
> -
>
> Key: HDDS-827
> URL: https://issues.apache.org/jira/browse/HDDS-827
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: test
>Reporter: Nanda kumar
>Assignee: Sandeep Nemuri
>Priority: Major
>  Labels: newbie
> Attachments: HDDS-827.001.patch
>
>
> Most of the time {{TestStorageContainerManagerHttpServer}} is failing with 
> {code}
> java.net.BindException: Port in use: 0.0.0.0:9876
> ...
> Caused by: java.net.BindException: Address already in use
> {code}
> TestStorageContainerManagerHttpServer should use a free (dynamic) port 
> instead of trying to bind to the default 9876.
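For context, the usual dynamic-port trick is to bind to port 0 and let the OS assign a free ephemeral port. A minimal sketch (illustrative only, not the actual test code):

```java
import java.io.IOException;
import java.net.ServerSocket;

public class FreePort {
    // Ask the OS for a free ephemeral port by binding to port 0, instead of
    // a fixed default like 9876 that may already be in use.
    public static int pickFreePort() throws IOException {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        }
    }
}
```

Note that closing the probe socket before the real server binds leaves a small race window; for unit tests this is usually acceptable, and binding the server itself to port 0 avoids the race entirely.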






[jira] [Commented] (HDDS-827) TestStorageContainerManagerHttpServer should use dynamic port

2018-11-15 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16688049#comment-16688049
 ] 

Nanda kumar commented on HDDS-827:
--

Thanks [~elek] for taking care of this.

> TestStorageContainerManagerHttpServer should use dynamic port
> -
>
> Key: HDDS-827
> URL: https://issues.apache.org/jira/browse/HDDS-827
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: test
>Reporter: Nanda kumar
>Assignee: Sandeep Nemuri
>Priority: Major
>  Labels: newbie
> Fix For: 0.4.0
>
> Attachments: HDDS-827.001.patch
>
>
> Most of the time {{TestStorageContainerManagerHttpServer}} is failing with 
> {code}
> java.net.BindException: Port in use: 0.0.0.0:9876
> ...
> Caused by: java.net.BindException: Address already in use
> {code}
> TestStorageContainerManagerHttpServer should use a free (dynamic) port 
> instead of trying to bind to the default 9876.






[jira] [Updated] (HDDS-801) Quasi close the container when close is not executed via Ratis

2018-11-15 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-801:
-
Attachment: HDDS-801.003.patch

> Quasi close the container when close is not executed via Ratis
> --
>
> Key: HDDS-801
> URL: https://issues.apache.org/jira/browse/HDDS-801
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Affects Versions: 0.3.0
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Attachments: HDDS-801.000.patch, HDDS-801.001.patch, 
> HDDS-801.002.patch, HDDS-801.003.patch
>
>
> When a datanode receives a CloseContainerCommand and the replication type is 
> not RATIS, we should QUASI close the container. After quasi-closing the 
> container, an ICR has to be sent to SCM.
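As a rough sketch of the rule described above (state and class names are illustrative, not the actual Ozone code): a close that did not come through Ratis only quasi-closes the container, and either way the resulting state is reported back to SCM via an ICR.

```java
// Illustrative replica states; not the actual Ozone enum.
enum ReplicaState { OPEN, CLOSING, QUASI_CLOSED, CLOSED }

public class CloseDecision {
    // A close executed via Ratis fully closes the container; any other
    // CloseContainerCommand only quasi-closes it. The caller is expected to
    // send an Incremental Container Report (ICR) after the transition.
    public static ReplicaState handleClose(boolean viaRatis) {
        return viaRatis ? ReplicaState.CLOSED : ReplicaState.QUASI_CLOSED;
    }
}
```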






[jira] [Commented] (HDDS-837) Persist originNodeId as part of .container file in datanode

2018-11-15 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16688622#comment-16688622
 ] 

Nanda kumar commented on HDDS-837:
--

{{originPipelineId}} is good to have as part of the container info; I will work 
on adding it. The current WIP patch only has originNodeId, and I will update 
the patch shortly to include originPipelineId as well.

> Persist originNodeId as part of .container file in datanode
> ---
>
> Key: HDDS-837
> URL: https://issues.apache.org/jira/browse/HDDS-837
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Attachments: HDDS-837.wip.patch
>
>
> To differentiate the replicas of a QUASI_CLOSED container we need the 
> {{originNodeId}} field. With this field, we can uniquely identify a 
> QUASI_CLOSED container replica, which will be needed when we want to CLOSE a 
> QUASI_CLOSED container.
> This field is set by the node where the container is created, stored as part 
> of the {{.container}} file, and sent as part of the ContainerReport to SCM.






[jira] [Updated] (HDDS-837) Persist originNodeId as part of .container file in datanode

2018-11-15 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-837:
-
Attachment: HDDS-837.wip.patch

> Persist originNodeId as part of .container file in datanode
> ---
>
> Key: HDDS-837
> URL: https://issues.apache.org/jira/browse/HDDS-837
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Attachments: HDDS-837.wip.patch
>
>
> To differentiate the replicas of a QUASI_CLOSED container we need the 
> {{originNodeId}} field. With this field, we can uniquely identify a 
> QUASI_CLOSED container replica, which will be needed when we want to CLOSE a 
> QUASI_CLOSED container.
> This field is set by the node where the container is created, stored as part 
> of the {{.container}} file, and sent as part of the ContainerReport to SCM.






[jira] [Work started] (HDDS-837) Persist originNodeId as part of .container file in datanode

2018-11-15 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDDS-837 started by Nanda kumar.

> Persist originNodeId as part of .container file in datanode
> ---
>
> Key: HDDS-837
> URL: https://issues.apache.org/jira/browse/HDDS-837
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Attachments: HDDS-837.wip.patch
>
>
> To differentiate the replicas of a QUASI_CLOSED container we need the 
> {{originNodeId}} field. With this field, we can uniquely identify a 
> QUASI_CLOSED container replica, which will be needed when we want to CLOSE a 
> QUASI_CLOSED container.
> This field is set by the node where the container is created, stored as part 
> of the {{.container}} file, and sent as part of the ContainerReport to SCM.






[jira] [Updated] (HDDS-801) Quasi close the container when close is not executed via Ratis

2018-11-15 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-801:
-
Attachment: (was: HDDS-801.003.patch)

> Quasi close the container when close is not executed via Ratis
> --
>
> Key: HDDS-801
> URL: https://issues.apache.org/jira/browse/HDDS-801
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Affects Versions: 0.3.0
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Attachments: HDDS-801.000.patch, HDDS-801.001.patch, 
> HDDS-801.002.patch
>
>
> When a datanode receives a CloseContainerCommand and the replication type is 
> not RATIS, we should QUASI close the container. After quasi-closing the 
> container, an ICR has to be sent to SCM.






[jira] [Updated] (HDDS-801) Quasi close the container when close is not executed via Ratis

2018-11-15 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-801:
-
Attachment: HDDS-801.003.patch

> Quasi close the container when close is not executed via Ratis
> --
>
> Key: HDDS-801
> URL: https://issues.apache.org/jira/browse/HDDS-801
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Affects Versions: 0.3.0
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Attachments: HDDS-801.000.patch, HDDS-801.001.patch, 
> HDDS-801.002.patch, HDDS-801.003.patch
>
>
> When a datanode receives a CloseContainerCommand and the replication type is 
> not RATIS, we should QUASI close the container. After quasi-closing the 
> container, an ICR has to be sent to SCM.






[jira] [Commented] (HDDS-801) Quasi close the container when close is not executed via Ratis

2018-11-15 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16688608#comment-16688608
 ] 

Nanda kumar commented on HDDS-801:
--

Thanks [~msingh], [~shashikant] for the review.

bq. contaienrState to containerState
Fixed.
bq. updateContainerState should be changed to appropriate type like closing or 
stopContainer
Fixed.
bq. I feel this can be moved inside XceiverServerRatis
Done
bq. lets add a getter function
Added
bq. Should this be encapsulated in one function
Introduced a private {{closeInternal}} method; both close and quasi close use 
it now.
bq. When the transition from QUASI_CLOSED to CLOSED is allowed later, we should 
not compact the DB again.
It keeps the code simple; it shouldn't be a problem even if we compact the DB 
multiple times.
bq. the container should already be in CLOSING state, Lets add an precondition 
here that the container is already in closing state.
There are cases where {{handleCloseContainer}} is called even when the 
container is still in OPEN state (close container called via client API), some 
of which [~shashikant] has mentioned. So we move it to CLOSING state here if 
the container is not already in CLOSING state.
bq. lets change the assertion here to isQuasiClosed.
Fixed

bq. update the comment to be container getting "quasi closed" rather than 
getting closed.
Done
bq. closeContainer is exposed to clients in ContainerProtocolCalls.Java
To handle this case, if the container is in OPEN state we move it to CLOSING in 
{{KeyValueHandler#handleCloseContainer}}.
bq. Any state change in ContainerState should trigger ICR
The ICR is triggered inside the closeContainer/quasiCloseContainer call 
itself, so there is no need to call updateContainerState internally.
bq. There can be a case where let's say the SCM gets network separated from a 
follower before sending...
This call will come through {{KeyValueHandler#handleCloseContainer}}, we will 
move the container to CLOSING state here if it's not there already.
bq. The comments look misleading here.
The TODO is for a performance optimization that can be done later. The comment 
says that "Close container is not expected to be instantaneous" (the current 
implementation). It looks fine to me.



> Quasi close the container when close is not executed via Ratis
> --
>
> Key: HDDS-801
> URL: https://issues.apache.org/jira/browse/HDDS-801
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Affects Versions: 0.3.0
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Attachments: HDDS-801.000.patch, HDDS-801.001.patch, 
> HDDS-801.002.patch, HDDS-801.003.patch
>
>
> When a datanode receives a CloseContainerCommand and the replication type is 
> not RATIS, we should QUASI close the container. After quasi-closing the 
> container, an ICR has to be sent to SCM.






[jira] [Updated] (HDDS-801) Quasi close the container when close is not executed via Ratis

2018-11-16 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-801:
-
Attachment: HDDS-801.004.patch

> Quasi close the container when close is not executed via Ratis
> --
>
> Key: HDDS-801
> URL: https://issues.apache.org/jira/browse/HDDS-801
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Affects Versions: 0.3.0
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Attachments: HDDS-801.000.patch, HDDS-801.001.patch, 
> HDDS-801.002.patch, HDDS-801.003.patch, HDDS-801.004.patch
>
>
> When a datanode receives a CloseContainerCommand and the replication type is 
> not RATIS, we should QUASI close the container. After quasi-closing the 
> container, an ICR has to be sent to SCM.






[jira] [Commented] (HDDS-737) Introduce Incremental Container Report

2018-11-06 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677790#comment-16677790
 ] 

Nanda kumar commented on HDDS-737:
--

The shadedclient build is failing even without the patch in the Jenkins run:
https://builds.apache.org/job/PreCommit-HDDS-Build/1629/artifact/out/branch-shadedclient.txt

I think it affects the unit test run; whenever the shadedclient build fails, 
none of the unit tests run and the build fails with
{code}
[ERROR] ExecutionException The forked VM terminated without properly saying 
goodbye. VM crash or System.exit called?
{code}

> Introduce Incremental Container Report
> --
>
> Key: HDDS-737
> URL: https://issues.apache.org/jira/browse/HDDS-737
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode, SCM
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Attachments: HDDS-737.000.patch, HDDS-737.001.patch
>
>
> We will use the Incremental Container Report (ICR) to immediately inform SCM 
> when the state of a container changes on a datanode. This makes sure that 
> SCM is updated as soon as a container's state changes, without having to 
> wait for a full container report.
> *When do we send ICR?*
> * When a container replica state changes from open/closing to closed
> * When a container replica state changes from open/closing to quasi closed
> * When a container replica state changes from quasi closed to closed
> * When a container replica is deleted in datanode
> * When a container replica is copied from another datanode
> * When a container replica is discovered to be corrupted
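The trigger list above boils down to: any replica state change is reported immediately, rather than waiting for the periodic full report. A compact sketch of that idea (names are illustrative, not Ozone's actual types): a state change enqueues a one-replica report that the heartbeat thread drains and sends to SCM as an ICR.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the ICR flow; not the actual Ozone implementation.
public class IcrQueue {
    public static class ContainerReplica {
        final long containerId;
        final String state;
        public ContainerReplica(long containerId, String state) {
            this.containerId = containerId;
            this.state = state;
        }
    }

    private final List<ContainerReplica> pending = new ArrayList<>();

    // Called on any replica state change (closed, quasi closed, deleted,
    // copied in, corrupted) per the list above.
    public synchronized void onReplicaStateChange(ContainerReplica replica) {
        pending.add(replica);
    }

    // Drained by the heartbeat thread; the result is sent to SCM as an ICR.
    public synchronized List<ContainerReplica> drain() {
        List<ContainerReplica> out = new ArrayList<>(pending);
        pending.clear();
        return out;
    }
}
```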






[jira] [Commented] (HDDS-737) Introduce Incremental Container Report

2018-11-06 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16676432#comment-16676432
 ] 

Nanda kumar commented on HDDS-737:
--

[~jnp], thanks for the review.
{quote}In CloseContainerCommandHandler#handle the container state should be set 
to CLOSING before making a ratis call.
{quote}
This is done as part of HDDS-801.
{quote}pipelineManager is set in ContainerReportHandler but never used.
{quote}
In both ContainerReportHandler and IncrementalContainerReportHandler, 
pipelineManager will be required when we handle state changes: we need to 
remove the container from the OPEN pipeline when the container moves to 
CLOSED state. For now, I have added a TODO in both classes; pipelineManager 
will be used once state-change handling is implemented.
{quote}Heartbeating thread can also receive interrupt when shutting down
{quote}
Good catch. Updated the comment.
{quote}NewNodeHandler does nothing. Shouldn't it send command for a container 
report?
{quote}
The NewNode event is triggered by NodeManager, which has already made an entry 
for the registered node in NodeStateManager. We get a container report as part 
of the register call, and that report is processed by ContainerReportHandler 
to update the container replica state. There is currently nothing to do when 
we receive a new-node event from NodeManager; NewNodeHandler is just a 
placeholder that we can use in the future if required.
{quote}Why is removeNode removed from NodeManager? It seems like the right 
place.
{quote}
We currently don't remove a node from NodeManager once it is registered. We 
can add removeNode logic when we implement decommissioning of a datanode (the 
existing removeNode logic was incomplete).

 

[~linyiqun]
{quote}I prefer to add additional try-catch for thread.sleep and get 
InterruptedException.
{quote}
Since we also have to handle {{InterruptedException}} when shutdown is 
initiated, I feel it is better to have a try-catch around the complete code 
inside the while loop.

> Introduce Incremental Container Report
> --
>
> Key: HDDS-737
> URL: https://issues.apache.org/jira/browse/HDDS-737
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode, SCM
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Attachments: HDDS-737.000.patch, HDDS-737.001.patch
>
>
> We will use Incremental Container Report (ICR) to immediately inform SCM when 
> there is some state change to the container in datanode. This will make sure 
> that SCM is updated as soon as the state of a container changes and doesn’t 
> have to wait for full container report.
> *When do we send ICR?*
> * When a container replica state changes from open/closing to closed
> * When a container replica state changes from open/closing to quasi closed
> * When a container replica state changes from quasi closed to closed
> * When a container replica is deleted in datanode
> * When a container replica is copied from another datanode
> * When a container replica is discovered to be corrupted






[jira] [Commented] (HDDS-797) If DN is started before SCM, it does not register

2018-11-06 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16676484#comment-16676484
 ] 

Nanda kumar commented on HDDS-797:
--

[~hanishakoneru],
 It seems 
{{org.apache.hadoop.ozone.container.common.TestEndPoint#testCheckVersionResponse}}
 is failing after this commit.
{code:java}
[ERROR] 
testCheckVersionResponse(org.apache.hadoop.ozone.container.common.TestEndPoint) 
 Time elapsed: 0.142 s  <<< FAILURE!
java.lang.AssertionError: expected: but was:
{code}
After this patch, once we are in REGISTER state we no longer allow the 
{{getVersion}} call, which causes the test case to fail.

Created HDDS-812.

> If DN is started before SCM, it does not register
> -
>
> Key: HDDS-797
> URL: https://issues.apache.org/jira/browse/HDDS-797
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Mukul Kumar Singh
>Assignee: Hanisha Koneru
>Priority: Blocker
> Fix For: 0.3.0, 0.4.0
>
> Attachments: HDDS-797.001.patch
>
>
> If a DN is started before SCM, it does not register with the SCM. DNs keep 
> trying to connect with the SCM and once SCM is up, the DN services are 
> shutdown instead of registering with SCM.






[jira] [Created] (HDDS-812) TestEndPoint#testCheckVersionResponse is failing

2018-11-06 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-812:


 Summary: TestEndPoint#testCheckVersionResponse is failing
 Key: HDDS-812
 URL: https://issues.apache.org/jira/browse/HDDS-812
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: test
Reporter: Nanda kumar


 TestEndPoint#testCheckVersionResponse is failing with the below error
{code:java}
[ERROR] 
testCheckVersionResponse(org.apache.hadoop.ozone.container.common.TestEndPoint) 
 Time elapsed: 0.142 s  <<< FAILURE!
java.lang.AssertionError: expected: but was:
{code}

Once we are in REGISTER state we no longer allow the getVersion call, which 
causes the test case to fail.






[jira] [Updated] (HDDS-737) Introduce Incremental Container Report

2018-11-06 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-737:
-
Attachment: HDDS-737.001.patch

> Introduce Incremental Container Report
> --
>
> Key: HDDS-737
> URL: https://issues.apache.org/jira/browse/HDDS-737
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode, SCM
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Attachments: HDDS-737.000.patch, HDDS-737.001.patch
>
>
> We will use the Incremental Container Report (ICR) to immediately inform SCM 
> when the state of a container changes on a datanode. This makes sure that 
> SCM is updated as soon as a container's state changes, without having to 
> wait for a full container report.
> *When do we send ICR?*
> * When a container replica state changes from open/closing to closed
> * When a container replica state changes from open/closing to quasi closed
> * When a container replica state changes from quasi closed to closed
> * When a container replica is deleted in datanode
> * When a container replica is copied from another datanode
> * When a container replica is discovered to be corrupted






[jira] [Updated] (HDDS-737) Introduce Incremental Container Report

2018-11-06 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-737:
-
Attachment: (was: HDDS-737.001.patch)

> Introduce Incremental Container Report
> --
>
> Key: HDDS-737
> URL: https://issues.apache.org/jira/browse/HDDS-737
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode, SCM
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Attachments: HDDS-737.000.patch
>
>
> We will use the Incremental Container Report (ICR) to immediately inform SCM 
> when the state of a container changes on a datanode. This makes sure that 
> SCM is updated as soon as a container's state changes, without having to 
> wait for a full container report.
> *When do we send ICR?*
> * When a container replica state changes from open/closing to closed
> * When a container replica state changes from open/closing to quasi closed
> * When a container replica state changes from quasi closed to closed
> * When a container replica is deleted in datanode
> * When a container replica is copied from another datanode
> * When a container replica is discovered to be corrupted






[jira] [Updated] (HDDS-737) Introduce Incremental Container Report

2018-11-06 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-737:
-
Attachment: HDDS-737.001.patch

> Introduce Incremental Container Report
> --
>
> Key: HDDS-737
> URL: https://issues.apache.org/jira/browse/HDDS-737
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode, SCM
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Attachments: HDDS-737.000.patch, HDDS-737.001.patch
>
>
> We will use the Incremental Container Report (ICR) to immediately inform SCM 
> when the state of a container changes on a datanode. This makes sure that 
> SCM is updated as soon as a container's state changes, without having to 
> wait for a full container report.
> *When do we send ICR?*
> * When a container replica state changes from open/closing to closed
> * When a container replica state changes from open/closing to quasi closed
> * When a container replica state changes from quasi closed to closed
> * When a container replica is deleted in datanode
> * When a container replica is copied from another datanode
> * When a container replica is discovered to be corrupted






[jira] [Commented] (HDFS-13348) Ozone: Update IP and hostname in Datanode from SCM's response to the register call

2018-11-06 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16676773#comment-16676773
 ] 

Nanda kumar commented on HDFS-13348:


[~sunilg], hdds/ozone is not released as part of Hadoop; Ozone follows a 
separate release cycle.
This jira was created before the new Hadoop sub-project HDDS existed and was 
committed to the HDFS-7240 branch. Even after HDFS-7240 was merged to trunk, 
Hadoop releases do not contain hdds/ozone.

The correct fix version for this jira should be Ozone 0.2.1. Since these jiras 
were created before the sub-project existed, we don't have a correct fix 
version for them in the HDFS project.

> Ozone: Update IP and hostname in Datanode from SCM's response to the register 
> call
> --
>
> Key: HDFS-13348
> URL: https://issues.apache.org/jira/browse/HDFS-13348
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Nanda kumar
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-13348-HDFS-7240.000.patch, 
> HDFS-13348-HDFS-7240.001.patch, HDFS-13348-HDFS-7240.002.patch
>
>
> Whenever a Datanode registers with SCM, the SCM resolves the IP address and 
> hostname of the Datanode from the RPC call. This IP address and hostname 
> should be sent back to the Datanode in the response to the register call, and 
> the Datanode has to update the values from the response in its 
> {{DatanodeDetails}}.
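The update described above can be sketched as follows. This is a minimal stand-in, not the real Ozone API: the {{DatanodeDetails}} fields and setter names here are assumptions for illustration.

```java
public class RegisterResponseHandler {

  /** Minimal stand-in for the datanode's DatanodeDetails. */
  public static class DatanodeDetails {
    private String ipAddress;
    private String hostName;
    public String getIpAddress() { return ipAddress; }
    public String getHostName() { return hostName; }
    public void setIpAddress(String ip) { this.ipAddress = ip; }
    public void setHostName(String host) { this.hostName = host; }
  }

  /** Apply the SCM-resolved IP and hostname from the register response. */
  public static void applyRegisterResponse(DatanodeDetails details,
                                           String resolvedIp,
                                           String resolvedHost) {
    details.setIpAddress(resolvedIp);
    details.setHostName(resolvedHost);
  }
}
```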






[jira] [Updated] (HDFS-13348) Ozone: Update IP and hostname in Datanode from SCM's response to the register call

2018-11-06 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDFS-13348:
---
Fix Version/s: HDFS-7240

> Ozone: Update IP and hostname in Datanode from SCM's response to the register 
> call
> --
>
> Key: HDFS-13348
> URL: https://issues.apache.org/jira/browse/HDFS-13348
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Nanda kumar
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: HDFS-7240
>
> Attachments: HDFS-13348-HDFS-7240.000.patch, 
> HDFS-13348-HDFS-7240.001.patch, HDFS-13348-HDFS-7240.002.patch
>
>
> Whenever a Datanode registers with SCM, the SCM resolves the IP address and 
> hostname of the Datanode from the RPC call. This IP address and hostname 
> should be sent back to the Datanode in the response to the register call, and 
> the Datanode has to update the values from the response in its 
> {{DatanodeDetails}}.






[jira] [Updated] (HDDS-801) Quasi close the container when close is not executed via Ratis

2018-11-13 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-801:
-
Status: Patch Available  (was: In Progress)

> Quasi close the container when close is not executed via Ratis
> --
>
> Key: HDDS-801
> URL: https://issues.apache.org/jira/browse/HDDS-801
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Affects Versions: 0.3.0
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Attachments: HDDS-801.000.patch, HDDS-801.001.patch
>
>
> When a datanode receives a CloseContainerCommand and the replication type is 
> not RATIS, we should QUASI close the container. After quasi-closing the 
> container, an ICR has to be sent to SCM.
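The decision described above can be sketched as a small helper. This is a hedged sketch under assumed names (the enum and method names are illustrative, not the actual Ozone classes): a close executed via Ratis reaches all replicas consistently and can go straight to CLOSED; anything else can only be quasi-closed.

```java
public class CloseContainerHandler {
  public enum ReplicationType { RATIS, STAND_ALONE }
  public enum ContainerState { CLOSED, QUASI_CLOSED }

  /** Target state for a CloseContainerCommand based on how close is executed. */
  public static ContainerState targetState(ReplicationType type) {
    // Close via Ratis is replicated consistently, so CLOSED is safe;
    // otherwise QUASI close the container and then send an ICR to SCM.
    return type == ReplicationType.RATIS
        ? ContainerState.CLOSED
        : ContainerState.QUASI_CLOSED;
  }
}
```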






[jira] [Updated] (HDDS-801) Quasi close the container when close is not executed via Ratis

2018-11-13 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-801:
-
Attachment: HDDS-801.001.patch

> Quasi close the container when close is not executed via Ratis
> --
>
> Key: HDDS-801
> URL: https://issues.apache.org/jira/browse/HDDS-801
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Affects Versions: 0.3.0
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Attachments: HDDS-801.000.patch, HDDS-801.001.patch
>
>
> When a datanode receives a CloseContainerCommand and the replication type is 
> not RATIS, we should QUASI close the container. After quasi-closing the 
> container, an ICR has to be sent to SCM.






[jira] [Created] (HDDS-837) Persist originNodeId as part of .container file in datanode

2018-11-14 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-837:


 Summary: Persist originNodeId as part of .container file in 
datanode
 Key: HDDS-837
 URL: https://issues.apache.org/jira/browse/HDDS-837
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: Ozone Datanode
Reporter: Nanda kumar
Assignee: Nanda kumar


To differentiate the replicas of a QUASI_CLOSED container we need the 
{{originNodeId}} field. With this field, we can uniquely identify a 
QUASI_CLOSED container replica. This will be needed when we want to CLOSE a 
QUASI_CLOSED container.

This field will be set by the node where the container is created, stored as 
part of the {{.container}} file, and sent as part of the ContainerReport to SCM.
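The identity property described above can be sketched as follows: a QUASI_CLOSED replica is uniquely identified by the pair (containerId, originNodeId). The class and field names here are assumptions for illustration, not the actual Ozone types.

```java
import java.util.Objects;

/** Sketch: a QUASI_CLOSED replica is identified by (containerId, originNodeId). */
public class ContainerReplicaId {
  private final long containerId;
  private final String originNodeId;  // node where the container was created

  public ContainerReplicaId(long containerId, String originNodeId) {
    this.containerId = containerId;
    this.originNodeId = originNodeId;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof ContainerReplicaId)) {
      return false;
    }
    ContainerReplicaId that = (ContainerReplicaId) o;
    return containerId == that.containerId
        && originNodeId.equals(that.originNodeId);
  }

  @Override
  public int hashCode() {
    return Objects.hash(containerId, originNodeId);
  }
}
```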






[jira] [Updated] (HDDS-801) Quasi close the container when close is not executed via Ratis

2018-11-14 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-801:
-
Attachment: HDDS-801.002.patch

> Quasi close the container when close is not executed via Ratis
> --
>
> Key: HDDS-801
> URL: https://issues.apache.org/jira/browse/HDDS-801
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Affects Versions: 0.3.0
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Attachments: HDDS-801.000.patch, HDDS-801.001.patch, 
> HDDS-801.002.patch
>
>
> When a datanode receives a CloseContainerCommand and the replication type is 
> not RATIS, we should QUASI close the container. After quasi-closing the 
> container, an ICR has to be sent to SCM.






[jira] [Commented] (HDDS-801) Quasi close the container when close is not executed via Ratis

2018-11-14 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686242#comment-16686242
 ] 

Nanda kumar commented on HDDS-801:
--

/cc [~jnp] [~arpitagarwal] [~msingh] [~hanishakoneru]

> Quasi close the container when close is not executed via Ratis
> --
>
> Key: HDDS-801
> URL: https://issues.apache.org/jira/browse/HDDS-801
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Affects Versions: 0.3.0
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Attachments: HDDS-801.000.patch, HDDS-801.001.patch, 
> HDDS-801.002.patch
>
>
> When a datanode receives a CloseContainerCommand and the replication type is 
> not RATIS, we should QUASI close the container. After quasi-closing the 
> container, an ICR has to be sent to SCM.






[jira] [Commented] (HDDS-798) Storage-class is showing incorrectly

2018-11-08 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679672#comment-16679672
 ] 

Nanda kumar commented on HDDS-798:
--

Created HDDS-823 for {{RestClient#getKeyDetails}} failure.

> Storage-class is showing incorrectly
> 
>
> Key: HDDS-798
> URL: https://issues.apache.org/jira/browse/HDDS-798
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Fix For: 0.3.0, 0.4.0
>
> Attachments: HDDS-798.00.patch
>
>
> After HDDS-712, we support storage-class.
> For list-objects, even if a key's storage-class is set to REDUCED_REDUNDANCY, 
> it still shows STANDARD, because the list-objects response code hardcodes it 
> as below:
> keyMetadata.setStorageClass("STANDARD");
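A minimal sketch of the fix direction, assuming a hypothetical helper (not the actual patch): echo the key's stored storage-class in the list-objects response instead of the hardcoded "STANDARD", falling back only when the key carries none.

```java
public class ListObjectStorageClass {
  /** Use the key's stored storage-class; fall back to STANDARD only if unset. */
  public static String resolveStorageClass(String storedStorageClass) {
    return storedStorageClass == null ? "STANDARD" : storedStorageClass;
  }
}
```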






[jira] [Commented] (HDDS-733) Create container if not exist, as part of chunk write

2018-11-09 Thread Nanda kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16681705#comment-16681705
 ] 

Nanda kumar commented on HDDS-733:
--

+1, LGTM. Will fix the checkstyle issues while committing.

> Create container if not exist, as part of chunk write
> -
>
> Key: HDDS-733
> URL: https://issues.apache.org/jira/browse/HDDS-733
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Reporter: Nanda kumar
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: HDDS-733.001.patch, HDDS-733.002.patch, 
> HDDS-733.003.patch, HDDS-733.004.patch
>
>
> The current implementation requires a container to be created on the datanode 
> before starting the chunk write. This can be optimized by creating the 
> container on the first chunk write.
>  During a chunk write, if the container is missing, we can go ahead and 
> create it.
> Along with this change, the ALLOCATED and CREATING container states can be 
> removed, as they were only used to track which containers had been 
> successfully created. There is also a shouldCreateContainer flag, used by the 
> client to know whether it needs to create the container; this flag can be 
> removed.
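The create-on-first-write pattern described above can be sketched as follows. This is an illustrative in-memory stand-in (the class name and the use of a map for container storage are assumptions), not the datanode's real container layer.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ChunkWriteHandler {
  // containerId -> accumulated chunk data (stand-in for real container storage)
  private final Map<Long, StringBuilder> containers = new ConcurrentHashMap<>();

  /** Writes a chunk, creating the container on the first write to it. */
  public void writeChunk(long containerId, String chunk) {
    containers.computeIfAbsent(containerId, id -> new StringBuilder())
              .append(chunk);
  }

  public boolean containerExists(long containerId) {
    return containers.containsKey(containerId);
  }
}
```

With this shape, the client never needs a shouldCreateContainer flag: the first write implicitly creates the container.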





