[jira] [Updated] (HDFS-12129) Ozone: SCM http server is not stopped with SCM#stop()

2017-07-12 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HDFS-12129:
---
Affects Version/s: HDFS-7240

> Ozone: SCM http server is not stopped with SCM#stop()
> -
>
> Key: HDFS-12129
> URL: https://issues.apache.org/jira/browse/HDFS-12129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone, scm
>Affects Versions: HDFS-7240
>Reporter: Weiwei Yang
>
> Found this issue while trying to restart SCM; it failed with an "address 
> already in use" error. This is because the http server is not stopped in the stop() method.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12129) Ozone: SCM http server is not stopped with SCM#stop()

2017-07-12 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HDFS-12129:
---
Summary: Ozone: SCM http server is not stopped with SCM#stop()  (was: Ozone)

> Ozone: SCM http server is not stopped with SCM#stop()
> -
>
> Key: HDFS-12129
> URL: https://issues.apache.org/jira/browse/HDFS-12129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Weiwei Yang
>







[jira] [Created] (HDFS-12129) Ozone

2017-07-12 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12129:
--

 Summary: Ozone
 Key: HDFS-12129
 URL: https://issues.apache.org/jira/browse/HDFS-12129
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Weiwei Yang









[jira] [Commented] (HDFS-12098) Ozone: Datanode is unable to register with scm if scm starts later

2017-07-13 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16085463#comment-16085463
 ] 

Weiwei Yang commented on HDFS-12098:


Ah, found the difference after hours of debugging ... it's not that easy to 
reproduce this from the mini cluster. Let me explain: the behavior differs 
between the mini cluster and a real cluster setup.

*Mini Cluster*
In class {{MiniOzoneCluster}}, we are initiating SCM like

{code}
StorageContainerManager scm = new StorageContainerManager(conf);
if (!disableSCM) {
  // start SCM if it is not disabled.
  scm.start();
}
{code}

The SCM constructor will initialize the SCM datanode and client RPC servers. 
During initialization, {{RPC.Builder(conf)...build()}} binds the RPC server to 
a specific port. Once the port is bound, subsequent client RPC calls, e.g.

{code}
 SCMVersionResponseProto versionResponse =
  rpcEndPoint.getEndPoint().getVersion(null);
{code}

will try to connect to that port and read data; however, the service is not 
responding, so the call gets a {{SocketTimeout}}.

*Real Cluster*

However, in a real cluster environment the SCM constructor will not be called, 
so the port will not be bound. When the RPC client tries to connect to that 
port, it gets a {{connection refused}} error. This error is caught and triggers 
the RetryPolicy; that's where I saw 10 retries, which causes this problem 
(thread leak).
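For contrast, the real-cluster failure mode can be sketched the same way with plain JDK sockets (again, an illustrative sketch, not the actual Hadoop RPC client): connecting to a port that nobody is bound to fails immediately with connection refused instead of a read timeout.

```java
import java.net.ConnectException;
import java.net.ServerSocket;
import java.net.Socket;

class RefusedConnect {
    // Returns true when connecting to an unbound port is refused outright,
    // the error that trips the client's retry policy in a real cluster.
    static boolean isRefused() {
        try {
            int port;
            try (ServerSocket probe = new ServerSocket(0)) {
                port = probe.getLocalPort();
            }                                 // closed: nothing listens here any more
            try (Socket client = new Socket("127.0.0.1", port)) {
                return false;                 // unexpectedly connected
            } catch (ConnectException expected) {
                return true;                  // "connection refused"
            }
        } catch (Exception unexpected) {
            return false;
        }
    }
}
```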

I am not sure it is worth fixing this problem in the mini cluster; that would 
probably require refactoring the SCM constructor to move the RPC init code out. 
This issue can be easily reproduced in a cluster setup by following the steps 
in the description.

Please kindly advise. Thanks.

> Ozone: Datanode is unable to register with scm if scm starts later
> --
>
> Key: HDFS-12098
> URL: https://issues.apache.org/jira/browse/HDFS-12098
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, ozone, scm
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Critical
> Attachments: disabled-scm-test.patch, HDFS-12098-HDFS-7240.001.patch, 
> HDFS-12098-HDFS-7240.002.patch, Screen Shot 2017-07-11 at 4.58.08 PM.png, 
> thread_dump.log
>
>
> Reproducing steps
> # Start datanode
> # Wait and check the datanode state; it has connection issues, which is expected
> # Start SCM, expecting the datanode to connect to the SCM and the state 
> machine to transition to RUNNING. However, in actuality its state transitions 
> to SHUTDOWN and the datanode enters chill mode.






[jira] [Commented] (HDFS-12098) Ozone: Datanode is unable to register with scm if scm starts later

2017-07-14 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088382#comment-16088382
 ] 

Weiwei Yang commented on HDFS-12098:


Hi [~anu]

I just uploaded a test case patch to reproduce this problem from a UT. I revised 
some code around how SCM is started in MiniOzoneCluster, ensuring that the SCM 
constructor is only called when SCM is started. In this case, I could reproduce 
the same issue I was seeing on a real setup. Please take a look, and if you 
agree with the problem I described, we can then look at the fix.

Thank you. 

> Ozone: Datanode is unable to register with scm if scm starts later
> --
>
> Key: HDFS-12098
> URL: https://issues.apache.org/jira/browse/HDFS-12098
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, ozone, scm
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Critical
> Attachments: disabled-scm-test.patch, HDFS-12098-HDFS-7240.001.patch, 
> HDFS-12098-HDFS-7240.002.patch, HDFS-12098-HDFS-7240.testcase.patch, Screen 
> Shot 2017-07-11 at 4.58.08 PM.png, thread_dump.log
>
>
> Reproducing steps
> 1. Start namenode
> {{./bin/hdfs --daemon start namenode}}
> 2. Start datanode
> {{./bin/hdfs datanode}}
> will see following connection issues
> {noformat}
> 17/07/13 21:16:48 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 0 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
> SECONDS)
> 17/07/13 21:16:49 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 1 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
> SECONDS)
> 17/07/13 21:16:50 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 2 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
> SECONDS)
> 17/07/13 21:16:51 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 3 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
> SECONDS)
> {noformat}
> this is expected because scm is not started yet
> 3. Start scm
> {{./bin/hdfs scm}}
> expecting the datanode to register with this SCM; expecting this log in SCM
> {noformat}
> 17/07/13 21:22:30 INFO node.SCMNodeManager: Data node with ID: 
> af22862d-aafa-4941-9073-53224ae43e2c Registered.
> {noformat}
> but did *NOT* see this log. (_I debugged into the code and found the datanode 
> state transitioned to SHUTDOWN unexpectedly because of the thread leak; each 
> of those threads counted toward setting the next state, and they all set it 
> to the SHUTDOWN state_)
> 4. Create a container from scm CLI
> {{./bin/hdfs scm -container -create -c 20170714c0}}
> this fails with following exception
> {noformat}
> Creating container : 20170714c0.
> Error executing 
> command:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ozone.scm.exceptions.SCMException):
>  Unable to create container while in chill mode
>   at 
> org.apache.hadoop.ozone.scm.container.ContainerMapping.allocateContainer(ContainerMapping.java:241)
>   at 
> org.apache.hadoop.ozone.scm.StorageContainerManager.allocateContainer(StorageContainerManager.java:392)
>   at 
> org.apache.hadoop.ozone.protocolPB.StorageContainerLocationProtocolServerSideTranslatorPB.allocateContainer(StorageContainerLocationProtocolServerSideTranslatorPB.java:73)
> {noformat}
> the datanode was not registered with SCM, thus it is still in chill mode.
> *Note*: if we start SCM first, there is no such issue; I can create a 
> container from the CLI without any problem.






[jira] [Commented] (HDFS-12098) Ozone: Datanode is unable to register with scm if scm starts later

2017-07-14 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088411#comment-16088411
 ] 

Weiwei Yang commented on HDFS-12098:


Please hold off on looking at the test patch; it still has some problems. 
Working on a new one :P

> Ozone: Datanode is unable to register with scm if scm starts later
> --
>
> Key: HDFS-12098
> URL: https://issues.apache.org/jira/browse/HDFS-12098
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, ozone, scm
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Critical
> Attachments: disabled-scm-test.patch, HDFS-12098-HDFS-7240.001.patch, 
> HDFS-12098-HDFS-7240.002.patch, HDFS-12098-HDFS-7240.testcase.patch, Screen 
> Shot 2017-07-11 at 4.58.08 PM.png, thread_dump.log
>
>
> Reproducing steps
> 1. Start namenode
> {{./bin/hdfs --daemon start namenode}}
> 2. Start datanode
> {{./bin/hdfs datanode}}
> will see following connection issues
> {noformat}
> 17/07/13 21:16:48 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 0 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
> SECONDS)
> 17/07/13 21:16:49 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 1 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
> SECONDS)
> 17/07/13 21:16:50 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 2 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
> SECONDS)
> 17/07/13 21:16:51 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 3 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
> SECONDS)
> {noformat}
> this is expected because scm is not started yet
> 3. Start scm
> {{./bin/hdfs scm}}
> expecting the datanode to register with this SCM; expecting this log in SCM
> {noformat}
> 17/07/13 21:22:30 INFO node.SCMNodeManager: Data node with ID: 
> af22862d-aafa-4941-9073-53224ae43e2c Registered.
> {noformat}
> but did *NOT* see this log. (_I debugged into the code and found the datanode 
> state transitioned to SHUTDOWN unexpectedly because of the thread leak; each 
> of those threads counted toward setting the next state, and they all set it 
> to the SHUTDOWN state_)
> 4. Create a container from scm CLI
> {{./bin/hdfs scm -container -create -c 20170714c0}}
> this fails with following exception
> {noformat}
> Creating container : 20170714c0.
> Error executing 
> command:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ozone.scm.exceptions.SCMException):
>  Unable to create container while in chill mode
>   at 
> org.apache.hadoop.ozone.scm.container.ContainerMapping.allocateContainer(ContainerMapping.java:241)
>   at 
> org.apache.hadoop.ozone.scm.StorageContainerManager.allocateContainer(StorageContainerManager.java:392)
>   at 
> org.apache.hadoop.ozone.protocolPB.StorageContainerLocationProtocolServerSideTranslatorPB.allocateContainer(StorageContainerLocationProtocolServerSideTranslatorPB.java:73)
> {noformat}
> the datanode was not registered with SCM, thus it is still in chill mode.
> *Note*: if we start SCM first, there is no such issue; I can create a 
> container from the CLI without any problem.






[jira] [Updated] (HDFS-12098) Ozone: Datanode is unable to register with scm if scm starts later

2017-07-14 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HDFS-12098:
---
Status: In Progress  (was: Patch Available)

> Ozone: Datanode is unable to register with scm if scm starts later
> --
>
> Key: HDFS-12098
> URL: https://issues.apache.org/jira/browse/HDFS-12098
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, ozone, scm
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Critical
> Attachments: disabled-scm-test.patch, HDFS-12098-HDFS-7240.001.patch, 
> HDFS-12098-HDFS-7240.002.patch, HDFS-12098-HDFS-7240.testcase.patch, Screen 
> Shot 2017-07-11 at 4.58.08 PM.png, thread_dump.log
>
>
> Reproducing steps
> 1. Start namenode
> {{./bin/hdfs --daemon start namenode}}
> 2. Start datanode
> {{./bin/hdfs datanode}}
> will see following connection issues
> {noformat}
> 17/07/13 21:16:48 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 0 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
> SECONDS)
> 17/07/13 21:16:49 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 1 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
> SECONDS)
> 17/07/13 21:16:50 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 2 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
> SECONDS)
> 17/07/13 21:16:51 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 3 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
> SECONDS)
> {noformat}
> this is expected because scm is not started yet
> 3. Start scm
> {{./bin/hdfs scm}}
> expecting the datanode to register with this SCM; expecting this log in SCM
> {noformat}
> 17/07/13 21:22:30 INFO node.SCMNodeManager: Data node with ID: 
> af22862d-aafa-4941-9073-53224ae43e2c Registered.
> {noformat}
> but did *NOT* see this log. (_I debugged into the code and found the datanode 
> state transitioned to SHUTDOWN unexpectedly because of the thread leak; each 
> of those threads counted toward setting the next state, and they all set it 
> to the SHUTDOWN state_)
> 4. Create a container from scm CLI
> {{./bin/hdfs scm -container -create -c 20170714c0}}
> this fails with following exception
> {noformat}
> Creating container : 20170714c0.
> Error executing 
> command:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ozone.scm.exceptions.SCMException):
>  Unable to create container while in chill mode
>   at 
> org.apache.hadoop.ozone.scm.container.ContainerMapping.allocateContainer(ContainerMapping.java:241)
>   at 
> org.apache.hadoop.ozone.scm.StorageContainerManager.allocateContainer(StorageContainerManager.java:392)
>   at 
> org.apache.hadoop.ozone.protocolPB.StorageContainerLocationProtocolServerSideTranslatorPB.allocateContainer(StorageContainerLocationProtocolServerSideTranslatorPB.java:73)
> {noformat}
> the datanode was not registered with SCM, thus it is still in chill mode.
> *Note*: if we start SCM first, there is no such issue; I can create a 
> container from the CLI without any problem.






[jira] [Updated] (HDFS-12098) Ozone: Datanode is unable to register with scm if scm starts later

2017-07-14 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HDFS-12098:
---
Attachment: HDFS-12098-HDFS-7240.testcase.patch

> Ozone: Datanode is unable to register with scm if scm starts later
> --
>
> Key: HDFS-12098
> URL: https://issues.apache.org/jira/browse/HDFS-12098
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, ozone, scm
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Critical
> Attachments: disabled-scm-test.patch, HDFS-12098-HDFS-7240.001.patch, 
> HDFS-12098-HDFS-7240.002.patch, HDFS-12098-HDFS-7240.testcase.patch, Screen 
> Shot 2017-07-11 at 4.58.08 PM.png, thread_dump.log
>
>
> Reproducing steps
> 1. Start namenode
> {{./bin/hdfs --daemon start namenode}}
> 2. Start datanode
> {{./bin/hdfs datanode}}
> will see following connection issues
> {noformat}
> 17/07/13 21:16:48 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 0 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
> SECONDS)
> 17/07/13 21:16:49 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 1 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
> SECONDS)
> 17/07/13 21:16:50 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 2 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
> SECONDS)
> 17/07/13 21:16:51 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 3 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
> SECONDS)
> {noformat}
> this is expected because scm is not started yet
> 3. Start scm
> {{./bin/hdfs scm}}
> expecting the datanode to register with this SCM; expecting this log in SCM
> {noformat}
> 17/07/13 21:22:30 INFO node.SCMNodeManager: Data node with ID: 
> af22862d-aafa-4941-9073-53224ae43e2c Registered.
> {noformat}
> but did *NOT* see this log. (_I debugged into the code and found the datanode 
> state transitioned to SHUTDOWN unexpectedly because of the thread leak; each 
> of those threads counted toward setting the next state, and they all set it 
> to the SHUTDOWN state_)
> 4. Create a container from scm CLI
> {{./bin/hdfs scm -container -create -c 20170714c0}}
> this fails with following exception
> {noformat}
> Creating container : 20170714c0.
> Error executing 
> command:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ozone.scm.exceptions.SCMException):
>  Unable to create container while in chill mode
>   at 
> org.apache.hadoop.ozone.scm.container.ContainerMapping.allocateContainer(ContainerMapping.java:241)
>   at 
> org.apache.hadoop.ozone.scm.StorageContainerManager.allocateContainer(StorageContainerManager.java:392)
>   at 
> org.apache.hadoop.ozone.protocolPB.StorageContainerLocationProtocolServerSideTranslatorPB.allocateContainer(StorageContainerLocationProtocolServerSideTranslatorPB.java:73)
> {noformat}
> the datanode was not registered with SCM, thus it is still in chill mode.
> *Note*: if we start SCM first, there is no such issue; I can create a 
> container from the CLI without any problem.






[jira] [Commented] (HDFS-12069) Ozone: Create a general abstraction for metadata store

2017-07-15 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088615#comment-16088615
 ] 

Weiwei Yang commented on HDFS-12069:


Hello [~anu]

Thanks for your +1 :). Since it's been a while since the v11 patch was 
uploaded, I just rebased to the latest and want to make sure Jenkins is still 
happy. Will commit if everything is fine. Thanks [~anu] for helping to review 
this. Appreciate it!

> Ozone: Create a general abstraction for metadata store
> --
>
> Key: HDFS-12069
> URL: https://issues.apache.org/jira/browse/HDFS-12069
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
> Attachments: HDFS-12069-HDFS-7240.001.patch, 
> HDFS-12069-HDFS-7240.002.patch, HDFS-12069-HDFS-7240.003.patch, 
> HDFS-12069-HDFS-7240.004.patch, HDFS-12069-HDFS-7240.005.patch, 
> HDFS-12069-HDFS-7240.006.patch, HDFS-12069-HDFS-7240.007.patch, 
> HDFS-12069-HDFS-7240.008.patch, HDFS-12069-HDFS-7240.009.patch, 
> HDFS-12069-HDFS-7240.010.patch, HDFS-12069-HDFS-7240.011.patch, 
> HDFS-12069-HDFS-7240.012.patch
>
>
> Create a general abstraction for the metadata store so that we can plug in 
> other key-value stores to host ozone metadata. Currently only LevelDB is 
> implemented; we want to support RocksDB as it provides more production-ready 
> features.
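A minimal sketch of what such a pluggable abstraction could look like; the interface and class names below are invented for illustration and are not the actual HDFS-12069 API.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical key-value abstraction; real implementations would wrap
// LevelDB or RocksDB behind the same methods.
interface MetadataStore extends Closeable {
    void put(byte[] key, byte[] value) throws IOException;
    byte[] get(byte[] key) throws IOException;   // null when the key is absent
    void delete(byte[] key) throws IOException;
}

// An in-memory stand-in used here only to exercise the interface.
class InMemoryStore implements MetadataStore {
    private final Map<String, byte[]> map = new HashMap<>();
    @Override public void put(byte[] k, byte[] v) { map.put(new String(k), v); }
    @Override public byte[] get(byte[] k) { return map.get(new String(k)); }
    @Override public void delete(byte[] k) { map.remove(new String(k)); }
    @Override public void close() { map.clear(); }
}
```

Callers program against the interface, so swapping LevelDB for RocksDB becomes a configuration choice rather than a code change.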






[jira] [Updated] (HDFS-12069) Ozone: Create a general abstraction for metadata store

2017-07-15 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HDFS-12069:
---
Attachment: HDFS-12069-HDFS-7240.012.patch

> Ozone: Create a general abstraction for metadata store
> --
>
> Key: HDFS-12069
> URL: https://issues.apache.org/jira/browse/HDFS-12069
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
> Attachments: HDFS-12069-HDFS-7240.001.patch, 
> HDFS-12069-HDFS-7240.002.patch, HDFS-12069-HDFS-7240.003.patch, 
> HDFS-12069-HDFS-7240.004.patch, HDFS-12069-HDFS-7240.005.patch, 
> HDFS-12069-HDFS-7240.006.patch, HDFS-12069-HDFS-7240.007.patch, 
> HDFS-12069-HDFS-7240.008.patch, HDFS-12069-HDFS-7240.009.patch, 
> HDFS-12069-HDFS-7240.010.patch, HDFS-12069-HDFS-7240.011.patch, 
> HDFS-12069-HDFS-7240.012.patch
>
>
> Create a general abstraction for the metadata store so that we can plug in 
> other key-value stores to host ozone metadata. Currently only LevelDB is 
> implemented; we want to support RocksDB as it provides more production-ready 
> features.






[jira] [Commented] (HDFS-12147) Ozone: KSM: Add checkBucketAccess

2017-07-18 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16091335#comment-16091335
 ] 

Weiwei Yang commented on HDFS-12147:


Hi [~nandakumar131]

Thank you. But even if we want to expose them to clients, the API arguments 
still look odd to me. How would a client compose an OzoneAcl in the request 
when it wants to check a certain access? Semantically we often check against a 
{{User Identity}} and an {{operation}} (e.g. read/write/delete). With this 
patch, does it work as follows?

Suppose a bucket has the following ACL

{noformat}
user:bilbo:rw
user:john:r
user:mike:w
{noformat}

and a client passes an OzoneAcl like the following

{{user:mike:w}}

does this mean I want to check whether user mike has write permission on the 
bucket? In this case it would have access.

What if the bucket ACL is like the following

{noformat}
user:bilbo:rw
user:john:r
group:hadoop:w
{noformat}

and mike belongs to the hadoop group: when I verify {{user:mike:w}}, will it 
give me an access control exception?
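To make the two possible semantics in this question concrete, here is a hypothetical sketch (the Acl class and method names are invented for illustration, not actual KSM code) of literal ACL-entry matching versus an identity-aware check that also honors group entries.

```java
import java.util.List;
import java.util.Set;

class AclMatch {
    static final class Acl {
        final String type, name, rights;   // e.g. "user", "mike", "w"
        Acl(String type, String name, String rights) {
            this.type = type; this.name = name; this.rights = rights;
        }
    }

    // Literal comparison: succeeds only if the bucket ACL contains an entry
    // for exactly the principal named in the request.
    static boolean exactMatch(List<Acl> bucketAcls, Acl requested) {
        return bucketAcls.stream().anyMatch(a ->
            a.type.equals(requested.type)
                && a.name.equals(requested.name)
                && a.rights.contains(requested.rights));
    }

    // Identity-aware check: also honors group entries the user belongs to.
    static boolean identityMatch(List<Acl> bucketAcls, String user,
                                 Set<String> groups, String op) {
        return bucketAcls.stream().anyMatch(a ->
            a.rights.contains(op)
                && (("user".equals(a.type) && a.name.equals(user))
                    || ("group".equals(a.type) && groups.contains(a.name))));
    }
}
```

Under literal matching, checking {{user:mike:w}} against a bucket that only grants {{group:hadoop:w}} fails even when mike is in the hadoop group; the identity-aware variant passes.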

> Ozone: KSM: Add checkBucketAccess
> -
>
> Key: HDFS-12147
> URL: https://issues.apache.org/jira/browse/HDFS-12147
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Nandakumar
>Assignee: Nandakumar
> Attachments: HDFS-12147-HDFS-7240.000.patch, 
> HDFS-12147-HDFS-7240.001.patch
>
>
> Checks if the caller has access to a given bucket.






[jira] [Commented] (HDFS-12154) Incorrect javadoc description in StorageLocationChecker#check

2017-07-18 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16091336#comment-16091336
 ] 

Weiwei Yang commented on HDFS-12154:


Looks good to me, +1, committing now.

> Incorrect javadoc description in StorageLocationChecker#check
> -
>
> Key: HDFS-12154
> URL: https://issues.apache.org/jira/browse/HDFS-12154
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Nandakumar
>Assignee: Nandakumar
>Priority: Trivial
> Attachments: HDFS-12154.000.patch
>
>
> {{StorageLocationChecker#check}} returns list of healthy volumes, but javadoc 
> states that it returns failed volumes.
> {code}
> /**
>* Initiate a check of the supplied storage volumes and return
>* a list of failed volumes.
>*
>* StorageLocations are returned in the same order as the input
>* for compatibility with existing unit tests.
>*
>* @param conf HDFS configuration.
>* @param dataDirs list of volumes to check.
>* @return returns a list of failed volumes. Returns the empty list if
>* there are no failed volumes.
>*
>* @throws InterruptedException if the check was interrupted.
>* @throws IOException if the number of failed volumes exceeds the
>* maximum allowed or if there are no good
>* volumes.
>*/
> {code}
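Since the method actually returns the healthy volumes, the corrected javadoc might read roughly as follows; the stub body is illustrative only, not the actual StorageLocationChecker implementation.

```java
import java.util.ArrayList;
import java.util.List;

class StorageLocationCheckerDoc {
    /**
     * Initiate a check of the supplied storage volumes and return
     * a list of healthy volumes.
     *
     * StorageLocations are returned in the same order as the input
     * for compatibility with existing unit tests.
     *
     * @param dataDirs list of volumes to check.
     * @return a list of healthy volumes. Throws if the number of failed
     *         volumes exceeds the maximum allowed or if there are no
     *         good volumes.
     */
    static List<String> check(List<String> dataDirs) {
        return new ArrayList<>(dataDirs);   // stub: assume every volume is healthy
    }
}
```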






[jira] [Commented] (HDFS-12149) Ozone: RocksDB implementation of ozone metadata store

2017-07-17 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090940#comment-16090940
 ] 

Weiwei Yang commented on HDFS-12149:


Ah, I posted my last comment too quickly, before I saw [~aw]'s comment. 

bq. basically saying dont commit new code until we have something we can use.

Does that mean that once a new RocksDB release includes the license update, we 
can commit this?

> Ozone: RocksDB implementation of ozone metadata store
> -
>
> Key: HDFS-12149
> URL: https://issues.apache.org/jira/browse/HDFS-12149
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
> Attachments: HDFS-12149-HDFS-7240.001.patch
>
>
> HDFS-12069 added a general interface for the ozone metadata store; we already 
> have a LevelDB implementation. This JIRA is to track the work on the RocksDB 
> implementation.






[jira] [Comment Edited] (HDFS-12147) Ozone: KSM: Add checkBucketAccess

2017-07-18 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16091335#comment-16091335
 ] 

Weiwei Yang edited comment on HDFS-12147 at 7/18/17 9:26 AM:
-

Hi [~nandakumar131]

Thank you. But even if we want to expose them to clients, the API arguments 
still look odd to me. How would a client compose an OzoneAcl in the request 
when it wants to check a certain access? Semantically we often check against a 
{{User Identity}} and an {{operation}} (e.g. read/write/delete). With this 
patch, does it work as follows?

Suppose a bucket has the following ACL

{noformat}
user:bilbo:rw
user:john:r
user:mike:w
{noformat}

and a client passes an OzoneAcl like the following

{{user:mike:w}}

does this mean I want to check whether user mike has write permission on the 
bucket? In this case it would have access.

What if the bucket ACL is like the following

{noformat}
user:bilbo:rw
user:john:r
group:hadoop:w
{noformat}

and mike belongs to the hadoop group: when I verify {{user:mike:w}}, will it 
give me an access control exception?

Forgive me, I just want to understand how this works.

Thanks a lot.


was (Author: cheersyang):
Hi [~nandakumar131]

Thank you. But even we want to expose them to clients, the API arguments still 
look odd to me. How would a client to compose an OzoneAcl in the request when 
it wants to check a certain access? Semantically we often check against an 
{{User Identity}} and an {{operation}} (e.g read/write/delete). Use this patch, 
does it work like following?

Suppose a bucket has following ACL

{noformat}
user:bilbo:rw
user:john:r
user:mike:w
{noformat}

and a client pass an OzoneAcl like following

{{user:mike:w}}

this means I want to check if user mike has the write permission to the bucket? 
And this case it has the access.

What if the bucket ACL is like following

{noformat}
user:bilbo:rw
user:john:r
group:hadoop:w
{noformat}

and mike belongs to hadoop group, when I verify {{user:mike:w}}, will it give 
me an access control exception?

> Ozone: KSM: Add checkBucketAccess
> -
>
> Key: HDFS-12147
> URL: https://issues.apache.org/jira/browse/HDFS-12147
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Nandakumar
>Assignee: Nandakumar
> Attachments: HDFS-12147-HDFS-7240.000.patch, 
> HDFS-12147-HDFS-7240.001.patch
>
>
> Checks if the caller has access to a given bucket.






[jira] [Updated] (HDFS-12154) Incorrect javadoc description in StorageLocationChecker#check

2017-07-18 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HDFS-12154:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: HDFS-7240
   Status: Resolved  (was: Patch Available)

> Incorrect javadoc description in StorageLocationChecker#check
> -
>
> Key: HDFS-12154
> URL: https://issues.apache.org/jira/browse/HDFS-12154
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Nandakumar
>Assignee: Nandakumar
>Priority: Trivial
> Fix For: HDFS-7240
>
> Attachments: HDFS-12154.000.patch
>
>
> {{StorageLocationChecker#check}} returns list of healthy volumes, but javadoc 
> states that it returns failed volumes.
> {code}
> /**
>* Initiate a check of the supplied storage volumes and return
>* a list of failed volumes.
>*
>* StorageLocations are returned in the same order as the input
>* for compatibility with existing unit tests.
>*
>* @param conf HDFS configuration.
>* @param dataDirs list of volumes to check.
>* @return returns a list of failed volumes. Returns the empty list if
>* there are no failed volumes.
>*
>* @throws InterruptedException if the check was interrupted.
>* @throws IOException if the number of failed volumes exceeds the
>* maximum allowed or if there are no good
>* volumes.
>*/
> {code}






[jira] [Comment Edited] (HDFS-12147) Ozone: KSM: Add checkBucketAccess

2017-07-17 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16091091#comment-16091091
 ] 

Weiwei Yang edited comment on HDFS-12147 at 7/18/17 4:34 AM:
-

Hi [~nandakumar131], [~vagarychen]

I am a bit confused by this patch.

1. Why is checkBucketAccess exposed as an RPC call in KSM? Is it something 
that should be done internally in KSM while reading/writing/deleting keys in a 
bucket? I am not sure why it needs to be exposed via 
{{KeySpaceManagerProtocol}}.

2. {{OzoneMetadataManager#checkBucketAccess}} loads the ACLs of a bucket from 
the KSM db and compares them to the value passed in via the {{OzoneAcl}} 
argument. Why are we comparing OzoneAcl objects? I thought OzoneAcl was used to 
verify whether a given user/group has a particular permission, e.g. we could 
have an OzoneAcl like the following

{{user:bilbo:rw}}

which means user {{bilbo}} has both read and write permission on the bucket, so 
it is natural to check against the user and group names. I don't understand 
the check in lines 843 - 853, can you elaborate please?

Thank you.
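For reference, the ACL string format quoted above can be read as in the following standalone sketch. The split-based parsing and the field names here are illustrative assumptions, not the real OzoneAcl API:

```java
public class OzoneAclSketch {
    public static void main(String[] args) {
        // "user:bilbo:rw" = ACL type, principal name, granted rights.
        // Illustrative parse only; the real OzoneAcl class may parse
        // and validate the string differently.
        String acl = "user:bilbo:rw";
        String[] parts = acl.split(":");
        String type = parts[0];    // "user" or "group"
        String name = parts[1];    // principal the ACL applies to
        String rights = parts[2];  // combination of "r" and "w"

        boolean canRead = rights.contains("r");
        boolean canWrite = rights.contains("w");
        System.out.println(type + " " + name + " read=" + canRead + " write=" + canWrite);
        // prints: user bilbo read=true write=true
    }
}
```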


was (Author: cheersyang):
Hi [~nandakumar131], [~vagarychen]

I am a bit confused by this patch.

1. Why is checkBucketAccess exposed as an RPC call in KSM? Is it something 
that should be done internally in KSM while reading/writing/deleting keys in a 
bucket? I am not sure why it needs to be exposed via 
{{KeySpaceManagerProtocol}}.

2. {{OzoneMetadataManager#checkBucketAccess}} loads the ACLs of a bucket from 
the KSM db and compares them to the value passed in via the {{OzoneAcl}} 
argument. Why are we comparing OzoneAcl objects? I thought OzoneAcl was used to 
verify whether a given user/group has a particular permission, e.g. we could 
have an OzoneAcl like the following

  user:bilbo:rw

which means user {{bilbo}} has both read and write permission on the bucket, so 
it is natural to check against the user and group names. I don't understand 
the check in lines 843 - 853, can you elaborate please?

Thank you.

> Ozone: KSM: Add checkBucketAccess
> -
>
> Key: HDFS-12147
> URL: https://issues.apache.org/jira/browse/HDFS-12147
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Nandakumar
>Assignee: Nandakumar
> Attachments: HDFS-12147-HDFS-7240.000.patch, 
> HDFS-12147-HDFS-7240.001.patch
>
>
> Checks if the caller has access to a given bucket.






[jira] [Commented] (HDFS-12147) Ozone: KSM: Add checkBucketAccess

2017-07-17 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16091091#comment-16091091
 ] 

Weiwei Yang commented on HDFS-12147:


Hi [~nandakumar131], [~vagarychen]

I am a bit confused by this patch.

1. Why is checkBucketAccess exposed as an RPC call in KSM? Is it something 
that should be done internally in KSM while reading/writing/deleting keys in a 
bucket? I am not sure why it needs to be exposed via 
{{KeySpaceManagerProtocol}}.

2. {{OzoneMetadataManager#checkBucketAccess}} loads the ACLs of a bucket from 
the KSM db and compares them to the value passed in via the {{OzoneAcl}} 
argument. Why are we comparing OzoneAcl objects? I thought OzoneAcl was used to 
verify whether a given user/group has a particular permission, e.g. we could 
have an OzoneAcl like the following

  user:bilbo:rw

which means user {{bilbo}} has both read and write permission on the bucket, so 
it is natural to check against the user and group names. I don't understand 
the check in lines 843 - 853, can you elaborate please?

Thank you.

> Ozone: KSM: Add checkBucketAccess
> -
>
> Key: HDFS-12147
> URL: https://issues.apache.org/jira/browse/HDFS-12147
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Nandakumar
>Assignee: Nandakumar
> Attachments: HDFS-12147-HDFS-7240.000.patch, 
> HDFS-12147-HDFS-7240.001.patch
>
>
> Checks if the caller has access to a given bucket.






[jira] [Commented] (HDFS-12115) Ozone: SCM: Add queryNode RPC Call

2017-07-17 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16091068#comment-16091068
 ] 

Weiwei Yang commented on HDFS-12115:


Hi [~anu]

Thanks for the updates. I have a few comments on the v5 patch:

*InProgressPool.java*

Nit: line 203, extra space between NodeState and getNodeState.

*MockNodeManager.java*

lines 161-173: it seems this can be replaced by {{getNodes(nodestate).size()}}, 
but we need to make sure getNodes won't return null; maybe an empty list instead?

*Ozone.proto*

Add a placeholder for {{DECOMMISSIONING}} state?

*SCMNodeManager.java*

lines 413-435: as you mentioned earlier, a node may have more than one state, 
e.g. both HEALTHY and RAFT_MEMBER, but here getNodeState only returns a single 
state. Should this return an array of NodeState?

line 491: instead of creating a new list, this can be done in Java 8 style:
{{return currentSet.stream().collect(Collectors.toList());}}
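As a self-contained illustration of that Java 8 suggestion (the set contents here are made up, and {{currentSet}} merely stands in for the field in SCMNodeManager):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class NodeListSnapshot {
    public static void main(String[] args) {
        // Stand-in for the node set held by the node manager.
        Set<String> currentSet = new TreeSet<>(Arrays.asList("dn1", "dn2", "dn3"));

        // Instead of allocating a list and copying elements in a loop,
        // collect the stream directly into a new list.
        List<String> snapshot = currentSet.stream().collect(Collectors.toList());
        System.out.println(snapshot); // prints: [dn1, dn2, dn3]
    }
}
```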

Hope this helps, thanks

> Ozone: SCM: Add queryNode RPC Call
> --
>
> Key: HDFS-12115
> URL: https://issues.apache.org/jira/browse/HDFS-12115
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Fix For: HDFS-7240
>
> Attachments: HDFS-12115-HDFS-7240.001.patch, 
> HDFS-12115-HDFS-7240.002.patch, HDFS-12115-HDFS-7240.003.patch, 
> HDFS-12115-HDFS-7240.004.patch, HDFS-12115-HDFS-7240.005.patch
>
>
> Add queryNode RPC to Storage container location protocol. This allows 
> applications like SCM CLI to get the list of nodes in various states, like 
> Healthy, live or Dead.






[jira] [Commented] (HDFS-11996) Ozone : add an UT to test partial read of chunks

2017-07-17 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16091079#comment-16091079
 ] 

Weiwei Yang commented on HDFS-11996:


Just committed to the feature branch. Thanks to [~vagarychen] for the 
contribution and to [~anu] for the review.

> Ozone : add an UT to test partial read of chunks
> 
>
> Key: HDFS-11996
> URL: https://issues.apache.org/jira/browse/HDFS-11996
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone, test
> Environment: Currently when reading a chunk, it is always the whole 
> chunk that gets returned. However it is possible the reader may only need to 
> read a subset of the chunk. This JIRA adds the partial read of chunks.
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Minor
> Fix For: HDFS-7240
>
> Attachments: HDFS-11996-HDFS-7240.001.patch
>
>







[jira] [Updated] (HDFS-11996) Ozone : add an UT to test partial read of chunks

2017-07-17 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HDFS-11996:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: HDFS-7240
   Status: Resolved  (was: Patch Available)

> Ozone : add an UT to test partial read of chunks
> 
>
> Key: HDFS-11996
> URL: https://issues.apache.org/jira/browse/HDFS-11996
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone, test
> Environment: Currently when reading a chunk, it is always the whole 
> chunk that gets returned. However it is possible the reader may only need to 
> read a subset of the chunk. This JIRA adds the partial read of chunks.
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Minor
> Fix For: HDFS-7240
>
> Attachments: HDFS-11996-HDFS-7240.001.patch
>
>







[jira] [Commented] (HDFS-11996) Ozone : add partial read of chunks

2017-07-17 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16091073#comment-16091073
 ] 

Weiwei Yang commented on HDFS-11996:


None of the UT failures were related to this patch. I am going to test this 
patch again with the latest code base; if everything goes fine, I will commit it 
shortly. Thanks to [~vagarychen] for adding this test and to [~anu] for the 
review.

> Ozone : add partial read of chunks
> --
>
> Key: HDFS-11996
> URL: https://issues.apache.org/jira/browse/HDFS-11996
> Project: Hadoop HDFS
>  Issue Type: Sub-task
> Environment: Currently when reading a chunk, it is always the whole 
> chunk that gets returned. However it is possible the reader may only need to 
> read a subset of the chunk. This JIRA adds the partial read of chunks.
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-11996-HDFS-7240.001.patch
>
>







[jira] [Commented] (HDFS-12098) Ozone: Datanode is unable to register with scm if scm starts later

2017-07-17 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16091052#comment-16091052
 ] 

Weiwei Yang commented on HDFS-12098:


Oh [~anu], no problem at all. Thanks for your quick reply.


> Ozone: Datanode is unable to register with scm if scm starts later
> --
>
> Key: HDFS-12098
> URL: https://issues.apache.org/jira/browse/HDFS-12098
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, ozone, scm
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Critical
> Attachments: disabled-scm-test.patch, HDFS-12098-HDFS-7240.001.patch, 
> HDFS-12098-HDFS-7240.002.patch, HDFS-12098-HDFS-7240.testcase-1.patch, 
> HDFS-12098-HDFS-7240.testcase.patch, Screen Shot 2017-07-11 at 4.58.08 
> PM.png, thread_dump.log
>
>
> Reproducing steps
> 1. Start namenode
> {{./bin/hdfs --daemon start namenode}}
> 2. Start datanode
> {{./bin/hdfs datanode}}
> will see following connection issues
> {noformat}
> 17/07/13 21:16:48 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 0 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
> SECONDS)
> 17/07/13 21:16:49 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 1 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
> SECONDS)
> 17/07/13 21:16:50 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 2 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
> SECONDS)
> 17/07/13 21:16:51 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 3 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
> SECONDS)
> {noformat}
> this is expected because scm is not started yet
> 3. Start scm
> {{./bin/hdfs scm}}
> expecting datanode can register to this scm, expecting the log in scm
> {noformat}
> 17/07/13 21:22:30 INFO node.SCMNodeManager: Data node with ID: 
> af22862d-aafa-4941-9073-53224ae43e2c Registered.
> {noformat}
> but did *NOT* see this log. (_I debugged into the code and found the datanode 
> state transitioned to SHUTDOWN unexpectedly because of thread leaks; each of 
> those threads counted toward setting the next state, and they all ended up in 
> the SHUTDOWN state_)
> 4. Create a container from scm CLI
> {{./bin/hdfs scm -container -create -c 20170714c0}}
> this fails with following exception
> {noformat}
> Creating container : 20170714c0.
> Error executing 
> command:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ozone.scm.exceptions.SCMException):
>  Unable to create container while in chill mode
>   at 
> org.apache.hadoop.ozone.scm.container.ContainerMapping.allocateContainer(ContainerMapping.java:241)
>   at 
> org.apache.hadoop.ozone.scm.StorageContainerManager.allocateContainer(StorageContainerManager.java:392)
>   at 
> org.apache.hadoop.ozone.protocolPB.StorageContainerLocationProtocolServerSideTranslatorPB.allocateContainer(StorageContainerLocationProtocolServerSideTranslatorPB.java:73)
> {noformat}
> datanode was not registered to scm, thus it's still in chill mode.
> *Note*, if we start scm first, there is no such issue, I can create container 
> from CLI without any problem.






[jira] [Updated] (HDFS-11996) Ozone : add an UT to test partial read of chunks

2017-07-17 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HDFS-11996:
---
Target Version/s: HDFS-7240
Priority: Minor  (was: Major)
 Component/s: test
  ozone

> Ozone : add an UT to test partial read of chunks
> 
>
> Key: HDFS-11996
> URL: https://issues.apache.org/jira/browse/HDFS-11996
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone, test
> Environment: Currently when reading a chunk, it is always the whole 
> chunk that gets returned. However it is possible the reader may only need to 
> read a subset of the chunk. This JIRA adds the partial read of chunks.
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Minor
> Attachments: HDFS-11996-HDFS-7240.001.patch
>
>







[jira] [Updated] (HDFS-11996) Ozone : add an UT to test partial read of chunks

2017-07-17 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HDFS-11996:
---
Summary: Ozone : add an UT to test partial read of chunks  (was: Ozone : 
add partial read of chunks)

> Ozone : add an UT to test partial read of chunks
> 
>
> Key: HDFS-11996
> URL: https://issues.apache.org/jira/browse/HDFS-11996
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone, test
> Environment: Currently when reading a chunk, it is always the whole 
> chunk that gets returned. However it is possible the reader may only need to 
> read a subset of the chunk. This JIRA adds the partial read of chunks.
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-11996-HDFS-7240.001.patch
>
>







[jira] [Commented] (HDFS-12149) Ozone: RocksDB implementation of ozone metadata store

2017-07-17 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090936#comment-16090936
 ] 

Weiwei Yang commented on HDFS-12149:


Thanks [~anu], sounds good to me. Since the UT results show no regression 
introduced by this patch, I am going to fix the checkstyle issues and commit 
this today. After that, we can do more testing with RocksDB. Thanks a lot 
for your quick response.

> Ozone: RocksDB implementation of ozone metadata store
> -
>
> Key: HDFS-12149
> URL: https://issues.apache.org/jira/browse/HDFS-12149
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
> Attachments: HDFS-12149-HDFS-7240.001.patch
>
>
> HDFS-12069 added a general interface for ozone metadata store, we already 
> have a leveldb implementation, this JIRA is to track the work of rocksdb 
> implementation.






[jira] [Commented] (HDFS-12154) Incorrect javadoc description in StorageLocationChecker#check

2017-07-17 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090945#comment-16090945
 ] 

Weiwei Yang commented on HDFS-12154:


+1, pending jenkins. Thanks [~nandakumar131] for fixing this.

> Incorrect javadoc description in StorageLocationChecker#check
> -
>
> Key: HDFS-12154
> URL: https://issues.apache.org/jira/browse/HDFS-12154
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Nandakumar
>Assignee: Nandakumar
>Priority: Trivial
> Attachments: HDFS-12154.000.patch
>
>
> {{StorageLocationChecker#check}} returns a list of healthy volumes, but the 
> javadoc states that it returns failed volumes.
> {code}
> /**
>* Initiate a check of the supplied storage volumes and return
>* a list of failed volumes.
>*
>* StorageLocations are returned in the same order as the input
>* for compatibility with existing unit tests.
>*
>* @param conf HDFS configuration.
>* @param dataDirs list of volumes to check.
>* @return returns a list of failed volumes. Returns the empty list if
>* there are no failed volumes.
>*
>* @throws InterruptedException if the check was interrupted.
>* @throws IOException if the number of failed volumes exceeds the
>* maximum allowed or if there are no good
>* volumes.
>*/
> {code}






[jira] [Commented] (HDFS-12149) Ozone: RocksDB implementation of ozone metadata store

2017-07-17 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090926#comment-16090926
 ] 

Weiwei Yang commented on HDFS-12149:


Hi [~anu]

There was a RocksDB release (5.5.3) a few hours ago, but it seems it still uses 
the old license, so we will need to wait a few more days until the license is 
updated. Do you want me to commit this first and open another JIRA to track the 
version update (so we can start to play with RocksDB), or would you rather I 
hold off on this patch until the new version comes out?

Thank you.

> Ozone: RocksDB implementation of ozone metadata store
> -
>
> Key: HDFS-12149
> URL: https://issues.apache.org/jira/browse/HDFS-12149
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
> Attachments: HDFS-12149-HDFS-7240.001.patch
>
>
> HDFS-12069 added a general interface for ozone metadata store, we already 
> have a leveldb implementation, this JIRA is to track the work of rocksdb 
> implementation.






[jira] [Commented] (HDFS-12098) Ozone: Datanode is unable to register with scm if scm starts later

2017-07-17 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16091004#comment-16091004
 ] 

Weiwei Yang commented on HDFS-12098:


Hi [~anu]

Have you tried to reproduce this issue, or applied the test case patch I 
uploaded, to take a look at the problem? Please let me know, thanks.

> Ozone: Datanode is unable to register with scm if scm starts later
> --
>
> Key: HDFS-12098
> URL: https://issues.apache.org/jira/browse/HDFS-12098
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, ozone, scm
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Critical
> Attachments: disabled-scm-test.patch, HDFS-12098-HDFS-7240.001.patch, 
> HDFS-12098-HDFS-7240.002.patch, HDFS-12098-HDFS-7240.testcase-1.patch, 
> HDFS-12098-HDFS-7240.testcase.patch, Screen Shot 2017-07-11 at 4.58.08 
> PM.png, thread_dump.log
>
>
> Reproducing steps
> 1. Start namenode
> {{./bin/hdfs --daemon start namenode}}
> 2. Start datanode
> {{./bin/hdfs datanode}}
> will see following connection issues
> {noformat}
> 17/07/13 21:16:48 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 0 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
> SECONDS)
> 17/07/13 21:16:49 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 1 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
> SECONDS)
> 17/07/13 21:16:50 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 2 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
> SECONDS)
> 17/07/13 21:16:51 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 3 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
> SECONDS)
> {noformat}
> this is expected because scm is not started yet
> 3. Start scm
> {{./bin/hdfs scm}}
> expecting datanode can register to this scm, expecting the log in scm
> {noformat}
> 17/07/13 21:22:30 INFO node.SCMNodeManager: Data node with ID: 
> af22862d-aafa-4941-9073-53224ae43e2c Registered.
> {noformat}
> but did *NOT* see this log. (_I debugged into the code and found the datanode 
> state transitioned to SHUTDOWN unexpectedly because of thread leaks; each of 
> those threads counted toward setting the next state, and they all ended up in 
> the SHUTDOWN state_)
> 4. Create a container from scm CLI
> {{./bin/hdfs scm -container -create -c 20170714c0}}
> this fails with following exception
> {noformat}
> Creating container : 20170714c0.
> Error executing 
> command:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ozone.scm.exceptions.SCMException):
>  Unable to create container while in chill mode
>   at 
> org.apache.hadoop.ozone.scm.container.ContainerMapping.allocateContainer(ContainerMapping.java:241)
>   at 
> org.apache.hadoop.ozone.scm.StorageContainerManager.allocateContainer(StorageContainerManager.java:392)
>   at 
> org.apache.hadoop.ozone.protocolPB.StorageContainerLocationProtocolServerSideTranslatorPB.allocateContainer(StorageContainerLocationProtocolServerSideTranslatorPB.java:73)
> {noformat}
> datanode was not registered to scm, thus it's still in chill mode.
> *Note*, if we start scm first, there is no such issue, I can create container 
> from CLI without any problem.






[jira] [Comment Edited] (HDFS-12126) Ozone: Ozone shell: Add more testing for bucket shell commands

2017-07-18 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16091383#comment-16091383
 ] 

Weiwei Yang edited comment on HDFS-12126 at 7/18/17 10:25 AM:
--

Thanks [~linyiqun] for working on this, it is very much needed.
I just read your v2 patch; overall it looks good. Some comments:

*OzoneUtils*

line 149: please add IllegalArgumentException to the method signature.

*TestOzoneShell*

# testCreateBucket: can we add a test that creates a bucket in a non-existent volume?
# line 357: it seems bucketInfo can be safely removed
# line 359: this could misbehave; if vol.getBucket doesn't throw any exception, 
the test will never reach the assert statement in the catch clause
# line 566: this may be overly cautious, but there is a slight chance that two 
calls return the same volume name; can we use a UUID or add a prefix argument 
for {{creatVolume}} to completely avoid that?

Thanks
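To make the line 359 point concrete, the usual pattern is to fail explicitly when the expected exception is never thrown, so the assertion in the catch clause cannot be silently skipped. A minimal sketch, where {{lookupBucket}} is a hypothetical stand-in for vol.getBucket:

```java
public class ExpectedExceptionPattern {
    // Hypothetical stand-in for vol.getBucket on a missing bucket.
    static void lookupBucket(String name) {
        throw new IllegalArgumentException("bucket not found: " + name);
    }

    public static void main(String[] args) {
        boolean threw = false;
        try {
            lookupBucket("nonexistent-bucket");
            // Without this line, a call that fails to throw would let the
            // test pass silently, because the catch block is never entered.
            throw new AssertionError("expected exception was not thrown");
        } catch (IllegalArgumentException e) {
            threw = true; // assertions on the exception message go here
        }
        System.out.println("threw=" + threw); // prints: threw=true
    }
}
```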


was (Author: cheersyang):
Thanks [~linyiqun] for working on this, it is very much needed.
I just read your v2 patch; overall it looks good. Some comments for 
TestOzoneShell:

# testCreateBucket: can we add a test that creates a bucket in a non-existent volume?
# line 357: it seems bucketInfo can be safely removed
# line 359: this could misbehave; if vol.getBucket doesn't throw any exception, 
the test will never reach the assert statement in the catch clause
# line 566: this may be overly cautious, but there is a slight chance that two 
calls return the same volume name; can we use a UUID or add a prefix argument 
for {{creatVolume}} to completely avoid that?

Thanks

> Ozone: Ozone shell: Add more testing for bucket shell commands
> --
>
> Key: HDFS-12126
> URL: https://issues.apache.org/jira/browse/HDFS-12126
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone, tools
>Affects Versions: HDFS-7240
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Attachments: HDFS-12126-HDFS-7240.001.patch, 
> HDFS-12126-HDFS-7240.002.patch
>
>
> Adding more unit tests for ozone bucket commands, similar to HDFS-12118.






[jira] [Commented] (HDFS-12147) Ozone: KSM: Add checkBucketAccess

2017-07-18 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16091358#comment-16091358
 ] 

Weiwei Yang commented on HDFS-12147:


Hi [~nandakumar131]

Please hold off on submitting a new patch; let's route this discussion to [~anu] 
as he reviewed HDFS-11771 for checkVolumeAccess. Can we revisit these two APIs 
and make them consistent? Ping [~anu], please take a look and let us know your 
thoughts, thanks.

My thought is that if we are going to support ACLs, then we need an overall 
picture of which places will need these checks and make sure they are all 
addressed. Otherwise it will end up working in some places and not in others.

Thank you.

> Ozone: KSM: Add checkBucketAccess
> -
>
> Key: HDFS-12147
> URL: https://issues.apache.org/jira/browse/HDFS-12147
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Nandakumar
>Assignee: Nandakumar
> Attachments: HDFS-12147-HDFS-7240.000.patch, 
> HDFS-12147-HDFS-7240.001.patch
>
>
> Checks if the caller has access to a given bucket.






[jira] [Commented] (HDFS-12126) Ozone: Ozone shell: Add more testing for bucket shell commands

2017-07-18 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16091383#comment-16091383
 ] 

Weiwei Yang commented on HDFS-12126:


Thanks [~linyiqun] for working on this, it is very much needed.
I just read your v2 patch; overall it looks good. Some comments for 
TestOzoneShell:

# testCreateBucket: can we add a test that creates a bucket in a non-existent volume?
# line 357: it seems bucketInfo can be safely removed
# line 359: this could misbehave; if vol.getBucket doesn't throw any exception, 
the test will never reach the assert statement in the catch clause
# line 566: this may be overly cautious, but there is a slight chance that two 
calls return the same volume name; can we use a UUID or add a prefix argument 
for {{creatVolume}} to completely avoid that?

Thanks

> Ozone: Ozone shell: Add more testing for bucket shell commands
> --
>
> Key: HDFS-12126
> URL: https://issues.apache.org/jira/browse/HDFS-12126
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone, tools
>Affects Versions: HDFS-7240
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Attachments: HDFS-12126-HDFS-7240.001.patch, 
> HDFS-12126-HDFS-7240.002.patch
>
>
> Adding more unit tests for ozone bucket commands, similar to HDFS-12118.






[jira] [Created] (HDFS-12098) Ozone: Datanode is unable to register with scm if scm starts later

2017-07-07 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12098:
--

 Summary: Ozone: Datanode is unable to register with scm if scm 
starts later
 Key: HDFS-12098
 URL: https://issues.apache.org/jira/browse/HDFS-12098
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, ozone, scm
Reporter: Weiwei Yang
Assignee: Weiwei Yang
Priority: Critical


Reproducing steps
# Start datanode
# Wait and watch the datanode state; it has connection issues, which is expected
# Start SCM, expecting the datanode to connect to the scm and the state machine 
to transition to RUNNING. However, in actuality its state transitions to 
SHUTDOWN and the datanode enters chill mode.






[jira] [Commented] (HDFS-12098) Ozone: Datanode is unable to register with scm if scm starts later

2017-07-07 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16078213#comment-16078213
 ] 

Weiwei Yang commented on HDFS-12098:


This is because the datanode state machine leaks {{VersionEndpointTask}} 
threads. When scm is not yet started, more and more {{VersionEndpointTask}} 
threads keep retrying the connection to scm:

{noformat}
INIT -> RUNNING
          \
           GETVERSION
              executor.execute(new VersionEndpointTask()) -> retry on getVersion ...
              ... (HB interval)
              executor.execute(new VersionEndpointTask()) -> retry on getVersion ...
              ... (HB interval)
              executor.execute(new VersionEndpointTask()) -> retry on getVersion ...
              ...
{noformat}

The version endpoint tasks are launched at the HB interval (5s on my env), so 
every 5s a new task is submitted; the retry policy for each getVersion call is 
10 * 1s = 10s, so a task can only finish every 10s. So every 10s there will be 
ONE thread leaked.
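The leak rate works out as described; here is a back-of-the-envelope sketch using the numbers quoted in this comment (5s heartbeat, 10 x 1s retries; the values are from my environment, not read from configuration):

```java
public class EndpointTaskBacklog {
    public static void main(String[] args) {
        int heartbeatSeconds = 5;      // a new VersionEndpointTask every 5s
        int taskDurationSeconds = 10;  // 10 retries * 1s sleep before a task gives up
        int elapsedSeconds = 60;       // one minute with scm still down

        int submitted = elapsedSeconds / heartbeatSeconds;    // 12 tasks submitted
        int finished = elapsedSeconds / taskDurationSeconds;  // 6 tasks finished
        int leaked = submitted - finished;                    // 6 threads still pending

        // One extra pending thread accumulates every 10 seconds.
        System.out.println("pending tasks after " + elapsedSeconds + "s: " + leaked);
        // prints: pending tasks after 60s: 6
    }
}
```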

When scm is up, all pending tasks are able to connect to scm and the getVersion 
call returns, so each of them advances the state to the next one. Since the 
state is shared in {{EndpointStateMachine}}, it is incremented more than once, 
so when I reviewed the state changes, it looked like the following:

{noformat}
REGISTER
HEARTBEAT
SHUTDOWN
SHUTDOWN
SHUTDOWN
... 
{noformat}

> Ozone: Datanode is unable to register with scm if scm starts later
> --
>
> Key: HDFS-12098
> URL: https://issues.apache.org/jira/browse/HDFS-12098
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, ozone, scm
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Critical
>
> Reproducing steps
> # Start datanode
> # Wait and see datanode state, it has connection issues, this is expected
> # Start SCM, expecting the datanode can connect to the scm and the state 
> machine transits to RUNNING. However, in practice its state transits to 
> SHUTDOWN and the datanode enters chill mode.






[jira] [Updated] (HDFS-12098) Ozone: Datanode is unable to register with scm if scm starts later

2017-07-07 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HDFS-12098:
---
Attachment: thread_dump.log

> Ozone: Datanode is unable to register with scm if scm starts later
> --
>
> Key: HDFS-12098
> URL: https://issues.apache.org/jira/browse/HDFS-12098
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, ozone, scm
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Critical
> Attachments: thread_dump.log
>
>
> Reproducing steps
> # Start datanode
> # Wait and see datanode state, it has connection issues, this is expected
> # Start SCM, expecting the datanode can connect to the scm and the state 
> machine transits to RUNNING. However, in practice its state transits to 
> SHUTDOWN and the datanode enters chill mode.






[jira] [Comment Edited] (HDFS-12098) Ozone: Datanode is unable to register with scm if scm starts later

2017-07-07 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16078213#comment-16078213
 ] 

Weiwei Yang edited comment on HDFS-12098 at 7/7/17 3:11 PM:


This is because the datanode state machine leaks {{VersionEndpointTask}} 
threads. When scm is not yet started, more and more {{VersionEndpointTask}} 
threads keep retrying the connection to scm:

{noformat}
INIT -> RUNNING
 \
  GETVERSION
    new VersionEndpointTask submitted -> retrying ...
    ... (HB interval)
    new VersionEndpointTask submitted -> retrying ...
    ... (HB interval)
    new VersionEndpointTask submitted -> retrying ...
    ...
{noformat}

The version endpoint tasks are launched at the HB interval (5s in my 
environment), so a new task is submitted every 5s; the retry policy for each 
getVersion call is 10 * 1s = 10s, so a task can only finish every 10s. As a 
result, ONE thread leaks every 10s.

When scm is up, all pending tasks are able to connect to it and their 
getVersion calls return, so each of them advances the state to the next one. 
Since the state is shared in {{EndpointStateMachine}}, it is incremented more 
than once; when I reviewed the state changes, they looked like below

{noformat}
REGISTER
HEARTBEAT
SHUTDOWN
SHUTDOWN
SHUTDOWN
... 
{noformat}


was (Author: cheersyang):
This is because datanode state machine leaks {{VersionEndpointTask}} thread. In 
the case scm is not yet started,
 more and more {{VersionEndpointTask}} threads keep retrying connection with 
scm,

{noformat}
INIT - RUNNING 
 \
GETVERSION
   executor.execute(new VersionEndpointTask()) - retry on 
getVersion ...
   ... (HB interval)
   executor.execute(new VersionEndpointTask()) - retry on 
getVersion ...
   ... (HB interval)
   executor.execute(new VersionEndpointTask()) - retry on 
getVersion ...
   ...
{noformat}

the version endpoint tasks are launched in HB interval (5s on my env), so every 
5s there is a new task submitted; the retry policy for each getVersion call is 
10 * 1s = 10s, so every 10s a task can be finished. So every 10s there will be 
ONE thread leak.

When scm is up, all pending tasks will be able to connect to scm and getVersion 
call returns, so each of them will count the state to next, since the state is 
shared in {{EndpointStateMachine}}, it increments more than 1 so when I review 
the state changes, it looks like below

{noformat}
REGISTER
HEARTBEAT
SHUTDOWN
SHUTDOWN
SHUTDOWN
... 
{noformat}

> Ozone: Datanode is unable to register with scm if scm starts later
> --
>
> Key: HDFS-12098
> URL: https://issues.apache.org/jira/browse/HDFS-12098
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, ozone, scm
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Critical
> Attachments: thread_dump.log
>
>
> Reproducing steps
> # Start datanode
> # Wait and see datanode state, it has connection issues, this is expected
> # Start SCM, expecting the datanode can connect to the scm and the state 
> machine transits to RUNNING. However, in practice its state transits to 
> SHUTDOWN and the datanode enters chill mode.






[jira] [Comment Edited] (HDFS-12098) Ozone: Datanode is unable to register with scm if scm starts later

2017-07-07 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16078213#comment-16078213
 ] 

Weiwei Yang edited comment on HDFS-12098 at 7/7/17 3:15 PM:


This is because the datanode state machine leaks {{VersionEndpointTask}} 
threads. When scm is not yet started, more and more {{VersionEndpointTask}} 
threads keep retrying the connection to scm:

{noformat}
INIT -> RUNNING
 \
  GETVERSION
    new VersionEndpointTask submitted -> retrying ...
    ... (HB interval)
    new VersionEndpointTask submitted -> retrying ...
    ... (HB interval)
    new VersionEndpointTask submitted -> retrying ...
    ...
{noformat}

The version endpoint tasks are launched at the HB interval (5s in my 
environment), so a new task is submitted every 5s; the retry policy for each 
getVersion call is 10 * 1s = 10s, so a task can only finish every 10s. As a 
result, ONE thread leaks every 10s.

Please see [^thread_dump.log]: there are 20 VersionEndpointTask threads in 
WAITING state, and this number keeps increasing.

When scm is up, all pending tasks are able to connect to it and their 
getVersion calls return, so each of them advances the state to the next one. 
Since the state is shared in {{EndpointStateMachine}}, it is incremented more 
than once; when I reviewed the state changes, they looked like below

{noformat}
REGISTER
HEARTBEAT
SHUTDOWN
SHUTDOWN
SHUTDOWN
... 
{noformat}

To fix this, instead of using a central ExecutorService carried in 
{{DatanodeStateMachine}}, we could initialize a fixed-size thread pool to 
execute the endpoint tasks, and make sure the pool is shut down before entering 
the next state (at the end of await).
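
A minimal sketch of that idea (class name, pool size, and timeout are 
illustrative, not the actual patch):

```java
// Sketch: a bounded pool for endpoint tasks that is drained before the state
// machine moves on, so no VersionEndpointTask thread can outlive its state.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class EndpointTaskPool {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.submit(() -> System.out.println("getVersion attempt"));

        // Before transitioning to the next state: stop accepting new tasks,
        // wait for in-flight ones, and force-cancel any stragglers.
        pool.shutdown();
        if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
            pool.shutdownNow();
        }
        System.out.println("pool terminated: " + pool.isTerminated());
    }
}
```

With a bounded pool, a slow scm can at most delay the tasks already in flight; 
it can no longer accumulate an unbounded number of retrying threads.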


was (Author: cheersyang):
This is because datanode state machine leaks {{VersionEndpointTask}} thread. In 
the case scm is not yet started,
 more and more {{VersionEndpointTask}} threads keep retrying connection with 
scm,

{noformat}
INIT - RUNNING 
 \
GETVERSION
 new VersionEndpointTask submitted - retrying ...
   ... (HB interval)
 new VersionEndpointTask submitted - retrying ...
   ... (HB interval)
 new VersionEndpointTask submitted - retrying ...
   ...
{noformat}

the version endpoint tasks are launched in HB interval (5s on my env), so every 
5s there is a new task submitted; the retry policy for each getVersion call is 
10 * 1s = 10s, so every 10s a task can be finished. So every 10s there will be 
ONE thread leak.

When scm is up, all pending tasks will be able to connect to scm and getVersion 
call returns, so each of them will count the state to next, since the state is 
shared in {{EndpointStateMachine}}, it increments more than 1 so when I review 
the state changes, it looks like below

{noformat}
REGISTER
HEARTBEAT
SHUTDOWN
SHUTDOWN
SHUTDOWN
... 
{noformat}

> Ozone: Datanode is unable to register with scm if scm starts later
> --
>
> Key: HDFS-12098
> URL: https://issues.apache.org/jira/browse/HDFS-12098
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, ozone, scm
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Critical
> Attachments: thread_dump.log
>
>
> Reproducing steps
> # Start datanode
> # Wait and see datanode state, it has connection issues, this is expected
> # Start SCM, expecting the datanode can connect to the scm and the state 
> machine transits to RUNNING. However, in practice its state transits to 
> SHUTDOWN and the datanode enters chill mode.






[jira] [Commented] (HDFS-12149) Ozone: RocksDB implementation of ozone metadata store

2017-07-16 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089218#comment-16089218
 ] 

Weiwei Yang commented on HDFS-12149:


Thanks [~anu] for the message; I will work on this.

bq. I am aware that what we have is a generic plugin layer which can use most 
key value stores, and RocksDB is just a specific instance of it and it is 
trivial for us to revert it, even if it is committed.

That's correct. We will follow the Legal team's decision, as you mentioned. It 
is trivial to revert this with a simple switch. Thank you.

> Ozone: RocksDB implementation of ozone metadata store
> -
>
> Key: HDFS-12149
> URL: https://issues.apache.org/jira/browse/HDFS-12149
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>
> HDFS-12069 added a general interface for the ozone metadata store. We already 
> have a LevelDB implementation; this JIRA tracks the work on the RocksDB 
> implementation.






[jira] [Updated] (HDFS-12098) Ozone: Datanode is unable to register with scm if scm starts later

2017-07-16 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HDFS-12098:
---
Attachment: HDFS-12098-HDFS-7240.testcase.patch

> Ozone: Datanode is unable to register with scm if scm starts later
> --
>
> Key: HDFS-12098
> URL: https://issues.apache.org/jira/browse/HDFS-12098
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, ozone, scm
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Critical
> Attachments: disabled-scm-test.patch, HDFS-12098-HDFS-7240.001.patch, 
> HDFS-12098-HDFS-7240.002.patch, HDFS-12098-HDFS-7240.testcase.patch, Screen 
> Shot 2017-07-11 at 4.58.08 PM.png, thread_dump.log
>
>
> Reproducing steps
> 1. Start namenode
> {{./bin/hdfs --daemon start namenode}}
> 2. Start datanode
> {{./bin/hdfs datanode}}
> will see following connection issues
> {noformat}
> 17/07/13 21:16:48 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 0 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
> SECONDS)
> 17/07/13 21:16:49 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 1 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
> SECONDS)
> 17/07/13 21:16:50 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 2 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
> SECONDS)
> 17/07/13 21:16:51 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 3 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
> SECONDS)
> {noformat}
> this is expected because scm is not started yet
> 3. Start scm
> {{./bin/hdfs scm}}
> expecting datanode can register to this scm, expecting the log in scm
> {noformat}
> 17/07/13 21:22:30 INFO node.SCMNodeManager: Data node with ID: 
> af22862d-aafa-4941-9073-53224ae43e2c Registered.
> {noformat}
> but did *NOT* see this log. (_I debugged into the code and found the datanode 
> state was unexpectedly transited to SHUTDOWN because of the thread leak; each 
> of those leaked threads advanced the state once and together they pushed it to 
> the SHUTDOWN state_)
> 4. Create a container from scm CLI
> {{./bin/hdfs scm -container -create -c 20170714c0}}
> this fails with following exception
> {noformat}
> Creating container : 20170714c0.
> Error executing 
> command:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ozone.scm.exceptions.SCMException):
>  Unable to create container while in chill mode
>   at 
> org.apache.hadoop.ozone.scm.container.ContainerMapping.allocateContainer(ContainerMapping.java:241)
>   at 
> org.apache.hadoop.ozone.scm.StorageContainerManager.allocateContainer(StorageContainerManager.java:392)
>   at 
> org.apache.hadoop.ozone.protocolPB.StorageContainerLocationProtocolServerSideTranslatorPB.allocateContainer(StorageContainerLocationProtocolServerSideTranslatorPB.java:73)
> {noformat}
> datanode was not registered to scm, thus it's still in chill mode.
> *Note*, if we start scm first, there is no such issue, I can create container 
> from CLI without any problem.






[jira] [Commented] (HDFS-12098) Ozone: Datanode is unable to register with scm if scm starts later

2017-07-16 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089283#comment-16089283
 ] 

Weiwei Yang commented on HDFS-12098:


Attached a test case patch to reproduce this issue. Please take a look at 
[^HDFS-12098-HDFS-7240.testcase.patch]. This patch simulates the following scenario:

# Start mini ozone cluster without starting scm
# Datanode is unable to register to scm
# Start scm, waiting for datanode to register
# Wait a while but datanode is still unable to successfully register to scm

If you apply this patch, the test fails. You might have noticed that the patch 
changes more code than just adding a test; that is because of the reason I 
mentioned earlier. I also added a method to check whether a datanode is 
registered to scm, so that we can check the datanode state even when scm is not started.

I also have a patch to fix this; with that patch applied, this test passes. I 
am ready to share it as well.

Thanks

> Ozone: Datanode is unable to register with scm if scm starts later
> --
>
> Key: HDFS-12098
> URL: https://issues.apache.org/jira/browse/HDFS-12098
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, ozone, scm
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Critical
> Attachments: disabled-scm-test.patch, HDFS-12098-HDFS-7240.001.patch, 
> HDFS-12098-HDFS-7240.002.patch, HDFS-12098-HDFS-7240.testcase.patch, Screen 
> Shot 2017-07-11 at 4.58.08 PM.png, thread_dump.log
>
>
> Reproducing steps
> 1. Start namenode
> {{./bin/hdfs --daemon start namenode}}
> 2. Start datanode
> {{./bin/hdfs datanode}}
> will see following connection issues
> {noformat}
> 17/07/13 21:16:48 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 0 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
> SECONDS)
> 17/07/13 21:16:49 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 1 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
> SECONDS)
> 17/07/13 21:16:50 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 2 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
> SECONDS)
> 17/07/13 21:16:51 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 3 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
> SECONDS)
> {noformat}
> this is expected because scm is not started yet
> 3. Start scm
> {{./bin/hdfs scm}}
> expecting datanode can register to this scm, expecting the log in scm
> {noformat}
> 17/07/13 21:22:30 INFO node.SCMNodeManager: Data node with ID: 
> af22862d-aafa-4941-9073-53224ae43e2c Registered.
> {noformat}
> but did *NOT* see this log. (_I debugged into the code and found the datanode 
> state was unexpectedly transited to SHUTDOWN because of the thread leak; each 
> of those leaked threads advanced the state once and together they pushed it to 
> the SHUTDOWN state_)
> 4. Create a container from scm CLI
> {{./bin/hdfs scm -container -create -c 20170714c0}}
> this fails with following exception
> {noformat}
> Creating container : 20170714c0.
> Error executing 
> command:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ozone.scm.exceptions.SCMException):
>  Unable to create container while in chill mode
>   at 
> org.apache.hadoop.ozone.scm.container.ContainerMapping.allocateContainer(ContainerMapping.java:241)
>   at 
> org.apache.hadoop.ozone.scm.StorageContainerManager.allocateContainer(StorageContainerManager.java:392)
>   at 
> org.apache.hadoop.ozone.protocolPB.StorageContainerLocationProtocolServerSideTranslatorPB.allocateContainer(StorageContainerLocationProtocolServerSideTranslatorPB.java:73)
> {noformat}
> datanode was not registered to scm, thus it's still in chill mode.
> *Note*, if we start scm first, there is no such issue, I can create container 
> from CLI without any problem.






[jira] [Comment Edited] (HDFS-12098) Ozone: Datanode is unable to register with scm if scm starts later

2017-07-16 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089283#comment-16089283
 ] 

Weiwei Yang edited comment on HDFS-12098 at 7/17/17 4:01 AM:
-

Attached a test case patch to reproduce this issue. Please take a look at 
[^HDFS-12098-HDFS-7240.testcase.patch]. This patch simulates the following scenario:

# Start mini ozone cluster without starting scm
# Datanode is unable to register to scm
# Start scm, waiting for datanode to register
# Wait a while but datanode is still unable to successfully register to scm

If you apply this patch, the test fails. Some log output from step 4 is 
interesting:

{noformat}
2017-07-17 11:46:02,451 [Datanode State Machine Thread - 0] INFO  ipc.Client 
(Client.java:handleConnectionFailure(933)) - Retrying connect to server: 
localhost/127.0.0.1:51183. Already tried 2 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2017-07-17 11:46:02,467 [Datanode State Machine Thread - 0] INFO  
endpoint.VersionEndpointTask (VersionEndpointTask.java:call(61))  - Version 
endpoint task (localhost/127.0.0.1:51183) transited to state REGISTER
2017-07-17 11:46:02,468 [Datanode State Machine Thread - 1] INFO  
endpoint.VersionEndpointTask (VersionEndpointTask.java:call(61))  - Version 
endpoint task (localhost/127.0.0.1:51183) transited to state HEARTBEAT
2017-07-17 11:46:02,469 [Datanode State Machine Thread - 2] INFO  
endpoint.VersionEndpointTask (VersionEndpointTask.java:call(61))  - Version 
endpoint task (localhost/127.0.0.1:51183) transited to state SHUTDOWN
2017-07-17 11:46:02,471 [Datanode State Machine Thread - 3] INFO  
endpoint.VersionEndpointTask (VersionEndpointTask.java:call(61))  - Version 
endpoint task (localhost/127.0.0.1:51183) transited to state SHUTDOWN
{noformat}

Instead of transiting to state {{HEARTBEAT}}, it transited to {{SHUTDOWN}}.

You might have noticed that the patch changes more code than just adding a 
test; that is because of the reason I mentioned earlier. I also added a method 
to check whether a datanode is registered to scm, so that we can check the 
datanode state even when scm is not started.

I also have a patch to fix this; with that patch applied, this test passes. I 
am ready to share it as well.

Thanks


was (Author: cheersyang):
Attached a test case patch to reproduce this issue. Please take a look at 
[^HDFS-12098-HDFS-7240.testcase.patch]. This patch simulates the scenario

# Start mini ozone cluster without starting scm
# Datanode is unable to register to scm
# Start scm, waiting for datanode to register
# Wait a while but datanode is still unable to successfully register to scm

Step 4 will print log

{noformat}
2017-07-17 11:46:02,451 [Datanode State Machine Thread - 0] INFO  ipc.Client 
(Client.java:handleConnectionFailure(933)) - Retrying connect to server: 
localhost/127.0.0.1:51183. Already tried 2 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2017-07-17 11:46:02,467 [Datanode State Machine Thread - 0] INFO  
endpoint.VersionEndpointTask (VersionEndpointTask.java:call(61))  - Version 
endpoint task (localhost/127.0.0.1:51183) transited to state REGISTER
2017-07-17 11:46:02,468 [Datanode State Machine Thread - 1] INFO  
endpoint.VersionEndpointTask (VersionEndpointTask.java:call(61))  - Version 
endpoint task (localhost/127.0.0.1:51183) transited to state HEARTBEAT
2017-07-17 11:46:02,469 [Datanode State Machine Thread - 2] INFO  
endpoint.VersionEndpointTask (VersionEndpointTask.java:call(61))  - Version 
endpoint task (localhost/127.0.0.1:51183) transited to state SHUTDOWN
2017-07-17 11:46:02,471 [Datanode State Machine Thread - 3] INFO  
endpoint.VersionEndpointTask (VersionEndpointTask.java:call(61))  - Version 
endpoint task (localhost/127.0.0.1:51183) transited to state SHUTDOWN
2017-07-17 11:46:03,457 [Datanode State Machine Thread - 0] INFO  
statemachine.DatanodeStateMachine 
(DatanodeStateMachine.java:lambda$startDaemon$0(272))  - Ozone container 
server started.
{noformat}

if you apply this patch, it's gonna to fail. You might have noticed the patch 
changes some more code than just adding a test, that is because the reason I 
mentioned earlier. I also have added a method to check if a datanode is 
registered to scm so that we can check datanode state even scm is not started.

I have a patch to fix this also, if applied that patch, this test will pass. I 
am  ready to share that as well.

Thanks

> Ozone: Datanode is unable to register with scm if scm starts later
> --
>
> Key: HDFS-12098
> URL: https://issues.apache.org/jira/browse/HDFS-12098
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, ozone, scm
>

[jira] [Comment Edited] (HDFS-12098) Ozone: Datanode is unable to register with scm if scm starts later

2017-07-16 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089283#comment-16089283
 ] 

Weiwei Yang edited comment on HDFS-12098 at 7/17/17 3:59 AM:
-

Attached a test case patch to reproduce this issue. Please take a look at 
[^HDFS-12098-HDFS-7240.testcase.patch]. This patch simulates the scenario

# Start mini ozone cluster without starting scm
# Datanode is unable to register to scm
# Start scm, waiting for datanode to register
# Wait a while but datanode is still unable to successfully register to scm

Step 4 prints the following log:

{noformat}
2017-07-17 11:46:02,451 [Datanode State Machine Thread - 0] INFO  ipc.Client 
(Client.java:handleConnectionFailure(933)) - Retrying connect to server: 
localhost/127.0.0.1:51183. Already tried 2 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2017-07-17 11:46:02,467 [Datanode State Machine Thread - 0] INFO  
endpoint.VersionEndpointTask (VersionEndpointTask.java:call(61))  - Version 
endpoint task (localhost/127.0.0.1:51183) transited to state REGISTER
2017-07-17 11:46:02,468 [Datanode State Machine Thread - 1] INFO  
endpoint.VersionEndpointTask (VersionEndpointTask.java:call(61))  - Version 
endpoint task (localhost/127.0.0.1:51183) transited to state HEARTBEAT
2017-07-17 11:46:02,469 [Datanode State Machine Thread - 2] INFO  
endpoint.VersionEndpointTask (VersionEndpointTask.java:call(61))  - Version 
endpoint task (localhost/127.0.0.1:51183) transited to state SHUTDOWN
2017-07-17 11:46:02,471 [Datanode State Machine Thread - 3] INFO  
endpoint.VersionEndpointTask (VersionEndpointTask.java:call(61))  - Version 
endpoint task (localhost/127.0.0.1:51183) transited to state SHUTDOWN
2017-07-17 11:46:03,457 [Datanode State Machine Thread - 0] INFO  
statemachine.DatanodeStateMachine 
(DatanodeStateMachine.java:lambda$startDaemon$0(272))  - Ozone container 
server started.
{noformat}

If you apply this patch, the test fails. You might have noticed that the patch 
changes more code than just adding a test; that is because of the reason I 
mentioned earlier. I also added a method to check whether a datanode is 
registered to scm, so that we can check the datanode state even when scm is not started.

I have a patch to fix this also, if applied that patch, this test will pass. I 
am  ready to share that as well.

Thanks


was (Author: cheersyang):
Attached a test case patch to reproduce this issue. Please take a look at 
[^HDFS-12098-HDFS-7240.testcase.patch]. This patch simulates the scenario

# Start mini ozone cluster without starting scm
# Datanode is unable to register to scm
# Start scm, waiting for datanode to register
# Wait a while but datanode is still unable to successfully register to scm

if you apply this patch, it's gonna to fail. You might have noticed the patch 
changes some more code than just adding a test, that is because the reason I 
mentioned earlier. I also have added a method to check if a datanode is 
registered to scm so that we can check datanode state even scm is not started.

I have a patch to fix this also, if applied that patch, this test will pass. I 
am  ready to share that as well.

Thanks

> Ozone: Datanode is unable to register with scm if scm starts later
> --
>
> Key: HDFS-12098
> URL: https://issues.apache.org/jira/browse/HDFS-12098
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, ozone, scm
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Critical
> Attachments: disabled-scm-test.patch, HDFS-12098-HDFS-7240.001.patch, 
> HDFS-12098-HDFS-7240.002.patch, HDFS-12098-HDFS-7240.testcase.patch, Screen 
> Shot 2017-07-11 at 4.58.08 PM.png, thread_dump.log
>
>
> Reproducing steps
> 1. Start namenode
> {{./bin/hdfs --daemon start namenode}}
> 2. Start datanode
> {{./bin/hdfs datanode}}
> will see following connection issues
> {noformat}
> 17/07/13 21:16:48 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 0 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
> SECONDS)
> 17/07/13 21:16:49 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 1 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
> SECONDS)
> 17/07/13 21:16:50 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 2 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
> SECONDS)
> 17/07/13 21:16:51 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 3 time(s); retry 
> 

[jira] [Updated] (HDFS-12098) Ozone: Datanode is unable to register with scm if scm starts later

2017-07-16 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HDFS-12098:
---
Attachment: (was: HDFS-12098-HDFS-7240.testcase.patch)

> Ozone: Datanode is unable to register with scm if scm starts later
> --
>
> Key: HDFS-12098
> URL: https://issues.apache.org/jira/browse/HDFS-12098
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, ozone, scm
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Critical
> Attachments: disabled-scm-test.patch, HDFS-12098-HDFS-7240.001.patch, 
> HDFS-12098-HDFS-7240.002.patch, Screen Shot 2017-07-11 at 4.58.08 PM.png, 
> thread_dump.log
>
>
> Reproducing steps
> 1. Start namenode
> {{./bin/hdfs --daemon start namenode}}
> 2. Start datanode
> {{./bin/hdfs datanode}}
> will see following connection issues
> {noformat}
> 17/07/13 21:16:48 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 0 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
> SECONDS)
> 17/07/13 21:16:49 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 1 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
> SECONDS)
> 17/07/13 21:16:50 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 2 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
> SECONDS)
> 17/07/13 21:16:51 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 3 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
> SECONDS)
> {noformat}
> this is expected because scm is not started yet
> 3. Start scm
> {{./bin/hdfs scm}}
> expecting datanode can register to this scm, expecting the log in scm
> {noformat}
> 17/07/13 21:22:30 INFO node.SCMNodeManager: Data node with ID: 
> af22862d-aafa-4941-9073-53224ae43e2c Registered.
> {noformat}
> but did *NOT* see this log. (_I debugged into the code and found the datanode 
> state was unexpectedly transited to SHUTDOWN because of the thread leak; each 
> of those leaked threads advanced the state once and together they pushed it to 
> the SHUTDOWN state_)
> 4. Create a container from scm CLI
> {{./bin/hdfs scm -container -create -c 20170714c0}}
> this fails with following exception
> {noformat}
> Creating container : 20170714c0.
> Error executing 
> command:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ozone.scm.exceptions.SCMException):
>  Unable to create container while in chill mode
>   at 
> org.apache.hadoop.ozone.scm.container.ContainerMapping.allocateContainer(ContainerMapping.java:241)
>   at 
> org.apache.hadoop.ozone.scm.StorageContainerManager.allocateContainer(StorageContainerManager.java:392)
>   at 
> org.apache.hadoop.ozone.protocolPB.StorageContainerLocationProtocolServerSideTranslatorPB.allocateContainer(StorageContainerLocationProtocolServerSideTranslatorPB.java:73)
> {noformat}
> datanode was not registered to scm, thus it's still in chill mode.
> *Note*, if we start scm first, there is no such issue, I can create container 
> from CLI without any problem.






[jira] [Comment Edited] (HDFS-12098) Ozone: Datanode is unable to register with scm if scm starts later

2017-07-16 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089283#comment-16089283
 ] 

Weiwei Yang edited comment on HDFS-12098 at 7/17/17 3:58 AM:
-

Attached a test case patch to reproduce this issue. Please take a look at 
[^HDFS-12098-HDFS-7240.testcase.patch]. This patch simulates the scenario:

# Start mini ozone cluster without starting scm
# Datanode is unable to register to scm
# Start scm, waiting for datanode to register
# Wait a while, but the datanode is still unable to successfully register to scm

If you apply this patch, it is going to fail. You might have noticed the patch 
changes more code than just adding a test; that is because of the reason I 
mentioned earlier. I also added a method to check whether a datanode is 
registered to scm, so that we can check the datanode state even when scm is not 
started.

I also have a patch to fix this; with that patch applied, this test will pass. 
I am ready to share that as well.

Thanks


was (Author: cheersyang):
Attached a test case patch to reproduce this issue. Please take a look at 
[^HDFS-12098-HDFS-7240.testcase.patch]. This patch simulates the scenario

# Start mini ozone cluster without starting scm
# Datanode is unable to register to scm
# Start scm, waiting for datanode to register
# Wait a while but datanode is still unable to successfully register to scm

If you apply this patch, it is going to fail. You might have noticed the patch 
changes more code than just adding a test; that is because of the reason I 
mentioned earlier. I also added a method to check whether a datanode is 
registered to scm, so that we can check the datanode state even when scm is not 
started.

I also have a patch to fix this; with that patch applied, this test will pass. 
I am ready to share that as well.

Thanks

> Ozone: Datanode is unable to register with scm if scm starts later
> --
>
> Key: HDFS-12098
> URL: https://issues.apache.org/jira/browse/HDFS-12098
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, ozone, scm
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Critical
> Attachments: disabled-scm-test.patch, HDFS-12098-HDFS-7240.001.patch, 
> HDFS-12098-HDFS-7240.002.patch, HDFS-12098-HDFS-7240.testcase.patch, Screen 
> Shot 2017-07-11 at 4.58.08 PM.png, thread_dump.log
>
>
> Reproducing steps
> 1. Start namenode
> {{./bin/hdfs --daemon start namenode}}
> 2. Start datanode
> {{./bin/hdfs datanode}}
> will see following connection issues
> {noformat}
> 17/07/13 21:16:48 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 0 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
> SECONDS)
> 17/07/13 21:16:49 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 1 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
> SECONDS)
> 17/07/13 21:16:50 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 2 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
> SECONDS)
> 17/07/13 21:16:51 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 3 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
> SECONDS)
> {noformat}
> this is expected because scm is not started yet
> 3. Start scm
> {{./bin/hdfs scm}}
> expecting the datanode can register to this scm, and expecting this log in scm:
> {noformat}
> 17/07/13 21:22:30 INFO node.SCMNodeManager: Data node with ID: 
> af22862d-aafa-4941-9073-53224ae43e2c Registered.
> {noformat}
> but did *NOT* see this log. (_I debugged into the code and found the datanode 
> state transitioned to SHUTDOWN unexpectedly because of thread leaks; each of 
> those threads counted toward setting the next state, and they all set the 
> SHUTDOWN state_)
> 4. Create a container from scm CLI
> {{./bin/hdfs scm -container -create -c 20170714c0}}
> this fails with the following exception:
> {noformat}
> Creating container : 20170714c0.
> Error executing 
> command:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ozone.scm.exceptions.SCMException):
>  Unable to create container while in chill mode
>   at 
> org.apache.hadoop.ozone.scm.container.ContainerMapping.allocateContainer(ContainerMapping.java:241)
>   at 
> org.apache.hadoop.ozone.scm.StorageContainerManager.allocateContainer(StorageContainerManager.java:392)
>   at 
> org.apache.hadoop.ozone.protocolPB.StorageContainerLocationProtocolServerSideTranslatorPB.allocateContainer(StorageContainerLocationProtocolServerSideTranslatorPB.java:73)
> {noformat}
> 

[jira] [Updated] (HDFS-12098) Ozone: Datanode is unable to register with scm if scm starts later

2017-07-16 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HDFS-12098:
---
Status: Patch Available  (was: In Progress)

> Ozone: Datanode is unable to register with scm if scm starts later
> --
>
> Key: HDFS-12098
> URL: https://issues.apache.org/jira/browse/HDFS-12098
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, ozone, scm
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Critical
> Attachments: disabled-scm-test.patch, HDFS-12098-HDFS-7240.001.patch, 
> HDFS-12098-HDFS-7240.002.patch, HDFS-12098-HDFS-7240.testcase.patch, Screen 
> Shot 2017-07-11 at 4.58.08 PM.png, thread_dump.log
>
>
> Reproducing steps
> 1. Start namenode
> {{./bin/hdfs --daemon start namenode}}
> 2. Start datanode
> {{./bin/hdfs datanode}}
> will see following connection issues
> {noformat}
> 17/07/13 21:16:48 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 0 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
> SECONDS)
> 17/07/13 21:16:49 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 1 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
> SECONDS)
> 17/07/13 21:16:50 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 2 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
> SECONDS)
> 17/07/13 21:16:51 INFO ipc.Client: Retrying connect to server: 
> ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 3 time(s); retry 
> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 
> SECONDS)
> {noformat}
> this is expected because scm is not started yet
> 3. Start scm
> {{./bin/hdfs scm}}
> expecting the datanode can register to this scm, and expecting this log in scm:
> {noformat}
> 17/07/13 21:22:30 INFO node.SCMNodeManager: Data node with ID: 
> af22862d-aafa-4941-9073-53224ae43e2c Registered.
> {noformat}
> but did *NOT* see this log. (_I debugged into the code and found the datanode 
> state transitioned to SHUTDOWN unexpectedly because of thread leaks; each of 
> those threads counted toward setting the next state, and they all set the 
> SHUTDOWN state_)
> 4. Create a container from scm CLI
> {{./bin/hdfs scm -container -create -c 20170714c0}}
> this fails with the following exception:
> {noformat}
> Creating container : 20170714c0.
> Error executing 
> command:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ozone.scm.exceptions.SCMException):
>  Unable to create container while in chill mode
>   at 
> org.apache.hadoop.ozone.scm.container.ContainerMapping.allocateContainer(ContainerMapping.java:241)
>   at 
> org.apache.hadoop.ozone.scm.StorageContainerManager.allocateContainer(StorageContainerManager.java:392)
>   at 
> org.apache.hadoop.ozone.protocolPB.StorageContainerLocationProtocolServerSideTranslatorPB.allocateContainer(StorageContainerLocationProtocolServerSideTranslatorPB.java:73)
> {noformat}
> the datanode was not registered to scm, thus it is still in chill mode.
> *Note*: if we start scm first, there is no such issue; I can create a container 
> from the CLI without any problem.






[jira] [Updated] (HDFS-12069) Ozone: Create a general abstraction for metadata store

2017-07-15 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HDFS-12069:
---
Attachment: HDFS-12069-HDFS-7240.013.patch

> Ozone: Create a general abstraction for metadata store
> --
>
> Key: HDFS-12069
> URL: https://issues.apache.org/jira/browse/HDFS-12069
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
> Attachments: HDFS-12069-HDFS-7240.001.patch, 
> HDFS-12069-HDFS-7240.002.patch, HDFS-12069-HDFS-7240.003.patch, 
> HDFS-12069-HDFS-7240.004.patch, HDFS-12069-HDFS-7240.005.patch, 
> HDFS-12069-HDFS-7240.006.patch, HDFS-12069-HDFS-7240.007.patch, 
> HDFS-12069-HDFS-7240.008.patch, HDFS-12069-HDFS-7240.009.patch, 
> HDFS-12069-HDFS-7240.010.patch, HDFS-12069-HDFS-7240.011.patch, 
> HDFS-12069-HDFS-7240.012.patch, HDFS-12069-HDFS-7240.013.patch
>
>
> Create a general abstraction for the metadata store so that we can plug in 
> other key-value stores to host ozone metadata. Currently only LevelDB is 
> implemented; we want to support RocksDB as it provides more production-ready 
> features.






[jira] [Commented] (HDFS-12145) Ozone: OzoneFileSystem: Ozone & KSM should support "/" delimited key names

2017-07-15 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088651#comment-16088651
 ] 

Weiwei Yang commented on HDFS-12145:


Hi [~msingh]

Thanks for working on this; the patch overall looks good. A few comments:

1. Can you dump container.db and make sure its key-value pairs are as 
expected? I expect the keys are still raw key names, and the {{KeyData}} values 
should contain a list of correct chunk names. This requires no code change; I 
just want to make sure the db info is accurate.

2. In {{TestKeys}}, can you add a key name argument to {{PutHelper}}'s 
constructor, so that we can parameterize this class to run with different key 
names? E.g.

{code}
new PutHelper(ozoneRestClient, path, "a");
new PutHelper(ozoneRestClient, path, "a/b/c");
new PutHelper(ozoneRestClient, path, "a//b");
{code}

This can be reused in the future if we need to test more formats of key names.

3. In {{TestKeys}}, line 168, is it better to create a random file with 
{{newKeyName}} instead of {{keyNamePart1}}?

Thank you.
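As a side note on the "/"-delimited key names discussed here, below is a minimal, self-contained sketch of why such keys are useful: a flat key namespace can be listed "by directory" purely by string prefix, with no real directory objects. The class and method names (`SlashKeys`, `listChildren`) are hypothetical illustrations, not the actual KSM/Ozone API.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Model of prefix-based "directory" listing over flat, "/"-delimited keys.
class SlashKeys {

    // Return the immediate children of a "directory" prefix among flat keys.
    static SortedSet<String> listChildren(Collection<String> keys, String dir) {
        String prefix = dir.isEmpty() ? "" : dir + "/";
        SortedSet<String> children = new TreeSet<>();
        for (String key : keys) {
            if (key.startsWith(prefix) && key.length() > prefix.length()) {
                String rest = key.substring(prefix.length());
                int slash = rest.indexOf('/');
                // Keep only the first path component under this prefix.
                children.add(slash < 0 ? rest : rest.substring(0, slash));
            }
        }
        return children;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>(
            List.of("a", "a/b/c", "a/b/d", "x"));
        System.out.println(listChildren(keys, ""));  // [a, x]
        System.out.println(listChildren(keys, "a")); // [b]
    }
}
```

The same prefix trick is what makes a filesystem view over an object store feasible without changing the underlying key-value layout.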

> Ozone: OzoneFileSystem: Ozone & KSM should support "/" delimited key names
> --
>
> Key: HDFS-12145
> URL: https://issues.apache.org/jira/browse/HDFS-12145
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
> Fix For: HDFS-7240
>
> Attachments: HDFS-12145-HDFS-7240.001.patch, 
> HDFS-12145-HDFS-7240.002.patch
>
>
> With OzoneFileSystem, key names will be delimited by "/", which is used as 
> the path separator.
> Support should be added in KSM and Ozone for key names containing "/".






[jira] [Updated] (HDFS-12148) Ozone: TestOzoneConfigurationFields is failing because ozone-default.xml has some missing properties

2017-07-16 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HDFS-12148:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: HDFS-7240
   Status: Resolved  (was: Patch Available)

> Ozone: TestOzoneConfigurationFields is failing because ozone-default.xml has 
> some missing properties
> 
>
> Key: HDFS-12148
> URL: https://issues.apache.org/jira/browse/HDFS-12148
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Minor
> Fix For: HDFS-7240
>
> Attachments: HDFS-12148-HDFS-7240.001.patch
>
>
> The following properties added by HDFS-11493 are missing in ozone-default.xml:
> {noformat}
> ozone.scm.max.container.report.threads
> ozone.scm.container.report.processing.interval.seconds
> ozone.scm.container.reports.wait.timeout.seconds
> {noformat}






[jira] [Commented] (HDFS-12149) Ozone: RocksDB implementation of ozone metadata store

2017-07-20 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094349#comment-16094349
 ] 

Weiwei Yang commented on HDFS-12149:


Hi [~yuanbo]

Thanks for helping to review this. I have fixed the close code as you 
suggested.

bq. line 267: I don't have much experience in RocksDB, what if iterator doesn't 
have next or prev?

In RocksDB, we can use the next() plus isValid() combination instead, which is 
slightly different from LevelDB.

Thank you.
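To illustrate the difference mentioned above, here is a small self-contained sketch that models the two iteration contracts over an in-memory sorted map. This is a stand-in model, not the real RocksDB/LevelDB JNI API: LevelDB's Java iterator follows the hasNext()/next() shape, while RocksDB's RocksIterator is driven by seekToFirst(), isValid(), and next().

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory model of the two iteration contracts; not the real JNI APIs.
class IterationIdioms {

    // LevelDB-style Java iteration: hasNext() guards each next() call.
    static List<String> levelDbStyle(NavigableMap<String, String> db) {
        List<String> keys = new ArrayList<>();
        Iterator<Map.Entry<String, String>> it = db.entrySet().iterator();
        while (it.hasNext()) {
            keys.add(it.next().getKey());
        }
        return keys;
    }

    // RocksDB-style iteration: position a cursor (seekToFirst), loop while
    // the cursor is valid (isValid), and advance it at the end (next).
    static List<String> rocksDbStyle(NavigableMap<String, String> db) {
        List<String> keys = new ArrayList<>();
        Map.Entry<String, String> cursor = db.firstEntry();   // seekToFirst()
        while (cursor != null) {                              // isValid()
            keys.add(cursor.getKey());
            cursor = db.higherEntry(cursor.getKey());         // next()
        }
        return keys;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> db = new TreeMap<>();
        db.put("a", "1");
        db.put("b", "2");
        System.out.println(levelDbStyle(db)); // [a, b]
        System.out.println(rocksDbStyle(db)); // [a, b]
    }
}
```

Both idioms visit the same keys in the same order; the only difference is where the validity check lives, which is exactly what a common MetadataStore iterator abstraction has to paper over.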

> Ozone: RocksDB implementation of ozone metadata store
> -
>
> Key: HDFS-12149
> URL: https://issues.apache.org/jira/browse/HDFS-12149
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
> Attachments: HDFS-12149-HDFS-7240.001.patch, 
> HDFS-12149-HDFS-7240.002.patch
>
>
> HDFS-12069 added a general interface for the ozone metadata store. We 
> already have a LevelDB implementation; this JIRA tracks the work on the 
> RocksDB implementation.






[jira] [Updated] (HDFS-12149) Ozone: RocksDB implementation of ozone metadata store

2017-07-20 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HDFS-12149:
---
Attachment: HDFS-12149-HDFS-7240.003.patch

> Ozone: RocksDB implementation of ozone metadata store
> -
>
> Key: HDFS-12149
> URL: https://issues.apache.org/jira/browse/HDFS-12149
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
> Attachments: HDFS-12149-HDFS-7240.001.patch, 
> HDFS-12149-HDFS-7240.002.patch, HDFS-12149-HDFS-7240.003.patch
>
>
> HDFS-12069 added a general interface for the ozone metadata store. We 
> already have a LevelDB implementation; this JIRA tracks the work on the 
> RocksDB implementation.






[jira] [Commented] (HDFS-12167) Ozone: Intermittent failure TestContainerPersistence#testListKey

2017-07-20 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094728#comment-16094728
 ] 

Weiwei Yang commented on HDFS-12167:


See error message

{noformat}
Error Message

Unable to find the container. Name: c0@]=[3/~C"8
Stacktrace

org.apache.hadoop.scm.container.common.helpers.StorageContainerException: 
Unable to find the container. Name: c0@]=[3/~C"8
at 
org.apache.hadoop.ozone.container.common.impl.ContainerManagerImpl.readContainer(ContainerManagerImpl.java:486)
at 
org.apache.hadoop.ozone.container.common.impl.ChunkManagerImpl.writeChunk(ChunkManagerImpl.java:80)
at 
org.apache.hadoop.ozone.container.common.impl.TestContainerPersistence.writeChunkHelper(TestContainerPersistence.java:373)
at 
org.apache.hadoop.ozone.container.common.impl.TestContainerPersistence.writeKeyHelper(TestContainerPersistence.java:809)
at 
org.apache.hadoop.ozone.container.common.impl.TestContainerPersistence.testListKey(TestContainerPersistence.java:825)
Standard Output

2017-07-20 10:02:34,116 [Thread-13] INFO  impl.ContainerManagerImpl 
(ContainerManagerImpl.java:init(149))  - Loading containers under 
[DISK]file:/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/4/TestContainerPersistence/tmp/ozone/repository
2017-07-20 10:02:34,122 [Thread-13] WARN  fs.CachingGetSpaceUsed 
(DU.java:refresh(55)) - Could not get disk usage information for path 
/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/4/dfs/data
java.io.IOException: Expecting a line not the end of stream
at org.apache.hadoop.fs.DU$DUShell.parseExecResult(DU.java:79)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:980)
at org.apache.hadoop.util.Shell.run(Shell.java:887)
at org.apache.hadoop.fs.DU$DUShell.startRefresh(DU.java:62)
at org.apache.hadoop.fs.DU.refresh(DU.java:53)
at 
org.apache.hadoop.fs.CachingGetSpaceUsed.init(CachingGetSpaceUsed.java:87)
at 
org.apache.hadoop.fs.GetSpaceUsed$Builder.build(GetSpaceUsed.java:166)
at 
org.apache.hadoop.ozone.container.common.impl.ContainerStorageLocation.(ContainerStorageLocation.java:73)
at 
org.apache.hadoop.ozone.container.common.impl.ContainerLocationManagerImpl.(ContainerLocationManagerImpl.java:67)
at 
org.apache.hadoop.ozone.container.common.impl.ContainerManagerImpl.init(ContainerManagerImpl.java:168)
at 
org.apache.hadoop.ozone.container.common.impl.TestContainerPersistence.setupPaths(TestContainerPersistence.java:146)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at 
org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:168)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
2017-07-20 10:02:34,132 [Thread-13] INFO  impl.TestContainerPersistence 
(TestContainerPersistence.java:cleanupDir(152))  - Deletting 
/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/4/TestContainerPersistence/tmp/ozone
{noformat}

[https://builds.apache.org/job/PreCommit-HDFS-Build/20346/testReport/org.apache.hadoop.ozone.container.common.impl/TestContainerPersistence/testListKey/]

> Ozone: Intermittent failure TestContainerPersistence#testListKey
> 
>
> Key: HDFS-12167
> URL: https://issues.apache.org/jira/browse/HDFS-12167
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone, test
>Reporter: Weiwei Yang
>Priority: Minor
>
> TestContainerPersistence#testListKey seems to fail intermittently. It looks 
> like it was failing because of an unexpected container name format.






[jira] [Created] (HDFS-12167) Ozone: Intermittent failure TestContainerPersistence#testListKey

2017-07-20 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12167:
--

 Summary: Ozone: Intermittent failure 
TestContainerPersistence#testListKey
 Key: HDFS-12167
 URL: https://issues.apache.org/jira/browse/HDFS-12167
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone, test
Reporter: Weiwei Yang
Priority: Minor









[jira] [Updated] (HDFS-12167) Ozone: Intermittent failure TestContainerPersistence#testListKey

2017-07-20 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HDFS-12167:
---
Release Note:   (was: TestContainerPersistence#listKeys seems to fail 
intermittently.  It looks like it was failing because some unexpected format of 
container name.)

> Ozone: Intermittent failure TestContainerPersistence#testListKey
> 
>
> Key: HDFS-12167
> URL: https://issues.apache.org/jira/browse/HDFS-12167
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone, test
>Reporter: Weiwei Yang
>Priority: Minor
>







[jira] [Updated] (HDFS-12167) Ozone: Intermittent failure TestContainerPersistence#testListKey

2017-07-20 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HDFS-12167:
---
Description: TestContainerPersistence#testListKey seems to fail 
intermittently. It looks like it was failing because of an unexpected container 
name format.

> Ozone: Intermittent failure TestContainerPersistence#testListKey
> 
>
> Key: HDFS-12167
> URL: https://issues.apache.org/jira/browse/HDFS-12167
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone, test
>Reporter: Weiwei Yang
>Priority: Minor
>
> TestContainerPersistence#testListKey seems to fail intermittently. It looks 
> like it was failing because of an unexpected container name format.






[jira] [Commented] (HDFS-12149) Ozone: RocksDB implementation of ozone metadata store

2017-07-20 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094739#comment-16094739
 ] 

Weiwei Yang commented on HDFS-12149:


The UT failure seems unrelated; I have opened HDFS-12167 to track that issue.

> Ozone: RocksDB implementation of ozone metadata store
> -
>
> Key: HDFS-12149
> URL: https://issues.apache.org/jira/browse/HDFS-12149
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
> Attachments: HDFS-12149-HDFS-7240.001.patch, 
> HDFS-12149-HDFS-7240.002.patch, HDFS-12149-HDFS-7240.003.patch
>
>
> HDFS-12069 added a general interface for the ozone metadata store. We 
> already have a LevelDB implementation; this JIRA tracks the work on the 
> RocksDB implementation.






[jira] [Commented] (HDFS-12126) Ozone: Ozone shell: Add more testing for bucket shell commands

2017-07-19 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16092633#comment-16092633
 ] 

Weiwei Yang commented on HDFS-12126:


+1, I will commit this soon. Thanks [~linyiqun]!

> Ozone: Ozone shell: Add more testing for bucket shell commands
> --
>
> Key: HDFS-12126
> URL: https://issues.apache.org/jira/browse/HDFS-12126
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone, tools
>Affects Versions: HDFS-7240
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Attachments: HDFS-12126-HDFS-7240.001.patch, 
> HDFS-12126-HDFS-7240.002.patch, HDFS-12126-HDFS-7240.003.patch
>
>
> Adding more unit tests for ozone bucket commands, similar to HDFS-12118.






[jira] [Updated] (HDFS-12126) Ozone: Ozone shell: Add more testing for bucket shell commands

2017-07-19 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HDFS-12126:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: HDFS-7240
   Status: Resolved  (was: Patch Available)

> Ozone: Ozone shell: Add more testing for bucket shell commands
> --
>
> Key: HDFS-12126
> URL: https://issues.apache.org/jira/browse/HDFS-12126
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone, tools
>Affects Versions: HDFS-7240
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Fix For: HDFS-7240
>
> Attachments: HDFS-12126-HDFS-7240.001.patch, 
> HDFS-12126-HDFS-7240.002.patch, HDFS-12126-HDFS-7240.003.patch
>
>
> Adding more unit tests for ozone bucket commands, similar to HDFS-12118.






[jira] [Commented] (HDFS-12126) Ozone: Ozone shell: Add more testing for bucket shell commands

2017-07-19 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16092639#comment-16092639
 ] 

Weiwei Yang commented on HDFS-12126:


Just committed to the feature branch, thanks [~linyiqun] for the contribution.

> Ozone: Ozone shell: Add more testing for bucket shell commands
> --
>
> Key: HDFS-12126
> URL: https://issues.apache.org/jira/browse/HDFS-12126
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone, tools
>Affects Versions: HDFS-7240
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Fix For: HDFS-7240
>
> Attachments: HDFS-12126-HDFS-7240.001.patch, 
> HDFS-12126-HDFS-7240.002.patch, HDFS-12126-HDFS-7240.003.patch
>
>
> Adding more unit tests for ozone bucket commands, similar to HDFS-12118.






[jira] [Assigned] (HDFS-11984) Ozone: Ensures listKey lists all required key fields

2017-07-18 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang reassigned HDFS-11984:
--

Assignee: Weiwei Yang

> Ozone: Ensures listKey lists all required key fields
> 
>
> Key: HDFS-11984
> URL: https://issues.apache.org/jira/browse/HDFS-11984
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>
> HDFS-11782 implements the listKey operation, which only lists the basic key 
> fields; we need to make sure it returns all required fields:
> # version
> # md5hash
> # createdOn
> # size
> # keyName
> # dataFileName
> This task depends on the work of HDFS-11886. See more discussion [here | 
> https://issues.apache.org/jira/browse/HDFS-11782?focusedCommentId=16045562=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16045562].






[jira] [Commented] (HDFS-12127) Ozone: Ozone shell: Add more testing for key shell commands

2017-07-20 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094774#comment-16094774
 ] 

Weiwei Yang commented on HDFS-12127:


Hi [~linyiqun]

Thanks a lot for adding the test cases and fixing those bugs! The patch looks 
really good and helps a lot.
Just one minor comment on the v2 patch, which might be a bit picky :p

*KeyManagerImpl*

Lines 112 to 116: can we change it to

{code}
try { 
  ...
}  catch (KSMException e) {
  throw e;
} catch (IOException e) {
  ...
}
{code}

Thanks a lot!
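The suggested shape can be sketched as self-contained code. Here, KsmStyleException is a hypothetical stand-in for KSMException, and the method and message names are illustrative only: the domain exception is rethrown untouched so its error code survives, while any other IOException is wrapped with context about the failed operation.

```java
import java.io.IOException;

// Sketch of the suggested catch-and-rethrow shape; names are illustrative.
class RethrowSketch {

    static class KsmStyleException extends IOException {
        KsmStyleException(String message) { super(message); }
    }

    // Throws a domain error or a low-level error depending on the flag.
    static void lookupKey(boolean domainFailure) throws IOException {
        try {
            if (domainFailure) {
                throw new KsmStyleException("KEY_NOT_FOUND");
            }
            throw new IOException("low-level read error");
        } catch (KsmStyleException e) {
            // Rethrow as-is so the caller still sees the precise error code.
            throw e;
        } catch (IOException e) {
            // Wrap anything else with context about the failed operation.
            throw new IOException("Key lookup failed", e);
        }
    }

    public static void main(String[] args) {
        try {
            lookupKey(true);
        } catch (IOException e) {
            System.out.println(e.getClass().getSimpleName()); // KsmStyleException
        }
        try {
            lookupKey(false);
        } catch (IOException e) {
            System.out.println(e.getMessage()); // Key lookup failed
        }
    }
}
```

The ordering matters: the more specific catch clause must come first, since a KSMException would otherwise be swallowed and re-wrapped by the generic IOException handler.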

> Ozone: Ozone shell: Add more testing for key shell commands
> ---
>
> Key: HDFS-12127
> URL: https://issues.apache.org/jira/browse/HDFS-12127
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone, tools
>Affects Versions: HDFS-7240
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Attachments: HDFS-12127-HDFS-7240.001.patch, 
> HDFS-12127-HDFS-7240.002.patch
>
>
> Adding more unit tests for ozone key commands, similar to HDFS-12118.






[jira] [Assigned] (HDFS-11922) Ozone: KSM: Garbage collect deleted blocks

2017-07-19 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang reassigned HDFS-11922:
--

Assignee: Weiwei Yang

> Ozone: KSM: Garbage collect deleted blocks
> --
>
> Key: HDFS-11922
> URL: https://issues.apache.org/jira/browse/HDFS-11922
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Anu Engineer
>Assignee: Weiwei Yang
>Priority: Critical
>
> We need to garbage collect deleted blocks from the Datanodes. There are two 
> cases where we will have orphaned blocks. One is like the classical HDFS, 
> where someone deletes a key and we need to delete the corresponding blocks.
> Another case, is when someone overwrites a key -- an overwrite can be treated 
> as a delete and a new put -- that means that older blocks need to be GC-ed at 
> some point of time. 
> A couple of JIRAs have discussed this in one form or another, so I am 
> consolidating all those discussions in this JIRA. 
> HDFS-11796 -- needs to fix this issue for some tests to pass 
> HDFS-11780 -- changed the old overwriting behavior to not supporting this 
> feature for time being.
> HDFS-11920 - Once again runs into this issue when user tries to put an 
> existing key.
> HDFS-11781 - delete key API in KSM only deletes the metadata -- and relies on 
> GC for Datanodes. 
> When we solve this issue, we should also consider 2 more aspects. 
> One, we support versioning in the buckets; tracking which blocks are really 
> orphaned is something that KSM will do. So delete and overwrite at some point 
> need to decide how to handle versioning of buckets.
> Two, If a key exists in a closed container, then it is immutable, hence the 
> strategy of removing the key might be more complex than just talking to an 
> open container.
> cc : [~xyao], [~cheersyang], [~vagarychen], [~msingh], [~yuanbo], 
> [~szetszwo], [~nandakumar131]
>  






[jira] [Assigned] (HDFS-12176) dfsadmin shows DFS Used%: NaN% if the cluster has zero block.

2017-07-20 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang reassigned HDFS-12176:
--

Assignee: Weiwei Yang

> dfsadmin shows DFS Used%: NaN% if the cluster has zero block.
> -
>
> Key: HDFS-12176
> URL: https://issues.apache.org/jira/browse/HDFS-12176
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Weiwei Yang
>Priority: Trivial
>
> This is rather a non-issue, but I thought I should file it anyway.
> I have a test cluster with just the NN fsimage, no DN, no blocks, and dfsadmin 
> shows:
> {noformat}
> $ hdfs dfsadmin -report
> Configured Capacity: 0 (0 B)
> Present Capacity: 0 (0 B)
> DFS Remaining: 0 (0 B)
> DFS Used: 0 (0 B)
> DFS Used%: NaN%
> {noformat}






[jira] [Commented] (HDFS-12176) dfsadmin shows DFS Used%: NaN% if the cluster has zero block.

2017-07-20 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095698#comment-16095698
 ] 

Weiwei Yang commented on HDFS-12176:


Hi [~jojochuang]

I did a quick check. The NaN comes from {{0/(double)0}}: if {{presentCapacity}} 
is 0, we should return 0 directly instead of dividing by zero. Let me 
submit a simple patch for this.
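A minimal sketch of the guard described above. The class and method names here are hypothetical, for illustration only; they are not the actual Hadoop DFSAdmin code.

```java
// Hypothetical sketch of the divide-by-zero guard described above;
// names are illustrative, not the actual Hadoop code.
public class UsedPercent {
    static double usedPercent(long used, long presentCapacity) {
        // 0/(double)0 yields NaN, so return 0 directly when capacity is 0.
        if (presentCapacity == 0) {
            return 0.0;
        }
        return used * 100.0 / (double) presentCapacity;
    }

    public static void main(String[] args) {
        System.out.println(usedPercent(0, 0));    // prints 0.0 instead of NaN
        System.out.println(usedPercent(25, 100)); // prints 25.0
    }
}
```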

> dfsadmin shows DFS Used%: NaN% if the cluster has zero block.
> -
>
> Key: HDFS-12176
> URL: https://issues.apache.org/jira/browse/HDFS-12176
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Priority: Trivial
>
> This is rather a non-issue, but I thought I should file it anyway.
> I have a test cluster with just the NN fsimage, no DN, no blocks, and dfsadmin 
> shows:
> {noformat}
> $ hdfs dfsadmin -report
> Configured Capacity: 0 (0 B)
> Present Capacity: 0 (0 B)
> DFS Remaining: 0 (0 B)
> DFS Used: 0 (0 B)
> DFS Used%: NaN%
> {noformat}






[jira] [Updated] (HDFS-12176) dfsadmin shows DFS Used%: NaN% if the cluster has zero block.

2017-07-20 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HDFS-12176:
---
Attachment: HDFS-12176.001.patch

> dfsadmin shows DFS Used%: NaN% if the cluster has zero block.
> -
>
> Key: HDFS-12176
> URL: https://issues.apache.org/jira/browse/HDFS-12176
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Weiwei Yang
>Priority: Trivial
> Attachments: HDFS-12176.001.patch
>
>
> This is rather a non-issue, but I thought I should file it anyway.
> I have a test cluster with just the NN fsimage, no DN, no blocks, and dfsadmin 
> shows:
> {noformat}
> $ hdfs dfsadmin -report
> Configured Capacity: 0 (0 B)
> Present Capacity: 0 (0 B)
> DFS Remaining: 0 (0 B)
> DFS Used: 0 (0 B)
> DFS Used%: NaN%
> {noformat}






[jira] [Updated] (HDFS-12176) dfsadmin shows DFS Used%: NaN% if the cluster has zero block.

2017-07-20 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HDFS-12176:
---
Status: Patch Available  (was: Open)

> dfsadmin shows DFS Used%: NaN% if the cluster has zero block.
> -
>
> Key: HDFS-12176
> URL: https://issues.apache.org/jira/browse/HDFS-12176
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Weiwei Yang
>Priority: Trivial
> Attachments: HDFS-12176.001.patch
>
>
> This is rather a non-issue, but I thought I should file it anyway.
> I have a test cluster with just the NN fsimage, no DN, no blocks, and dfsadmin 
> shows:
> {noformat}
> $ hdfs dfsadmin -report
> Configured Capacity: 0 (0 B)
> Present Capacity: 0 (0 B)
> DFS Remaining: 0 (0 B)
> DFS Used: 0 (0 B)
> DFS Used%: NaN%
> {noformat}






[jira] [Commented] (HDFS-12149) Ozone: RocksDB implementation of ozone metadata store

2017-07-20 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095702#comment-16095702
 ] 

Weiwei Yang commented on HDFS-12149:


Thanks, [~anu]!

> Ozone: RocksDB implementation of ozone metadata store
> -
>
> Key: HDFS-12149
> URL: https://issues.apache.org/jira/browse/HDFS-12149
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
> Attachments: HDFS-12149-HDFS-7240.001.patch, 
> HDFS-12149-HDFS-7240.002.patch, HDFS-12149-HDFS-7240.003.patch
>
>
> HDFS-12069 added a general interface for the ozone metadata store; we already 
> have a LevelDB implementation. This JIRA tracks the work on a RocksDB 
> implementation.






[jira] [Commented] (HDFS-12071) Ozone: Corona: Implementation of Corona

2017-07-20 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095703#comment-16095703
 ] 

Weiwei Yang commented on HDFS-12071:


Nice work, guys. Is there any document on how to run corona? I would like to 
try it. Thanks.

> Ozone: Corona: Implementation of Corona
> ---
>
> Key: HDFS-12071
> URL: https://issues.apache.org/jira/browse/HDFS-12071
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Nandakumar
>Assignee: Nandakumar
> Attachments: HDFS-12071-HDFS-7240.000.patch, 
> HDFS-12071-HDFS-7240.001.patch, HDFS-12071-HDFS-7240.002.patch
>
>
> Tool to populate ozone with data for testing.
> This is not a map-reduce program, and it is not for benchmarking Ozone write 
> throughput.
> It supports both online and offline modes. The default mode is offline; 
> {{-mode}} can be used to change it.
>  
> In online mode, an active internet connection is required; Common Crawl data 
> from AWS will be used. The default source is [CC-MAIN-2017-17/warc.paths.gz | 
> https://commoncrawl.s3.amazonaws.com/crawl-data/CC-MAIN-2017-17/warc.paths.gz]
>  (it contains the path to the actual data segment); the user can override this 
> using {{-source}}.
> The following values are derived from the URL of the Common Crawl data:
> * The domain will be used as the volume
> * The URL will be used as the bucket
> * The file name will be used as the key
>  
> In offline mode, the data will be random bytes and the size of each key will 
> be 10 KB.
> * Default number of volumes is 10; {{-numOfVolumes}} can be used to override it
> * Default number of buckets per volume is 1000; {{-numOfBuckets}} can be used 
> to override it
> * Default number of keys per bucket is 50; {{-numOfKeys}} can be used to 
> override it






[jira] [Created] (HDFS-12149) Ozone: RocksDB implementation of ozone metadata store

2017-07-15 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12149:
--

 Summary: Ozone: RocksDB implementation of ozone metadata store
 Key: HDFS-12149
 URL: https://issues.apache.org/jira/browse/HDFS-12149
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang


HDFS-12069 added a general interface for the ozone metadata store; we already 
have a LevelDB implementation. This JIRA tracks the work on a RocksDB 
implementation.
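The pluggable abstraction being described can be sketched roughly as follows, with an in-memory implementation standing in for the LevelDB/RocksDB backends. The interface and method names are hypothetical, not the actual HDFS-12069 API.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a pluggable key-value metadata store abstraction.
// Names are illustrative, not the actual HDFS-12069 API.
interface MetadataStore {
    void put(byte[] key, byte[] value) throws IOException;
    byte[] get(byte[] key) throws IOException;
    void delete(byte[] key) throws IOException;
}

// A trivial in-memory backend; a LevelDB or RocksDB backend would
// implement the same interface over its native library.
class InMemoryStore implements MetadataStore {
    private final Map<String, byte[]> map = new HashMap<>();

    @Override
    public void put(byte[] key, byte[] value) {
        map.put(new String(key), value);
    }

    @Override
    public byte[] get(byte[] key) {
        return map.get(new String(key));
    }

    @Override
    public void delete(byte[] key) {
        map.remove(new String(key));
    }
}
```

Callers would depend only on the interface, so swapping the backing store becomes a configuration choice rather than a code change.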






[jira] [Commented] (HDFS-12148) Ozone: TestOzoneConfigurationFields is failing because ozone-default.xml has some missing properties

2017-07-15 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088774#comment-16088774
 ] 

Weiwei Yang commented on HDFS-12148:


+[~anu]
I have added the missing properties to ozone-default.xml. Please review the 
descriptions for accuracy, and feel free to modify them. Thanks!

> Ozone: TestOzoneConfigurationFields is failing because ozone-default.xml has 
> some missing properties
> 
>
> Key: HDFS-12148
> URL: https://issues.apache.org/jira/browse/HDFS-12148
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Minor
> Attachments: HDFS-12148-HDFS-7240.001.patch
>
>
> The following properties added by HDFS-11493 are missing in ozone-default.xml:
> {noformat}
> ozone.scm.max.container.report.threads
> ozone.scm.container.report.processing.interval.seconds
> ozone.scm.container.reports.wait.timeout.seconds
> {noformat}






[jira] [Updated] (HDFS-12148) Ozone: TestOzoneConfigurationFields is failing because ozone-default.xml has some missing properties

2017-07-15 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HDFS-12148:
---
Status: Patch Available  (was: Open)

> Ozone: TestOzoneConfigurationFields is failing because ozone-default.xml has 
> some missing properties
> 
>
> Key: HDFS-12148
> URL: https://issues.apache.org/jira/browse/HDFS-12148
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Minor
> Attachments: HDFS-12148-HDFS-7240.001.patch
>
>
> The following properties added by HDFS-11493 are missing in ozone-default.xml:
> {noformat}
> ozone.scm.max.container.report.threads
> ozone.scm.container.report.processing.interval.seconds
> ozone.scm.container.reports.wait.timeout.seconds
> {noformat}






[jira] [Updated] (HDFS-12148) Ozone: TestOzoneConfigurationFields is failing because ozone-default.xml has some missing properties

2017-07-15 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HDFS-12148:
---
Attachment: HDFS-12148-HDFS-7240.001.patch

> Ozone: TestOzoneConfigurationFields is failing because ozone-default.xml has 
> some missing properties
> 
>
> Key: HDFS-12148
> URL: https://issues.apache.org/jira/browse/HDFS-12148
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Minor
> Attachments: HDFS-12148-HDFS-7240.001.patch
>
>
> The following properties added by HDFS-11493 are missing in ozone-default.xml:
> {noformat}
> ozone.scm.max.container.report.threads
> ozone.scm.container.report.processing.interval.seconds
> ozone.scm.container.reports.wait.timeout.seconds
> {noformat}






[jira] [Updated] (HDFS-12069) Ozone: Create a general abstraction for metadata store

2017-07-15 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HDFS-12069:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: HDFS-7240
   Status: Resolved  (was: Patch Available)

I have committed this to the feature branch. Thanks [~anu], [~xyao], [~yuanbo] 
and [~msingh] for the reviews.

> Ozone: Create a general abstraction for metadata store
> --
>
> Key: HDFS-12069
> URL: https://issues.apache.org/jira/browse/HDFS-12069
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
> Fix For: HDFS-7240
>
> Attachments: HDFS-12069-HDFS-7240.001.patch, 
> HDFS-12069-HDFS-7240.002.patch, HDFS-12069-HDFS-7240.003.patch, 
> HDFS-12069-HDFS-7240.004.patch, HDFS-12069-HDFS-7240.005.patch, 
> HDFS-12069-HDFS-7240.006.patch, HDFS-12069-HDFS-7240.007.patch, 
> HDFS-12069-HDFS-7240.008.patch, HDFS-12069-HDFS-7240.009.patch, 
> HDFS-12069-HDFS-7240.010.patch, HDFS-12069-HDFS-7240.011.patch, 
> HDFS-12069-HDFS-7240.012.patch, HDFS-12069-HDFS-7240.013.patch
>
>
> Create a general abstraction for the metadata store so that we can plug in 
> other key-value stores to host ozone metadata. Currently only LevelDB is 
> implemented; we want to support RocksDB as it provides more production-ready 
> features.






[jira] [Commented] (HDFS-12069) Ozone: Create a general abstraction for metadata store

2017-07-15 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088767#comment-16088767
 ] 

Weiwei Yang commented on HDFS-12069:


The UT failures are not related. I tested locally, and {{TestDatanodeStateMachine}} 
seems to work. {{TestOzoneConfigurationFields}} is caused by HDFS-11493; I will 
create a JIRA to get that fixed. {{TestContainerReplicationManager}} fails with 
or without this patch. I am going to commit this soon.

> Ozone: Create a general abstraction for metadata store
> --
>
> Key: HDFS-12069
> URL: https://issues.apache.org/jira/browse/HDFS-12069
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
> Attachments: HDFS-12069-HDFS-7240.001.patch, 
> HDFS-12069-HDFS-7240.002.patch, HDFS-12069-HDFS-7240.003.patch, 
> HDFS-12069-HDFS-7240.004.patch, HDFS-12069-HDFS-7240.005.patch, 
> HDFS-12069-HDFS-7240.006.patch, HDFS-12069-HDFS-7240.007.patch, 
> HDFS-12069-HDFS-7240.008.patch, HDFS-12069-HDFS-7240.009.patch, 
> HDFS-12069-HDFS-7240.010.patch, HDFS-12069-HDFS-7240.011.patch, 
> HDFS-12069-HDFS-7240.012.patch, HDFS-12069-HDFS-7240.013.patch
>
>
> Create a general abstraction for the metadata store so that we can plug in 
> other key-value stores to host ozone metadata. Currently only LevelDB is 
> implemented; we want to support RocksDB as it provides more production-ready 
> features.






[jira] [Created] (HDFS-12148) Ozone: TestOzoneConfigurationFields is failing because ozone-default.xml has some missing properties

2017-07-15 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12148:
--

 Summary: Ozone: TestOzoneConfigurationFields is failing because 
ozone-default.xml has some missing properties
 Key: HDFS-12148
 URL: https://issues.apache.org/jira/browse/HDFS-12148
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang
Priority: Minor


The following properties added by HDFS-11493 are missing in ozone-default.xml:

{noformat}
ozone.scm.max.container.report.threads
ozone.scm.container.report.processing.interval.seconds
ozone.scm.container.reports.wait.timeout.seconds
{noformat}
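The kind of consistency check that fails here can be sketched as follows. This is a simplified, hypothetical version of what a test like TestOzoneConfigurationFields verifies -- every config key declared in code must have an entry in the default XML resource -- and is not the actual test code.

```java
import java.util.Set;
import java.util.TreeSet;

// Simplified, hypothetical sketch of a config/XML consistency check;
// not the actual TestOzoneConfigurationFields code.
public class ConfigFieldsCheck {
    // Returns the declared keys that have no entry in the default XML.
    public static Set<String> missingFromXml(Set<String> declaredKeys,
                                             Set<String> xmlKeys) {
        Set<String> missing = new TreeSet<>(declaredKeys);
        missing.removeAll(xmlKeys);
        return missing;
    }

    public static void main(String[] args) {
        Set<String> declared = new TreeSet<>(Set.of(
            "ozone.scm.max.container.report.threads",
            "ozone.scm.container.report.processing.interval.seconds"));
        Set<String> xml = new TreeSet<>(Set.of(
            "ozone.scm.max.container.report.threads"));
        // A non-empty result is what makes the unit test fail.
        System.out.println(missingFromXml(declared, xml));
    }
}
```

Adding the missing properties to ozone-default.xml makes the difference set empty, which is why the attached patch fixes the test.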






[jira] [Updated] (HDFS-12071) Ozone: Corona: Implementation of Corona

2017-07-21 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HDFS-12071:
---
Fix Version/s: HDFS-7240

> Ozone: Corona: Implementation of Corona
> ---
>
> Key: HDFS-12071
> URL: https://issues.apache.org/jira/browse/HDFS-12071
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Nandakumar
>Assignee: Nandakumar
> Fix For: HDFS-7240
>
> Attachments: HDFS-12071-HDFS-7240.000.patch, 
> HDFS-12071-HDFS-7240.001.patch, HDFS-12071-HDFS-7240.002.patch
>
>
> Tool to populate ozone with data for testing.
> This is not a map-reduce program, and it is not for benchmarking Ozone write 
> throughput.
> It supports both online and offline modes. The default mode is offline; 
> {{-mode}} can be used to change it.
>  
> In online mode, an active internet connection is required; Common Crawl data 
> from AWS will be used. The default source is [CC-MAIN-2017-17/warc.paths.gz | 
> https://commoncrawl.s3.amazonaws.com/crawl-data/CC-MAIN-2017-17/warc.paths.gz]
>  (it contains the path to the actual data segment); the user can override this 
> using {{-source}}.
> The following values are derived from the URL of the Common Crawl data:
> * The domain will be used as the volume
> * The URL will be used as the bucket
> * The file name will be used as the key
>  
> In offline mode, the data will be random bytes and the size of each key will 
> be 10 KB.
> * Default number of volumes is 10; {{-numOfVolumes}} can be used to override it
> * Default number of buckets per volume is 1000; {{-numOfBuckets}} can be used 
> to override it
> * Default number of keys per bucket is 50; {{-numOfKeys}} can be used to 
> override it






[jira] [Commented] (HDFS-12127) Ozone: Ozone shell: Add more testing for key shell commands

2017-07-21 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095959#comment-16095959
 ] 

Weiwei Yang commented on HDFS-12127:


Hmm, there seem to be a lot more UT failures. I can't tell whether they are 
caused by this patch (probably not) or by the latest trunk merge (probably yes); 
could you please confirm? If the trunk merge caused those problems, we will need 
another JIRA to track them.

> Ozone: Ozone shell: Add more testing for key shell commands
> ---
>
> Key: HDFS-12127
> URL: https://issues.apache.org/jira/browse/HDFS-12127
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone, tools
>Affects Versions: HDFS-7240
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Attachments: HDFS-12127-HDFS-7240.001.patch, 
> HDFS-12127-HDFS-7240.002.patch, HDFS-12127-HDFS-7240.003.patch
>
>
> Adding more unit tests for ozone key commands, similar to HDFS-12118.






[jira] [Updated] (HDFS-11936) Ozone: TestNodeManager times out before it is able to find all nodes

2017-07-21 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HDFS-11936:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: HDFS-7240
   Status: Resolved  (was: Patch Available)

Just committed to the feature branch, thanks for the contribution [~yuanbo]

> Ozone: TestNodeManager times out before it is able to find all nodes
> 
>
> Key: HDFS-11936
> URL: https://issues.apache.org/jira/browse/HDFS-11936
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Anu Engineer
>Assignee: Yuanbo Liu
> Fix For: HDFS-7240
>
> Attachments: HDFS-11936-HDFS-7240.001.patch, 
> HDFS-11936-HDFS-7240.002.patch
>
>
> During the pre-commit build of 
> https://builds.apache.org/job/PreCommit-HDFS-Build/19795/testReport/
> we detected that a test in TestNodeManager is failing, probably because 
> we need more time to execute this test in Jenkins. This might be 
> related to HDFS-11919.
> The test failure report follows.
> ==
> {noformat}
> Regression
> org.apache.hadoop.ozone.scm.node.TestNodeManager.testScmStatsFromNodeReport
> Failing for the past 1 build (Since Failed#19795 )
> Took 0.51 sec.
> Error Message
> expected:<2> but was:<18000>
> Stacktrace
> java.lang.AssertionError: expected:<2> but was:<18000>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.ozone.scm.node.TestNodeManager.testScmStatsFromNodeReport(TestNodeManager.java:972)
> Standard Output
> 2017-06-06 13:45:30,909 [main] INFO   - Data node with ID: 
> 732ebd32-a926-44c5-afbb-c9f87513a67c Registered.
> 2017-06-06 13:45:30,937 [main] INFO   - Data node with ID: 
> 6860fd5d-94dc-4ba8-acd0-41cc3fa7232d Registered.
> 2017-06-06 13:45:30,971 [main] INFO   - Data node with ID: 
> cad7174c-204c-4806-b3af-c874706d4bd9 Registered.
> 2017-06-06 13:45:30,996 [main] INFO   - Data node with ID: 
> 0130a672-719d-4b68-9a1e-13046f4281ff Registered.
> 2017-06-06 13:45:31,021 [main] INFO   - Data node with ID: 
> 8d9ea5d4-6752-48d4-9bf0-adb0bd1a651a Registered.
> 2017-06-06 13:45:31,046 [main] INFO   - Data node with ID: 
> f122e372-5a38-476b-97dc-5ae449190485 Registered.
> 2017-06-06 13:45:31,071 [main] INFO   - Data node with ID: 
> 5750eb03-c1ac-4b3a-bc59-c4d9481e245b Registered.
> 2017-06-06 13:45:31,097 [main] INFO   - Data node with ID: 
> aa2d90a1-9e85-41f8-a4e5-35c7d2ed7299 Registered.
> 2017-06-06 13:45:31,122 [main] INFO   - Data node with ID: 
> 5e52bf5c-7050-4fc9-bf10-0e52650229ee Registered.
> 2017-06-06 13:45:31,147 [main] INFO   - Data node with ID: 
> eaac7b8f-a556-4afc-9163-7309f7ccea18 Registered.
> 2017-06-06 13:45:31,224 [SCM Heartbeat Processing Thread - 0] INFO   - 
> Current Thread is interrupted, shutting down HB processing thread for Node 
> Manager.
> {noformat}






[jira] [Commented] (HDFS-12115) Ozone: SCM: Add queryNode RPC Call

2017-07-21 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095915#comment-16095915
 ] 

Weiwei Yang commented on HDFS-12115:


Hi [~anu]

The UT failure in {{testCapacityPlacementYieldsBetterDataDistribution}} seems 
related; can we get that fixed before committing this? Also, there seems to be a 
blank line at EOF at line 201 of your v7 patch; could you please remove that as 
well?

Thanks

> Ozone: SCM: Add queryNode RPC Call
> --
>
> Key: HDFS-12115
> URL: https://issues.apache.org/jira/browse/HDFS-12115
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Fix For: HDFS-7240
>
> Attachments: HDFS-12115-HDFS-7240.001.patch, 
> HDFS-12115-HDFS-7240.002.patch, HDFS-12115-HDFS-7240.003.patch, 
> HDFS-12115-HDFS-7240.004.patch, HDFS-12115-HDFS-7240.005.patch, 
> HDFS-12115-HDFS-7240.006.patch, HDFS-12115-HDFS-7240.007.patch
>
>
> Add a queryNode RPC to the storage container location protocol. This allows 
> applications like the SCM CLI to get the list of nodes in various states, such 
> as healthy, live, or dead.






[jira] [Updated] (HDFS-12071) Ozone: Corona: Implementation of Corona

2017-07-21 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HDFS-12071:
---
Labels: tool  (was: )

> Ozone: Corona: Implementation of Corona
> ---
>
> Key: HDFS-12071
> URL: https://issues.apache.org/jira/browse/HDFS-12071
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Nandakumar
>Assignee: Nandakumar
>  Labels: tool
> Fix For: HDFS-7240
>
> Attachments: HDFS-12071-HDFS-7240.000.patch, 
> HDFS-12071-HDFS-7240.001.patch, HDFS-12071-HDFS-7240.002.patch
>
>
> Tool to populate ozone with data for testing.
> This is not a map-reduce program, and it is not for benchmarking Ozone write 
> throughput.
> It supports both online and offline modes. The default mode is offline; 
> {{-mode}} can be used to change it.
>  
> In online mode, an active internet connection is required; Common Crawl data 
> from AWS will be used. The default source is [CC-MAIN-2017-17/warc.paths.gz | 
> https://commoncrawl.s3.amazonaws.com/crawl-data/CC-MAIN-2017-17/warc.paths.gz]
>  (it contains the path to the actual data segment); the user can override this 
> using {{-source}}.
> The following values are derived from the URL of the Common Crawl data:
> * The domain will be used as the volume
> * The URL will be used as the bucket
> * The file name will be used as the key
>  
> In offline mode, the data will be random bytes and the size of each key will 
> be 10 KB.
> * Default number of volumes is 10; {{-numOfVolumes}} can be used to override it
> * Default number of buckets per volume is 1000; {{-numOfBuckets}} can be used 
> to override it
> * Default number of keys per bucket is 50; {{-numOfKeys}} can be used to 
> override it






[jira] [Commented] (HDFS-11936) Ozone: TestNodeManager times out before it is able to find all nodes

2017-07-21 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095908#comment-16095908
 ] 

Weiwei Yang commented on HDFS-11936:


Makes sense to me. The 100ms HB interval here creates a race condition, and 
increasing it to 1s makes sense. Thanks [~yuanbo], I am going to commit this soon.

> Ozone: TestNodeManager times out before it is able to find all nodes
> 
>
> Key: HDFS-11936
> URL: https://issues.apache.org/jira/browse/HDFS-11936
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Anu Engineer
>Assignee: Yuanbo Liu
> Attachments: HDFS-11936-HDFS-7240.001.patch, 
> HDFS-11936-HDFS-7240.002.patch
>
>
> During the pre-commit build of 
> https://builds.apache.org/job/PreCommit-HDFS-Build/19795/testReport/
> we detected that a test in TestNodeManager is failing, probably because 
> we need more time to execute this test in Jenkins. This might be 
> related to HDFS-11919.
> The test failure report follows.
> ==
> {noformat}
> Regression
> org.apache.hadoop.ozone.scm.node.TestNodeManager.testScmStatsFromNodeReport
> Failing for the past 1 build (Since Failed#19795 )
> Took 0.51 sec.
> Error Message
> expected:<2> but was:<18000>
> Stacktrace
> java.lang.AssertionError: expected:<2> but was:<18000>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.ozone.scm.node.TestNodeManager.testScmStatsFromNodeReport(TestNodeManager.java:972)
> Standard Output
> 2017-06-06 13:45:30,909 [main] INFO   - Data node with ID: 
> 732ebd32-a926-44c5-afbb-c9f87513a67c Registered.
> 2017-06-06 13:45:30,937 [main] INFO   - Data node with ID: 
> 6860fd5d-94dc-4ba8-acd0-41cc3fa7232d Registered.
> 2017-06-06 13:45:30,971 [main] INFO   - Data node with ID: 
> cad7174c-204c-4806-b3af-c874706d4bd9 Registered.
> 2017-06-06 13:45:30,996 [main] INFO   - Data node with ID: 
> 0130a672-719d-4b68-9a1e-13046f4281ff Registered.
> 2017-06-06 13:45:31,021 [main] INFO   - Data node with ID: 
> 8d9ea5d4-6752-48d4-9bf0-adb0bd1a651a Registered.
> 2017-06-06 13:45:31,046 [main] INFO   - Data node with ID: 
> f122e372-5a38-476b-97dc-5ae449190485 Registered.
> 2017-06-06 13:45:31,071 [main] INFO   - Data node with ID: 
> 5750eb03-c1ac-4b3a-bc59-c4d9481e245b Registered.
> 2017-06-06 13:45:31,097 [main] INFO   - Data node with ID: 
> aa2d90a1-9e85-41f8-a4e5-35c7d2ed7299 Registered.
> 2017-06-06 13:45:31,122 [main] INFO   - Data node with ID: 
> 5e52bf5c-7050-4fc9-bf10-0e52650229ee Registered.
> 2017-06-06 13:45:31,147 [main] INFO   - Data node with ID: 
> eaac7b8f-a556-4afc-9163-7309f7ccea18 Registered.
> 2017-06-06 13:45:31,224 [SCM Heartbeat Processing Thread - 0] INFO   - 
> Current Thread is interrupted, shutting down HB processing thread for Node 
> Manager.
> {noformat}






[jira] [Updated] (HDFS-12127) Ozone: Ozone shell: Add more testing for key shell commands

2017-07-21 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HDFS-12127:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: HDFS-7240
   Status: Resolved  (was: Patch Available)

> Ozone: Ozone shell: Add more testing for key shell commands
> ---
>
> Key: HDFS-12127
> URL: https://issues.apache.org/jira/browse/HDFS-12127
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone, tools
>Affects Versions: HDFS-7240
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Fix For: HDFS-7240
>
> Attachments: HDFS-12127-HDFS-7240.001.patch, 
> HDFS-12127-HDFS-7240.002.patch, HDFS-12127-HDFS-7240.003.patch
>
>
> Adding more unit tests for ozone key commands, similar to HDFS-12118.






[jira] [Commented] (HDFS-12127) Ozone: Ozone shell: Add more testing for key shell commands

2017-07-21 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16096190#comment-16096190
 ] 

Weiwei Yang commented on HDFS-12127:


Looks good, +1. I am going to commit this shortly. Thanks [~linyiqun] for 
confirming this!

> Ozone: Ozone shell: Add more testing for key shell commands
> ---
>
> Key: HDFS-12127
> URL: https://issues.apache.org/jira/browse/HDFS-12127
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone, tools
>Affects Versions: HDFS-7240
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Attachments: HDFS-12127-HDFS-7240.001.patch, 
> HDFS-12127-HDFS-7240.002.patch, HDFS-12127-HDFS-7240.003.patch
>
>
> Adding more unit tests for ozone key commands, similar to HDFS-12118.






[jira] [Commented] (HDFS-12187) Ozone : add support to DEBUG CLI for ksm.db

2017-07-21 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16097045#comment-16097045
 ] 

Weiwei Yang commented on HDFS-12187:


TestKSMSQLCli.java is missing the Apache license header, please get that fixed 
before committing.

> Ozone : add support to DEBUG CLI for ksm.db
> ---
>
> Key: HDFS-12187
> URL: https://issues.apache.org/jira/browse/HDFS-12187
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-12187-HDFS-7240.001.patch
>
>
> This JIRA adds the ability to convert the ksm metadata file (ksm.db) into a 
> sqlite db.






[jira] [Commented] (HDFS-12115) Ozone: SCM: Add queryNode RPC Call

2017-07-21 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16097043#comment-16097043
 ] 

Weiwei Yang commented on HDFS-12115:


Hi [~anu] Guess you need to rebase your latest patch onto the latest code base. :P

> Ozone: SCM: Add queryNode RPC Call
> --
>
> Key: HDFS-12115
> URL: https://issues.apache.org/jira/browse/HDFS-12115
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Fix For: HDFS-7240
>
> Attachments: HDFS-12115-HDFS-7240.001.patch, 
> HDFS-12115-HDFS-7240.002.patch, HDFS-12115-HDFS-7240.003.patch, 
> HDFS-12115-HDFS-7240.004.patch, HDFS-12115-HDFS-7240.005.patch, 
> HDFS-12115-HDFS-7240.006.patch, HDFS-12115-HDFS-7240.007.patch, 
> HDFS-12115-HDFS-7240.008.patch
>
>
> Add queryNode RPC to Storage container location protocol. This allows 
> applications like SCM CLI to get the list of nodes in various states, like 
> Healthy, live or Dead.






[jira] [Updated] (HDFS-12163) Ozone: MiniOzoneCluster uses 400+ threads

2017-07-21 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HDFS-12163:
---
Issue Type: Sub-task  (was: Bug)
Parent: HDFS-7240

> Ozone: MiniOzoneCluster uses 400+ threads
> -
>
> Key: HDFS-12163
> URL: https://issues.apache.org/jira/browse/HDFS-12163
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone, test
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Weiwei Yang
> Attachments: TestOzoneThreadCount20170719.patch
>
>
> Checked the number of active threads used in MiniOzoneCluster with various 
> settings:
> - Local handlers
> - Distributed handlers
> - Ratis-Netty
> - Ratis-gRPC
> The results are similar for all the settings.  It uses 400+ threads for a 
> 1-datanode MiniOzoneCluster.
> Moreover, there is a thread leak -- a number of the threads do not shut down 
> after the test is finished.  Therefore, when tests run consecutively, the 
> later tests use more threads.
> Will post the details in comments.






[jira] [Updated] (HDFS-12196) Ozone: DeleteKey-2: Implement container recycling service to delete stale blocks at background

2017-07-25 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HDFS-12196:
---
Description: Implement a recycling service running on datanode to delete 
stale blocks.  The recycling service scans stale blocks for each container and 
deletes chunks and references periodically.  (was: Implement a recycling service 
running on datanode to delete stale blocks periodically. )

> Ozone: DeleteKey-2: Implement container recycling service to delete stale 
> blocks at background
> --
>
> Key: HDFS-12196
> URL: https://issues.apache.org/jira/browse/HDFS-12196
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>
> Implement a recycling service running on datanode to delete stale blocks.  
> The recycling service scans stale blocks for each container and deletes 
> chunks and references periodically.






[jira] [Updated] (HDFS-11922) Ozone: KSM: Garbage collect deleted blocks

2017-07-25 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HDFS-11922:
---
Attachment: Async delete keys.pdf

> Ozone: KSM: Garbage collect deleted blocks
> --
>
> Key: HDFS-11922
> URL: https://issues.apache.org/jira/browse/HDFS-11922
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Anu Engineer
>Assignee: Weiwei Yang
>Priority: Critical
> Attachments: Async delete keys.pdf
>
>
> We need to garbage collect deleted blocks from the Datanodes. There are two 
> cases where we will have orphaned blocks. One is like the classical HDFS, 
> where someone deletes a key and we need to delete the corresponding blocks.
> Another case is when someone overwrites a key -- an overwrite can be treated 
> as a delete and a new put -- which means that the older blocks need to be 
> GC-ed at some point in time. 
> A couple of JIRAs have discussed this in one form or another, so consolidating 
> all those discussions in this JIRA. 
> HDFS-11796 -- needs this issue fixed for some tests to pass. 
> HDFS-11780 -- changed the old overwriting behavior to not support this 
> feature for the time being.
> HDFS-11920 -- once again runs into this issue when a user tries to put an 
> existing key.
> HDFS-11781 -- the delete key API in KSM only deletes the metadata and relies 
> on GC for the Datanodes. 
> When we solve this issue, we should also consider 2 more aspects. 
> One, we support versioning in the buckets; tracking which blocks are really 
> orphaned is something that KSM will do. So delete and overwrite at some point 
> need to decide how to handle versioning of buckets.
> Two, if a key exists in a closed container, then it is immutable, hence the 
> strategy of removing the key might be more complex than just talking to an 
> open container.
> cc : [~xyao], [~cheersyang], [~vagarychen], [~msingh], [~yuanbo], 
> [~szetszwo], [~nandakumar131]
>  






[jira] [Commented] (HDFS-11922) Ozone: KSM: Garbage collect deleted blocks

2017-07-25 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16099814#comment-16099814
 ] 

Weiwei Yang commented on HDFS-11922:


Hi [~anu], [~xyao] and folks on cc, I have uploaded a doc about delete key 
implementation based on the discussions we had earlier, please help to review. 
Thanks!

> Ozone: KSM: Garbage collect deleted blocks
> --
>
> Key: HDFS-11922
> URL: https://issues.apache.org/jira/browse/HDFS-11922
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Anu Engineer
>Assignee: Weiwei Yang
>Priority: Critical
> Attachments: Async delete keys.pdf
>
>
> We need to garbage collect deleted blocks from the Datanodes. There are two 
> cases where we will have orphaned blocks. One is like the classical HDFS, 
> where someone deletes a key and we need to delete the corresponding blocks.
> Another case is when someone overwrites a key -- an overwrite can be treated 
> as a delete and a new put -- which means that the older blocks need to be 
> GC-ed at some point in time. 
> A couple of JIRAs have discussed this in one form or another, so consolidating 
> all those discussions in this JIRA. 
> HDFS-11796 -- needs this issue fixed for some tests to pass. 
> HDFS-11780 -- changed the old overwriting behavior to not support this 
> feature for the time being.
> HDFS-11920 -- once again runs into this issue when a user tries to put an 
> existing key.
> HDFS-11781 -- the delete key API in KSM only deletes the metadata and relies 
> on GC for the Datanodes. 
> When we solve this issue, we should also consider 2 more aspects. 
> One, we support versioning in the buckets; tracking which blocks are really 
> orphaned is something that KSM will do. So delete and overwrite at some point 
> need to decide how to handle versioning of buckets.
> Two, if a key exists in a closed container, then it is immutable, hence the 
> strategy of removing the key might be more complex than just talking to an 
> open container.
> cc : [~xyao], [~cheersyang], [~vagarychen], [~msingh], [~yuanbo], 
> [~szetszwo], [~nandakumar131]
>  
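The two paths that produce orphaned blocks (an explicit delete, and an
overwrite treated as delete-plus-put) can be sketched with a hypothetical
tracker; all names below are illustrative, not the actual KSM API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: putting to an existing key orphans its old blocks,
// and deleting a key only touches metadata -- in both cases the blocks land
// in a pending-garbage list for a later GC pass on the Datanodes to reclaim.
public class KeyBlockTracker {
    private final Map<String, List<String>> keyToBlocks = new HashMap<>();
    private final List<String> orphanedBlocks = new ArrayList<>();

    public void putKey(String key, List<String> newBlocks) {
        List<String> old = keyToBlocks.put(key, newBlocks);
        if (old != null) {
            orphanedBlocks.addAll(old);  // overwrite == delete old + put new
        }
    }

    public void deleteKey(String key) {
        List<String> old = keyToBlocks.remove(key);
        if (old != null) {
            orphanedBlocks.addAll(old);  // metadata-only delete; GC does the rest
        }
    }

    public List<String> pendingGarbage() {
        return orphanedBlocks;
    }
}
```

Bucket versioning and closed (immutable) containers, as noted above, would
complicate the decision of when a block is "really" orphaned.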






[jira] [Work started] (HDFS-12196) Ozone: DeleteKey-2: Implement container recycling service to delete stale blocks at background

2017-07-25 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-12196 started by Weiwei Yang.
--
> Ozone: DeleteKey-2: Implement container recycling service to delete stale 
> blocks at background
> --
>
> Key: HDFS-12196
> URL: https://issues.apache.org/jira/browse/HDFS-12196
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>
> Implement a recycling service running on datanode to delete stale blocks 
> periodically. 






[jira] [Created] (HDFS-12195) Ozone: DeleteKey-1: KSM replies delete key request asynchronously

2017-07-25 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12195:
--

 Summary: Ozone: DeleteKey-1: KSM replies delete key request 
asynchronously
 Key: HDFS-12195
 URL: https://issues.apache.org/jira/browse/HDFS-12195
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Affects Versions: HDFS-7240
Reporter: Weiwei Yang
Assignee: Yuanbo Liu


We will implement delete key in ozone in multiple child tasks; this is one of 
the child tasks, implementing the client to scm communication. We need to do it 
in an async manner: once the key state is changed in ksm metadata, ksm is ready 
to reply to the client with a success message. Actual deletes on other layers 
will happen some time later.
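The mark-and-reply flow described above can be sketched as follows
(hypothetical class and method names; the real KSM code differs):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of the async delete flow: the RPC handler only flips
// the key's state in ksm metadata and replies immediately; a background
// worker performs the actual block deletion on the other layers later.
public class AsyncKeyDeleter {
    private final BlockingQueue<String> pendingDeletes = new LinkedBlockingQueue<>();

    // Called on the handler thread: mark-and-return, no datanode I/O here.
    public boolean deleteKey(String keyName) {
        markDeletedInMetadata(keyName);   // key state change in ksm metadata
        pendingDeletes.offer(keyName);    // hand off to the background worker
        return true;                      // client gets a success reply now
    }

    public int pendingCount() {
        return pendingDeletes.size();
    }

    // Background worker drains the queue; actual deletes happen some time later.
    public void runWorker() throws InterruptedException {
        while (!Thread.currentThread().isInterrupted()) {
            deleteBlocksFor(pendingDeletes.take());
        }
    }

    private void markDeletedInMetadata(String keyName) { /* metadata update */ }
    private void deleteBlocksFor(String keyName) { /* datanode-side cleanup */ }
}
```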






[jira] [Updated] (HDFS-12195) Ozone: DeleteKey-1: KSM replies delete key request asynchronously

2017-07-25 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HDFS-12195:
---
Attachment: client-ksm.png

> Ozone: DeleteKey-1: KSM replies delete key request asynchronously
> -
>
> Key: HDFS-12195
> URL: https://issues.apache.org/jira/browse/HDFS-12195
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Weiwei Yang
>Assignee: Yuanbo Liu
> Attachments: client-ksm.png
>
>
> We will implement delete key in ozone in multiple child tasks; this is one of 
> the child tasks, implementing the client to scm communication. We need to do 
> it in an async manner: once the key state is changed in ksm metadata, ksm is 
> ready to reply to the client with a success message. Actual deletes on other 
> layers will happen some time later.






[jira] [Created] (HDFS-12196) Ozone: DeleteKey-2: Implement container recycling service to delete stale blocks at background

2017-07-25 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12196:
--

 Summary: Ozone: DeleteKey-2: Implement container recycling service 
to delete stale blocks at background
 Key: HDFS-12196
 URL: https://issues.apache.org/jira/browse/HDFS-12196
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Weiwei Yang
Assignee: Weiwei Yang


Implement a recycling service running on datanode to delete stale blocks 
periodically. 






[jira] [Commented] (HDFS-12187) Ozone : add support to DEBUG CLI for ksm.db

2017-07-24 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16099362#comment-16099362
 ] 

Weiwei Yang commented on HDFS-12187:


+1, I am going to commit this shortly. Thanks [~vagarychen].

> Ozone : add support to DEBUG CLI for ksm.db
> ---
>
> Key: HDFS-12187
> URL: https://issues.apache.org/jira/browse/HDFS-12187
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-12187-HDFS-7240.001.patch, 
> HDFS-12187-HDFS-7240.002.patch
>
>
> This JIRA adds the ability to convert the ksm metadata file (ksm.db) into a 
> sqlite db.






[jira] [Updated] (HDFS-12145) Ozone: OzoneFileSystem: Ozone & KSM should support "/" delimited key names

2017-07-24 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HDFS-12145:
---
Attachment: HDFS-12145-HDFS-7240.006.patch

Hi [~msingh]

Apologies if my earlier comment was not clear; I uploaded a v6 patch based on 
your v5 patch. Basically I wanted to get both non-delimited and delimited keys 
covered by the {{TestKeys}} class, please check and let me know if this looks 
good to you.

Thanks a lot.

> Ozone: OzoneFileSystem: Ozone & KSM should support "/" delimited key names
> --
>
> Key: HDFS-12145
> URL: https://issues.apache.org/jira/browse/HDFS-12145
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
> Fix For: HDFS-7240
>
> Attachments: HDFS-12145-HDFS-7240.001.patch, 
> HDFS-12145-HDFS-7240.002.patch, HDFS-12145-HDFS-7240.003.patch, 
> HDFS-12145-HDFS-7240.004.patch, HDFS-12145-HDFS-7240.005.patch, 
> HDFS-12145-HDFS-7240.006.patch
>
>
> With OzoneFileSystem, key names will be delimited by "/" which is used as the 
> path separator.
> Support should be added in KSM and Ozone to support key names with "/".






[jira] [Commented] (HDFS-12155) Ozone : add RocksDB support to DEBUG CLI

2017-07-24 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16099377#comment-16099377
 ] 

Weiwei Yang commented on HDFS-12155:


Hi [~vagarychen], I just committed HDFS-12187, could you resume your patch for 
this one? Thanks

> Ozone : add RocksDB support to DEBUG CLI
> 
>
> Key: HDFS-12155
> URL: https://issues.apache.org/jira/browse/HDFS-12155
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-12155-HDFS-7240.001.patch, 
> HDFS-12155-HDFS-7240.002.patch
>
>
> As we are migrating to replacing LevelDB with RocksDB, we should also add the 
> support of RocksDB to the debug cli.






[jira] [Updated] (HDFS-12187) Ozone : add support to DEBUG CLI for ksm.db

2017-07-24 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HDFS-12187:
---
  Resolution: Fixed
Hadoop Flags: Reviewed
   Fix Version/s: HDFS-7240
Target Version/s: HDFS-7240
  Status: Resolved  (was: Patch Available)

I just committed this to the feature branch, thanks a lot for the contribution 
[~vagarychen], and thanks for the review [~anu].

> Ozone : add support to DEBUG CLI for ksm.db
> ---
>
> Key: HDFS-12187
> URL: https://issues.apache.org/jira/browse/HDFS-12187
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chen Liang
>Assignee: Chen Liang
> Fix For: HDFS-7240
>
> Attachments: HDFS-12187-HDFS-7240.001.patch, 
> HDFS-12187-HDFS-7240.002.patch
>
>
> This JIRA adds the ability to convert the ksm metadata file (ksm.db) into a 
> sqlite db.






[jira] [Assigned] (HDFS-11984) Ozone: Ensures listKey lists all required key fields

2017-07-24 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang reassigned HDFS-11984:
--

Assignee: Yiqun Lin  (was: Weiwei Yang)

> Ozone: Ensures listKey lists all required key fields
> 
>
> Key: HDFS-11984
> URL: https://issues.apache.org/jira/browse/HDFS-11984
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Weiwei Yang
>Assignee: Yiqun Lin
>
> HDFS-11782 implements the listKey operation which only lists the basic key 
> fields; we need to make sure it returns all required fields:
> # version
> # md5hash
> # createdOn
> # size
> # keyName
> this task depends on the work of HDFS-11886. See more discussion [here | 
> https://issues.apache.org/jira/browse/HDFS-11782?focusedCommentId=16045562&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16045562].






[jira] [Commented] (HDFS-11984) Ozone: Ensures listKey lists all required key fields

2017-07-24 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16099422#comment-16099422
 ] 

Weiwei Yang commented on HDFS-11984:


Hi [~linyiqun]

Thanks for working on this. You are right, we don't need {{dataFileName}}; let 
me update the description. I listed this one as depending on HDFS-11886 because 
I thought this info would be persisted only when we commit the key (phase-2). 
However, HDFS-12170 implemented this while writing a key (phase-1), so it 
should be fine for now. We can keep HDFS-11886 open for further improvement on 
this.
 
Meanwhile I will reassign this JIRA to you so you can work on this stuff 
end-to-end; thanks a lot for working on this, again. :)

> Ozone: Ensures listKey lists all required key fields
> 
>
> Key: HDFS-11984
> URL: https://issues.apache.org/jira/browse/HDFS-11984
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>
> HDFS-11782 implements the listKey operation which only lists the basic key 
> fields; we need to make sure it returns all required fields:
> # version
> # md5hash
> # createdOn
> # size
> # keyName
> this task depends on the work of HDFS-11886. See more discussion [here | 
> https://issues.apache.org/jira/browse/HDFS-11782?focusedCommentId=16045562&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16045562].






[jira] [Updated] (HDFS-11984) Ozone: Ensures listKey lists all required key fields

2017-07-24 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HDFS-11984:
---
Description: 
HDFS-11782 implements the listKey operation which only lists the basic key 
fields; we need to make sure it returns all required fields:

# version
# md5hash
# createdOn
# size
# keyName

this task depends on the work of HDFS-11886. See more discussion [here | 
https://issues.apache.org/jira/browse/HDFS-11782?focusedCommentId=16045562&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16045562].

  was:
HDFS-11782 implements the listKey operation which only lists the basic key 
fields; we need to make sure it returns all required fields:

# version
# md5hash
# createdOn
# size
# keyName
# dataFileName

this task depends on the work of HDFS-11886. See more discussion [here | 
https://issues.apache.org/jira/browse/HDFS-11782?focusedCommentId=16045562&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16045562].


> Ozone: Ensures listKey lists all required key fields
> 
>
> Key: HDFS-11984
> URL: https://issues.apache.org/jira/browse/HDFS-11984
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>
> HDFS-11782 implements the listKey operation which only lists the basic key 
> fields; we need to make sure it returns all required fields:
> # version
> # md5hash
> # createdOn
> # size
> # keyName
> this task depends on the work of HDFS-11886. See more discussion [here | 
> https://issues.apache.org/jira/browse/HDFS-11782?focusedCommentId=16045562&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16045562].






[jira] [Commented] (HDFS-11920) Ozone : add key partition

2017-07-24 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16099438#comment-16099438
 ] 

Weiwei Yang commented on HDFS-11920:


Hi [~vagarychen]

Thanks for the patch, it looks good to me overall. I have a few comments, 
please let me know if they make sense to you.

1. *DistributedStorageHandler*

line 410: I am wondering why it is building the containerKey as 
"/volume/bucket/blockID", why not simply use {{BlockID}} here? This seems to be 
the key that is written to container.db in the container metadata.

2. *ChunkOutputStream*

I am wondering if we really need to let it know about an ozone object key, see 
line 56. Right now it writes a chunk file like 
{{ozoneKeyName_stream_streamId_chunk_n}}, why not 
{{blockId_stream_streamId_chunk_n}} instead? I think we can remove this 
variable from this class.

line 168: it writes the full length of {{b}} to the output stream but the 
position only moves by 1, which seems incorrect.

3. *TestMultipleContainerReadWrite*

In {{TestWriteRead}}, can we check that the number of chunk files for the key 
actually matches the desired number of splits?

4. Looks like the chunk group input and output streams maintain a list of 
streams and read/write in a linear manner; can we optimize this to do parallel 
reads/writes, since the chunks are independent? That is, have a thread fetch a 
certain length of content from each chunk, then merge the pieces together 
afterwards. It doesn't have to be done in this patch, but I think that might be 
a good improvement.

Thanks
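For reference, the position bookkeeping that the line-168 comment points at can
be illustrated in isolation with a plain {{java.io.OutputStream}} subclass (a
hypothetical stand-in, not the patch's actual ChunkOutputStream): a call that
writes the whole array must advance the position by the number of bytes
written, not by 1.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch: track the stream position correctly for both the
// single-byte and the array write paths of java.io.OutputStream.
public class PositionTrackingStream extends OutputStream {
    private final ByteArrayOutputStream delegate = new ByteArrayOutputStream();
    private long position = 0;

    @Override
    public void write(int b) throws IOException {
        delegate.write(b);
        position += 1;              // one byte written, advance by one
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        delegate.write(b, off, len);
        position += len;            // len bytes written, advance by len (not 1)
    }

    public long getPosition() {
        return position;
    }
}
```

Note that the inherited {{write(byte[])}} delegates to
{{write(byte[], int, int)}}, so overriding the three-argument form covers both.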

> Ozone : add key partition
> -
>
> Key: HDFS-11920
> URL: https://issues.apache.org/jira/browse/HDFS-11920
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-11920-HDFS-7240.001.patch, 
> HDFS-11920-HDFS-7240.002.patch, HDFS-11920-HDFS-7240.003.patch, 
> HDFS-11920-HDFS-7240.004.patch
>
>
> Currently, each key corresponds to one single SCM block, and putKey/getKey 
> writes/reads to this single SCM block. This works fine for keys with 
> reasonably small data size. However if the data is too huge, (e.g. not even 
> fits into a single container), then we need to be able to partition the key 
> data into multiple blocks, each in one container. This JIRA changes the 
> key-related classes to support this.
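The partitioning described above amounts to ceiling division of the key's data
size by the block size; a hypothetical helper (illustrative only, not from the
patch) makes the arithmetic concrete:

```java
// Hypothetical helper: compute how many SCM blocks a key of the given size
// needs when each block holds at most blockSize bytes (ceiling division).
public final class KeyPartitioner {
    private KeyPartitioner() { }

    public static long blockCount(long keySize, long blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        if (keySize == 0) {
            return 1;   // assume even an empty key gets one block
        }
        return (keySize + blockSize - 1) / blockSize;
    }
}
```

Each of the resulting blocks would then be placed in its own container, as the
issue describes.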






[jira] [Commented] (HDFS-12145) Ozone: OzoneFileSystem: Ozone & KSM should support "/" delimited key names

2017-07-26 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102693#comment-16102693
 ] 

Weiwei Yang commented on HDFS-12145:


Thanks [~msingh] for confirming that, +1 to the latest patch, I will commit it 
shortly. Thanks for the updates.

> Ozone: OzoneFileSystem: Ozone & KSM should support "/" delimited key names
> --
>
> Key: HDFS-12145
> URL: https://issues.apache.org/jira/browse/HDFS-12145
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
> Fix For: HDFS-7240
>
> Attachments: HDFS-12145-HDFS-7240.001.patch, 
> HDFS-12145-HDFS-7240.002.patch, HDFS-12145-HDFS-7240.003.patch, 
> HDFS-12145-HDFS-7240.004.patch, HDFS-12145-HDFS-7240.005.patch, 
> HDFS-12145-HDFS-7240.006.patch, HDFS-12145-HDFS-7240.007.patch
>
>
> With OzoneFileSystem, key names will be delimited by "/" which is used as the 
> path separator.
> Support should be added in KSM and Ozone to support key names with "/".





