[jira] [Updated] (HDFS-12129) Ozone: SCM http server is not stopped with SCM#stop()
[ https://issues.apache.org/jira/browse/HDFS-12129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-12129: --- Affects Version/s: HDFS-7240 > Ozone: SCM http server is not stopped with SCM#stop() > - > > Key: HDFS-12129 > URL: https://issues.apache.org/jira/browse/HDFS-12129 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone, scm >Affects Versions: HDFS-7240 >Reporter: Weiwei Yang > > Found this issue while trying to restart scm; it failed with an "address already > in use" error. This is because the http server is not stopped in the stop() method. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
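The fix pattern described in the report can be sketched as follows. The class and server stand-ins here are hypothetical, not the actual StorageContainerManager code; the point is that stop() must shut down every listener the process started, including the HTTP server, so its port is released before a restart:

```java
// Hypothetical sketch of the HDFS-12129 fix pattern: a stop() that forgets
// one of its listeners leaves that port bound, and a restart then fails with
// "address already in use". Names below are illustrative stand-ins, not the
// real SCM classes.
public class ScmLifecycleSketch {

    /** Minimal stand-in for any server that binds a port. */
    interface Stoppable {
        void stop();
        boolean isRunning();
    }

    static class FakeServer implements Stoppable {
        private boolean running = true;
        @Override public void stop() { running = false; }
        @Override public boolean isRunning() { return running; }
    }

    private final Stoppable rpcServer = new FakeServer();
    private final Stoppable httpServer = new FakeServer();

    /** Buggy variant: stops only the RPC server, leaving the HTTP port bound. */
    public void stopRpcOnly() {
        rpcServer.stop();
    }

    /** Fixed variant: stops both servers so all ports are released. */
    public void stop() {
        rpcServer.stop();
        httpServer.stop(); // the missing call reported in HDFS-12129
    }

    public boolean anyPortStillBound() {
        return rpcServer.isRunning() || httpServer.isRunning();
    }
}
```

With the buggy variant a second instance binding the same ports would fail; the fixed variant releases everything.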
[jira] [Updated] (HDFS-12129) Ozone: SCM http server is not stopped with SCM#stop()
[ https://issues.apache.org/jira/browse/HDFS-12129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-12129: --- Summary: Ozone: SCM http server is not stopped with SCM#stop() (was: Ozone) > Ozone: SCM http server is not stopped with SCM#stop() > - > > Key: HDFS-12129 > URL: https://issues.apache.org/jira/browse/HDFS-12129 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Weiwei Yang
[jira] [Created] (HDFS-12129) Ozone
Weiwei Yang created HDFS-12129: -- Summary: Ozone Key: HDFS-12129 URL: https://issues.apache.org/jira/browse/HDFS-12129 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Weiwei Yang
[jira] [Commented] (HDFS-12098) Ozone: Datanode is unable to register with scm if scm starts later
[ https://issues.apache.org/jira/browse/HDFS-12098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16085463#comment-16085463 ] Weiwei Yang commented on HDFS-12098: Ah, found the difference after hours of debugging... it is not easy to reproduce this from the mini cluster. Let me explain: the behavior differs between the mini cluster and a real cluster setup. *Mini Cluster* In class {{MiniOzoneCluster}}, we initiate SCM like this: {code} StorageContainerManager scm = new StorageContainerManager(conf); if (!disableSCM) { // start SCM if it is not disabled. scm.start(); } {code} The SCM constructor initializes the SCM datanode and client RPC servers. During this initialization, {{RPC.Builder(conf)...build()}} binds the RPC server to the specific port. Once the port is bound, subsequent client RPC calls, e.g. {code} SCMVersionResponseProto versionResponse = rpcEndPoint.getEndPoint().getVersion(null); {code} will connect to that port and try to read data; however, the service is not responding yet, so the call fails with a {{SocketTimeout}}. *Real Cluster* In a real cluster environment, however, the SCM constructor has not been called, so the port is not bound. When the RPC client tries to connect to that port, it gets a {{connection refused}} error. This error is caught and triggers the RetryPolicy; that is where I saw the 10 retries that cause this problem (thread leak). I am not sure it is worth fixing this in the mini cluster, as that would probably require refactoring the SCM constructor to move the RPC init code out, and the issue can easily be reproduced in a cluster setup following the steps in the description. Please kindly advise. Thanks. 
> Ozone: Datanode is unable to register with scm if scm starts later > -- > > Key: HDFS-12098 > URL: https://issues.apache.org/jira/browse/HDFS-12098 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, ozone, scm >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Critical > Attachments: disabled-scm-test.patch, HDFS-12098-HDFS-7240.001.patch, > HDFS-12098-HDFS-7240.002.patch, Screen Shot 2017-07-11 at 4.58.08 PM.png, > thread_dump.log > > > Reproducing steps > # Start datanode > # Wait and observe the datanode state; it has connection issues, which is expected > # Start SCM, expecting the datanode to connect to the scm and the state > machine to transition to RUNNING. In actuality, however, its state transitions to > SHUTDOWN and the datanode enters chill mode.
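The retry behavior discussed in the comment above can be sketched with a self-contained loop that mirrors the semantics of {{RetryUpToMaximumCountWithFixedSleep}} (retry a failed call up to maxRetries times, sleeping a fixed interval in between). This is an illustration of the policy's shape, not Hadoop's implementation:

```java
import java.util.concurrent.Callable;

// Sketch of a fixed-sleep, bounded-count retry policy: the initial call plus
// up to maxRetries retries, with a fixed sleep between attempts. On a real
// cluster each failed connect ("connection refused") drives one pass through
// a loop like this before the caller finally gives up.
public class RetrySketch {

    public static <T> T callWithRetry(Callable<T> call, int maxRetries,
                                      long sleepMillis) throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {          // e.g. "connection refused"
                last = e;
                if (attempt < maxRetries) {
                    Thread.sleep(sleepMillis);
                }
            }
        }
        throw last;                          // retries exhausted
    }
}
```

With maxRetries=10 and sleepTime=1 SECONDS this matches the "Already tried N time(s)" log lines in the description; if each such retrying call is issued from its own thread, the threads pile up, which is the leak described above.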
[jira] [Commented] (HDFS-12098) Ozone: Datanode is unable to register with scm if scm starts later
[ https://issues.apache.org/jira/browse/HDFS-12098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088382#comment-16088382 ] Weiwei Yang commented on HDFS-12098: Hi [~anu] I just uploaded a test case patch to reproduce this problem from a UT. I revised some code for how SCM is started in MiniOzoneCluster, to ensure that the SCM constructor is only called when SCM is started. With this change I could reproduce the same issue I was seeing on a real setup. Please take a look; if you agree with the problem I described, we can then look at the fix. Thank you. > Ozone: Datanode is unable to register with scm if scm starts later > -- > > Key: HDFS-12098 > URL: https://issues.apache.org/jira/browse/HDFS-12098 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, ozone, scm >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Critical > Attachments: disabled-scm-test.patch, HDFS-12098-HDFS-7240.001.patch, > HDFS-12098-HDFS-7240.002.patch, HDFS-12098-HDFS-7240.testcase.patch, Screen > Shot 2017-07-11 at 4.58.08 PM.png, thread_dump.log > > > Reproducing steps > 1. Start namenode > {{./bin/hdfs --daemon start namenode}} > 2. Start datanode > {{./bin/hdfs datanode}} > will see following connection issues > {noformat} > 17/07/13 21:16:48 INFO ipc.Client: Retrying connect to server: > ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 0 time(s); retry > policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 > SECONDS) > 17/07/13 21:16:49 INFO ipc.Client: Retrying connect to server: > ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 1 time(s); retry > policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 > SECONDS) > 17/07/13 21:16:50 INFO ipc.Client: Retrying connect to server: > ozone1.fyre.ibm.com/172.16.165.133:9861. 
Already tried 2 time(s); retry > policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 > SECONDS) > 17/07/13 21:16:51 INFO ipc.Client: Retrying connect to server: > ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 3 time(s); retry > policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 > SECONDS) > {noformat} > this is expected because scm is not started yet > 3. Start scm > {{./bin/hdfs scm}} > expecting datanode can register to this scm, expecting the log in scm > {noformat} > 17/07/13 21:22:30 INFO node.SCMNodeManager: Data node with ID: > af22862d-aafa-4941-9073-53224ae43e2c Registered. > {noformat} > but did *NOT* see this log. (_I debugged into the code and found the datanode > state was transited SHUTDOWN unexpectedly because the thread leaks, each of > those threads counted to set to next state and they all set to SHUTDOWN > state_) > 4. Create a container from scm CLI > {{./bin/hdfs scm -container -create -c 20170714c0}} > this fails with following exception > {noformat} > Creating container : 20170714c0. > Error executing > command:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ozone.scm.exceptions.SCMException): > Unable to create container while in chill mode > at > org.apache.hadoop.ozone.scm.container.ContainerMapping.allocateContainer(ContainerMapping.java:241) > at > org.apache.hadoop.ozone.scm.StorageContainerManager.allocateContainer(StorageContainerManager.java:392) > at > org.apache.hadoop.ozone.protocolPB.StorageContainerLocationProtocolServerSideTranslatorPB.allocateContainer(StorageContainerLocationProtocolServerSideTranslatorPB.java:73) > {noformat} > datanode was not registered to scm, thus it's still in chill mode. > *Note*, if we start scm first, there is no such issue, I can create container > from CLI without any problem. 
[jira] [Commented] (HDFS-12098) Ozone: Datanode is unable to register with scm if scm starts later
[ https://issues.apache.org/jira/browse/HDFS-12098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088411#comment-16088411 ] Weiwei Yang commented on HDFS-12098: Please hold off on looking at the test patch, it still has some problems... working on a new one :P > Ozone: Datanode is unable to register with scm if scm starts later > -- > > Key: HDFS-12098 > URL: https://issues.apache.org/jira/browse/HDFS-12098 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, ozone, scm >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Critical
[jira] [Updated] (HDFS-12098) Ozone: Datanode is unable to register with scm if scm starts later
[ https://issues.apache.org/jira/browse/HDFS-12098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-12098: --- Status: In Progress (was: Patch Available) > Ozone: Datanode is unable to register with scm if scm starts later > -- > > Key: HDFS-12098 > URL: https://issues.apache.org/jira/browse/HDFS-12098 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, ozone, scm >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Critical
[jira] [Updated] (HDFS-12098) Ozone: Datanode is unable to register with scm if scm starts later
[ https://issues.apache.org/jira/browse/HDFS-12098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-12098: --- Attachment: HDFS-12098-HDFS-7240.testcase.patch > Ozone: Datanode is unable to register with scm if scm starts later > -- > > Key: HDFS-12098 > URL: https://issues.apache.org/jira/browse/HDFS-12098 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, ozone, scm >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Critical
[jira] [Commented] (HDFS-12069) Ozone: Create a general abstraction for metadata store
[ https://issues.apache.org/jira/browse/HDFS-12069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088615#comment-16088615 ] Weiwei Yang commented on HDFS-12069: Hello [~anu] Thanks for your +1 :). Since it's been a while since the v11 patch was uploaded, I just rebased to the latest and want to make sure Jenkins is still happy. I will commit if everything is fine. Thanks [~anu] for helping review this. Appreciated! > Ozone: Create a general abstraction for metadata store > -- > > Key: HDFS-12069 > URL: https://issues.apache.org/jira/browse/HDFS-12069 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Weiwei Yang >Assignee: Weiwei Yang > Attachments: HDFS-12069-HDFS-7240.001.patch, > HDFS-12069-HDFS-7240.002.patch, HDFS-12069-HDFS-7240.003.patch, > HDFS-12069-HDFS-7240.004.patch, HDFS-12069-HDFS-7240.005.patch, > HDFS-12069-HDFS-7240.006.patch, HDFS-12069-HDFS-7240.007.patch, > HDFS-12069-HDFS-7240.008.patch, HDFS-12069-HDFS-7240.009.patch, > HDFS-12069-HDFS-7240.010.patch, HDFS-12069-HDFS-7240.011.patch, > HDFS-12069-HDFS-7240.012.patch > > > Create a general abstraction for the metadata store so that we can plug in other key > value stores to host ozone metadata. Currently only LevelDB is implemented; we > want to support RocksDB as it provides more production-ready features.
[jira] [Updated] (HDFS-12069) Ozone: Create a general abstraction for metadata store
[ https://issues.apache.org/jira/browse/HDFS-12069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-12069: --- Attachment: HDFS-12069-HDFS-7240.012.patch > Ozone: Create a general abstraction for metadata store > -- > > Key: HDFS-12069 > URL: https://issues.apache.org/jira/browse/HDFS-12069 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Weiwei Yang >Assignee: Weiwei Yang
[jira] [Commented] (HDFS-12147) Ozone: KSM: Add checkBucketAccess
[ https://issues.apache.org/jira/browse/HDFS-12147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16091335#comment-16091335 ] Weiwei Yang commented on HDFS-12147: Hi [~nandakumar131] Thank you. But even if we want to expose them to clients, the API arguments still look odd to me. How would a client compose an OzoneAcl in the request when it wants to check a certain access? Semantically we often check against a {{User Identity}} and an {{operation}} (e.g. read/write/delete). With this patch, does it work like the following? Suppose a bucket has the following ACLs {noformat} user:bilbo:rw user:john:r user:mike:w {noformat} and a client passes an OzoneAcl like {{user:mike:w}}; does this mean I want to check whether user mike has write permission on the bucket? In this case it has access. What if the bucket ACL is instead {noformat} user:bilbo:rw user:john:r group:hadoop:w {noformat} and mike belongs to the hadoop group; when I verify {{user:mike:w}}, will it give me an access control exception? > Ozone: KSM: Add checkBucketAccess > - > > Key: HDFS-12147 > URL: https://issues.apache.org/jira/browse/HDFS-12147 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Nandakumar >Assignee: Nandakumar > Attachments: HDFS-12147-HDFS-7240.000.patch, > HDFS-12147-HDFS-7240.001.patch > > > Checks if the caller has access to a given bucket.
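The semantics being questioned above can be sketched as follows: check an identity plus an operation against stored bucket ACL entries of the form "type:name:rights", including group expansion. This is a hypothetical illustration, not the actual KSM/OzoneAcl implementation:

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a bucket access check: the caller's identity and
// group memberships are checked against stored ACL entries like
// "user:bilbo:rw" or "group:hadoop:w" for a requested right ('r' or 'w').
public class AclCheckSketch {

    /** Returns true if the caller may perform the requested right. */
    public static boolean hasAccess(List<String> bucketAcls, String user,
                                    Set<String> userGroups, char right) {
        for (String acl : bucketAcls) {
            String[] parts = acl.split(":");   // e.g. "user:bilbo:rw"
            String type = parts[0], name = parts[1], rights = parts[2];
            boolean identityMatches =
                ("user".equals(type) && name.equals(user)) ||
                ("group".equals(type) && userGroups.contains(name));
            if (identityMatches && rights.indexOf(right) >= 0) {
                return true;
            }
        }
        return false;
    }
}
```

Note that a bare comparison of two OzoneAcl values, as the comment questions, cannot decide the group case: granting mike write access via {{group:hadoop:w}} requires knowing mike's group memberships, which the requested ACL entry alone does not carry.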
[jira] [Commented] (HDFS-12154) Incorrect javadoc description in StorageLocationChecker#check
[ https://issues.apache.org/jira/browse/HDFS-12154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16091336#comment-16091336 ] Weiwei Yang commented on HDFS-12154: Looks good to me, +1, committing now. > Incorrect javadoc description in StorageLocationChecker#check > - > > Key: HDFS-12154 > URL: https://issues.apache.org/jira/browse/HDFS-12154 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Nandakumar >Assignee: Nandakumar >Priority: Trivial > Attachments: HDFS-12154.000.patch > > > {{StorageLocationChecker#check}} returns list of healthy volumes, but javadoc > states that it returns failed volumes. > {code} > /** >* Initiate a check of the supplied storage volumes and return >* a list of failed volumes. >* >* StorageLocations are returned in the same order as the input >* for compatibility with existing unit tests. >* >* @param conf HDFS configuration. >* @param dataDirs list of volumes to check. >* @return returns a list of failed volumes. Returns the empty list if >* there are no failed volumes. >* >* @throws InterruptedException if the check was interrupted. >* @throws IOException if the number of failed volumes exceeds the >* maximum allowed or if there are no good >* volumes. >*/ > {code}
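For reference, the corrected description could read roughly as below. This is an illustrative sketch, not the committed patch: the signature is simplified and the filtering logic is a stand-in for the real health check; only the @return wording is the point:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the javadoc fix: the method returns the *healthy* volumes, so
// the @return clause must describe healthy volumes, not failed ones. The
// "bad:" prefix below is a stand-in health probe for illustration only.
public class VolumeCheckSketch {

    /**
     * Initiate a check of the supplied storage volumes and return
     * a list of healthy volumes.
     *
     * @param dataDirs list of volumes to check.
     * @return a list of healthy volumes. Returns the empty list if
     *         there are no healthy volumes.
     */
    public static List<String> check(List<String> dataDirs) {
        List<String> healthy = new ArrayList<>();
        for (String dir : dataDirs) {
            if (!dir.startsWith("bad:")) {   // stand-in health probe
                healthy.add(dir);
            }
        }
        return healthy;
    }
}
```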
[jira] [Commented] (HDFS-12149) Ozone: RocksDB implementation of ozone metadata store
[ https://issues.apache.org/jira/browse/HDFS-12149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090940#comment-16090940 ] Weiwei Yang commented on HDFS-12149: Ah, I posted my last comment too quickly, before I saw [~aw]'s comment. bq. basically saying dont commit new code until we have something we can use. Does this mean that once there is a new rocksdb release that includes the license update, we can commit this? > Ozone: RocksDB implementation of ozone metadata store > - > > Key: HDFS-12149 > URL: https://issues.apache.org/jira/browse/HDFS-12149 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Weiwei Yang >Assignee: Weiwei Yang > Attachments: HDFS-12149-HDFS-7240.001.patch > > > HDFS-12069 added a general interface for the ozone metadata store; we already > have a LevelDB implementation. This JIRA tracks the work on a RocksDB > implementation.
[jira] [Comment Edited] (HDFS-12147) Ozone: KSM: Add checkBucketAccess
[ https://issues.apache.org/jira/browse/HDFS-12147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16091335#comment-16091335 ] Weiwei Yang edited comment on HDFS-12147 at 7/18/17 9:26 AM: - Hi [~nandakumar131] Thank you. But even if we want to expose them to clients, the API arguments still look odd to me. How would a client compose an OzoneAcl in the request when it wants to check a certain access? Semantically we often check against a {{User Identity}} and an {{operation}} (e.g. read/write/delete). With this patch, does it work like the following? Suppose a bucket has the following ACLs {noformat} user:bilbo:rw user:john:r user:mike:w {noformat} and a client passes an OzoneAcl like {{user:mike:w}}; does this mean I want to check whether user mike has write permission on the bucket? In this case it has access. What if the bucket ACL is instead {noformat} user:bilbo:rw user:john:r group:hadoop:w {noformat} and mike belongs to the hadoop group; when I verify {{user:mike:w}}, will it give me an access control exception? Forgive me, I just want to understand how this works. Thanks a lot. was (Author: cheersyang): Hi [~nandakumar131] Thank you. But even if we want to expose them to clients, the API arguments still look odd to me. How would a client compose an OzoneAcl in the request when it wants to check a certain access? Semantically we often check against a {{User Identity}} and an {{operation}} (e.g. read/write/delete). With this patch, does it work like the following? Suppose a bucket has the following ACLs {noformat} user:bilbo:rw user:john:r user:mike:w {noformat} and a client passes an OzoneAcl like {{user:mike:w}}; does this mean I want to check whether user mike has write permission on the bucket? In this case it has access. What if the bucket ACL is instead {noformat} user:bilbo:rw user:john:r group:hadoop:w {noformat} and mike belongs to the hadoop group; when I verify {{user:mike:w}}, will it give me an access control exception? 
> Ozone: KSM: Add checkBucketAccess > - > > Key: HDFS-12147 > URL: https://issues.apache.org/jira/browse/HDFS-12147 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Nandakumar >Assignee: Nandakumar > Attachments: HDFS-12147-HDFS-7240.000.patch, > HDFS-12147-HDFS-7240.001.patch > > > Checks if the caller has access to a given bucket.
[jira] [Updated] (HDFS-12154) Incorrect javadoc description in StorageLocationChecker#check
[ https://issues.apache.org/jira/browse/HDFS-12154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-12154: --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: HDFS-7240 Status: Resolved (was: Patch Available) > Incorrect javadoc description in StorageLocationChecker#check > - > > Key: HDFS-12154 > URL: https://issues.apache.org/jira/browse/HDFS-12154 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Nandakumar >Assignee: Nandakumar >Priority: Trivial > Fix For: HDFS-7240
[jira] [Comment Edited] (HDFS-12147) Ozone: KSM: Add checkBucketAccess
[ https://issues.apache.org/jira/browse/HDFS-12147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16091091#comment-16091091 ] Weiwei Yang edited comment on HDFS-12147 at 7/18/17 4:34 AM: - Hi [~nandakumar131], [~vagarychen] I am a bit confused by this patch. 1. Why is checkBucketAccess exposed as an RPC call in KSM? Isn't it something that should be done internally in KSM while reading/writing/deleting keys in a bucket? I am not sure why it is necessary to expose this via {{KeySpaceManagerProtocol}}. 2. {{OzoneMetadataManager#checkBucketAccess}} loads the acls of a bucket from the KSM db and compares them to the value passed in the {{OzoneAcl}} argument. Why are we comparing OzoneAcl? I thought OzoneAcl was used to verify whether a given user/group has a particular permission; e.g. we could have an OzoneAcl like {{user:bilbo:rw}}, which means user {{bilbo}} has both read and write permission on the bucket. So it's natural to check against user and group names. I don't understand the check in lines 843 - 853; can you elaborate please? Thank you. was (Author: cheersyang): Hi [~nandakumar131], [~vagarychen] I am a bit confused by this patch. 1. Why is checkBucketAccess exposed as an RPC call in KSM? Isn't it something that should be done internally in KSM while reading/writing/deleting keys in a bucket? I am not sure why it is necessary to expose this via {{KeySpaceManagerProtocol}}. 2. {{OzoneMetadataManager#checkBucketAccess}} loads the acls of a bucket from the KSM db and compares them to the value passed in the {{OzoneAcl}} argument. Why are we comparing OzoneAcl? I thought OzoneAcl was used to verify whether a given user/group has a particular permission; e.g. we could have an OzoneAcl like user:bilbo:rw, which means user {{bilbo}} has both read and write permission on the bucket. So it's natural to check against user and group names. I don't understand the check in lines 843 - 853; can you elaborate please? Thank you. 
> Ozone: KSM: Add checkBucketAccess > - > > Key: HDFS-12147 > URL: https://issues.apache.org/jira/browse/HDFS-12147 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Nandakumar >Assignee: Nandakumar > Attachments: HDFS-12147-HDFS-7240.000.patch, > HDFS-12147-HDFS-7240.001.patch > > > Checks if the caller has access to a given bucket. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12147) Ozone: KSM: Add checkBucketAccess
[ https://issues.apache.org/jira/browse/HDFS-12147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16091091#comment-16091091 ] Weiwei Yang commented on HDFS-12147: Hi [~nandakumar131], [~vagarychen] I am a bit confused by this patch. 1. Why is checkBucketAccess exposed as an RPC call in KSM? Is it something that should be done internally in KSM while reading/writing/deleting keys in a bucket? I am not sure why this needs to be exposed via {{KeySpaceManagerProtocol}}. 2. {{OzoneMetadataManager#checkBucketAccess}} loads the acls of a bucket from the KSM db and compares them to the value passed in the {{OzoneAcl}} argument; why are we comparing OzoneAcl? I thought OzoneAcl was used to verify whether a given user/group has a particular permission, e.g. we could have an OzoneAcl like user:bilbo:rw, which means user {{bilbo}} has read as well as write permission on the bucket. So it's pretty natural to check against the user and group name. I don't understand the check in lines 843 - 853, can you elaborate please? Thank you. > Ozone: KSM: Add checkBucketAccess > - > > Key: HDFS-12147 > URL: https://issues.apache.org/jira/browse/HDFS-12147 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Nandakumar >Assignee: Nandakumar > Attachments: HDFS-12147-HDFS-7240.000.patch, > HDFS-12147-HDFS-7240.001.patch > > > Checks if the caller has access to a given bucket. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
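The name-based check argued for in the comment above can be sketched as follows. This is a minimal illustration only: the class and method names are invented for the example and are not the actual Ozone/KSM API; a real OzoneAcl is a typed object, not a raw string.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: parse ACL strings like "user:bilbo:rw" and test
// whether a given user or group holds a given permission character.
public class AclCheckSketch {
    public static boolean hasAccess(List<String> bucketAcls,
                                    String type, String name, char perm) {
        for (String acl : bucketAcls) {
            String[] parts = acl.split(":");   // e.g. {"user", "bilbo", "rw"}
            if (parts.length == 3
                    && parts[0].equals(type)
                    && parts[1].equals(name)
                    && parts[2].indexOf(perm) >= 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> acls = Arrays.asList("user:bilbo:rw", "group:hobbits:r");
        System.out.println(hasAccess(acls, "user", "bilbo", 'w'));    // true
        System.out.println(hasAccess(acls, "group", "hobbits", 'w')); // false
    }
}
```

The point of the sketch is that the caller supplies a user/group name and a requested permission, and the check matches those against the stored ACLs, rather than comparing two OzoneAcl values for equality.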
[jira] [Commented] (HDFS-12115) Ozone: SCM: Add queryNode RPC Call
[ https://issues.apache.org/jira/browse/HDFS-12115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16091068#comment-16091068 ] Weiwei Yang commented on HDFS-12115: Hi [~anu] Thanks for the updates, I have comments on the v5 patch, *InProgressPool.java* NIT, line 203, extra space between NodeState and getNodeState *MockNodeManager.java* line 161-173, it seems this can be replaced by {{getNodes(nodestate).size()}}, but we need to make sure getNodes won't return us a null, maybe an empty list? *Ozone.proto* Add a placeholder for {{DECOMMISSIONING}} state? *SCMNodeManager.java* line 413-435, as you mentioned earlier, a node may have more than 1 state, e.g. both HEALTHY and RAFT_MEMBER. But here getNodeState will only return a single state, should this return an array of NodeState? line 491: instead of creating a new list, this can be done in Java 8 style {{return currentSet.stream().collect(Collectors.toList());}} Hope this helps, thanks > Ozone: SCM: Add queryNode RPC Call > -- > > Key: HDFS-12115 > URL: https://issues.apache.org/jira/browse/HDFS-12115 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Anu Engineer >Assignee: Anu Engineer > Fix For: HDFS-7240 > > Attachments: HDFS-12115-HDFS-7240.001.patch, > HDFS-12115-HDFS-7240.002.patch, HDFS-12115-HDFS-7240.003.patch, > HDFS-12115-HDFS-7240.004.patch, HDFS-12115-HDFS-7240.005.patch > > > Add queryNode RPC to Storage container location protocol. This allows > applications like SCM CLI to get the list of nodes in various states, like > Healthy, live or Dead. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
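The Java 8 style suggested in the review above can be sketched like this. The class and field names here are stand-ins for illustration; only the stream-to-list idiom itself comes from the comment.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Sketch: convert a Set into a List without hand-writing a copy loop.
// "currentSet" is a stand-in for the field in SCMNodeManager.
public class StreamToList {
    public static List<String> toList(Set<String> currentSet) {
        // replaces: List<String> list = new ArrayList<>(); for (String s : currentSet) list.add(s);
        return currentSet.stream().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Set<String> currentSet = new HashSet<>();
        currentSet.add("dn-1");
        currentSet.add("dn-2");
        System.out.println(toList(currentSet).size()); // 2
    }
}
```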
[jira] [Commented] (HDFS-11996) Ozone : add an UT to test partial read of chunks
[ https://issues.apache.org/jira/browse/HDFS-11996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16091079#comment-16091079 ] Weiwei Yang commented on HDFS-11996: Just committed to the feature branch, thanks to [~vagarychen] for the contribution and to [~anu] for the review. > Ozone : add an UT to test partial read of chunks > > > Key: HDFS-11996 > URL: https://issues.apache.org/jira/browse/HDFS-11996 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone, test > Environment: Currently when reading a chunk, it is always the whole > chunk that gets returned. However it is possible the reader may only need to > read a subset of the chunk. This JIRA adds the partial read of chunks. >Reporter: Chen Liang >Assignee: Chen Liang >Priority: Minor > Fix For: HDFS-7240 > > Attachments: HDFS-11996-HDFS-7240.001.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11996) Ozone : add an UT to test partial read of chunks
[ https://issues.apache.org/jira/browse/HDFS-11996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-11996: --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: HDFS-7240 Status: Resolved (was: Patch Available) > Ozone : add an UT to test partial read of chunks > > > Key: HDFS-11996 > URL: https://issues.apache.org/jira/browse/HDFS-11996 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone, test > Environment: Currently when reading a chunk, it is always the whole > chunk that gets returned. However it is possible the reader may only need to > read a subset of the chunk. This JIRA adds the partial read of chunks. >Reporter: Chen Liang >Assignee: Chen Liang >Priority: Minor > Fix For: HDFS-7240 > > Attachments: HDFS-11996-HDFS-7240.001.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11996) Ozone : add partial read of chunks
[ https://issues.apache.org/jira/browse/HDFS-11996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16091073#comment-16091073 ] Weiwei Yang commented on HDFS-11996: None of the UT failures were related to this patch. I am going to test this patch again with the latest code base; if everything goes fine, I will commit it shortly. Thanks to [~vagarychen] for adding this test and to [~anu] for the review. > Ozone : add partial read of chunks > -- > > Key: HDFS-11996 > URL: https://issues.apache.org/jira/browse/HDFS-11996 > Project: Hadoop HDFS > Issue Type: Sub-task > Environment: Currently when reading a chunk, it is always the whole > chunk that gets returned. However it is possible the reader may only need to > read a subset of the chunk. This JIRA adds the partial read of chunks. >Reporter: Chen Liang >Assignee: Chen Liang > Attachments: HDFS-11996-HDFS-7240.001.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12098) Ozone: Datanode is unable to register with scm if scm starts later
[ https://issues.apache.org/jira/browse/HDFS-12098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16091052#comment-16091052 ] Weiwei Yang commented on HDFS-12098: Oh [~anu], no problem at all. Thanks for your quick reply. > Ozone: Datanode is unable to register with scm if scm starts later > -- > > Key: HDFS-12098 > URL: https://issues.apache.org/jira/browse/HDFS-12098 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, ozone, scm >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Critical > Attachments: disabled-scm-test.patch, HDFS-12098-HDFS-7240.001.patch, > HDFS-12098-HDFS-7240.002.patch, HDFS-12098-HDFS-7240.testcase-1.patch, > HDFS-12098-HDFS-7240.testcase.patch, Screen Shot 2017-07-11 at 4.58.08 > PM.png, thread_dump.log > > > Reproducing steps > 1. Start namenode > {{./bin/hdfs --daemon start namenode}} > 2. Start datanode > {{./bin/hdfs datanode}} > will see following connection issues > {noformat} > 17/07/13 21:16:48 INFO ipc.Client: Retrying connect to server: > ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 0 time(s); retry > policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 > SECONDS) > 17/07/13 21:16:49 INFO ipc.Client: Retrying connect to server: > ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 1 time(s); retry > policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 > SECONDS) > 17/07/13 21:16:50 INFO ipc.Client: Retrying connect to server: > ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 2 time(s); retry > policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 > SECONDS) > 17/07/13 21:16:51 INFO ipc.Client: Retrying connect to server: > ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 3 time(s); retry > policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 > SECONDS) > {noformat} > this is expected because scm is not started yet > 3. 
Start scm > {{./bin/hdfs scm}} > expecting datanode can register to this scm, expecting the log in scm > {noformat} > 17/07/13 21:22:30 INFO node.SCMNodeManager: Data node with ID: > af22862d-aafa-4941-9073-53224ae43e2c Registered. > {noformat} > but did *NOT* see this log. (_I debugged into the code and found the datanode > state was transited SHUTDOWN unexpectedly because the thread leaks, each of > those threads counted to set to next state and they all set to SHUTDOWN > state_) > 4. Create a container from scm CLI > {{./bin/hdfs scm -container -create -c 20170714c0}} > this fails with following exception > {noformat} > Creating container : 20170714c0. > Error executing > command:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ozone.scm.exceptions.SCMException): > Unable to create container while in chill mode > at > org.apache.hadoop.ozone.scm.container.ContainerMapping.allocateContainer(ContainerMapping.java:241) > at > org.apache.hadoop.ozone.scm.StorageContainerManager.allocateContainer(StorageContainerManager.java:392) > at > org.apache.hadoop.ozone.protocolPB.StorageContainerLocationProtocolServerSideTranslatorPB.allocateContainer(StorageContainerLocationProtocolServerSideTranslatorPB.java:73) > {noformat} > datanode was not registered to scm, thus it's still in chill mode. > *Note*, if we start scm first, there is no such issue, I can create container > from CLI without any problem. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11996) Ozone : add an UT to test partial read of chunks
[ https://issues.apache.org/jira/browse/HDFS-11996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-11996: --- Target Version/s: HDFS-7240 Priority: Minor (was: Major) Component/s: test ozone > Ozone : add an UT to test partial read of chunks > > > Key: HDFS-11996 > URL: https://issues.apache.org/jira/browse/HDFS-11996 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone, test > Environment: Currently when reading a chunk, it is always the whole > chunk that gets returned. However it is possible the reader may only need to > read a subset of the chunk. This JIRA adds the partial read of chunks. >Reporter: Chen Liang >Assignee: Chen Liang >Priority: Minor > Attachments: HDFS-11996-HDFS-7240.001.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11996) Ozone : add an UT to test partial read of chunks
[ https://issues.apache.org/jira/browse/HDFS-11996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-11996: --- Summary: Ozone : add an UT to test partial read of chunks (was: Ozone : add partial read of chunks) > Ozone : add an UT to test partial read of chunks > > > Key: HDFS-11996 > URL: https://issues.apache.org/jira/browse/HDFS-11996 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone, test > Environment: Currently when reading a chunk, it is always the whole > chunk that gets returned. However it is possible the reader may only need to > read a subset of the chunk. This JIRA adds the partial read of chunks. >Reporter: Chen Liang >Assignee: Chen Liang > Attachments: HDFS-11996-HDFS-7240.001.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12149) Ozone: RocksDB implementation of ozone metadata store
[ https://issues.apache.org/jira/browse/HDFS-12149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090936#comment-16090936 ] Weiwei Yang commented on HDFS-12149: Thanks [~anu], sounds good to me. Since the UT results suggest no regression is introduced by this patch, I am going to fix the checkstyle issues and commit this today. After that, we can do more tests with RocksDB. Thanks a lot for your quick response. > Ozone: RocksDB implementation of ozone metadata store > - > > Key: HDFS-12149 > URL: https://issues.apache.org/jira/browse/HDFS-12149 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Weiwei Yang >Assignee: Weiwei Yang > Attachments: HDFS-12149-HDFS-7240.001.patch > > > HDFS-12069 added a general interface for ozone metadata store, we already > have a leveldb implementation, this JIRA is to track the work of rocksdb > implementation. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12154) Incorrect javadoc description in StorageLocationChecker#check
[ https://issues.apache.org/jira/browse/HDFS-12154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090945#comment-16090945 ] Weiwei Yang commented on HDFS-12154: +1, pending jenkins. Thanks [~nandakumar131] for fixing this. > Incorrect javadoc description in StorageLocationChecker#check > - > > Key: HDFS-12154 > URL: https://issues.apache.org/jira/browse/HDFS-12154 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Nandakumar >Assignee: Nandakumar >Priority: Trivial > Attachments: HDFS-12154.000.patch > > > {{StorageLocationChecker#check}} returns list of healthy volumes, but javadoc > states that it returns failed volumes. > {code} > /** >* Initiate a check of the supplied storage volumes and return >* a list of failed volumes. >* >* StorageLocations are returned in the same order as the input >* for compatibility with existing unit tests. >* >* @param conf HDFS configuration. >* @param dataDirs list of volumes to check. >* @return returns a list of failed volumes. Returns the empty list if >* there are no failed volumes. >* >* @throws InterruptedException if the check was interrupted. >* @throws IOException if the number of failed volumes exceeds the >* maximum allowed or if there are no good >* volumes. >*/ > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
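Based on the issue description above, the corrected javadoc would presumably look roughly like the following. This is a sketch of the expected fix, not the text of HDFS-12154.000.patch itself, which is not shown in this thread.

```java
/**
 * Initiate a check of the supplied storage volumes and return
 * a list of healthy volumes.
 *
 * StorageLocations are returned in the same order as the input
 * for compatibility with existing unit tests.
 *
 * @param conf HDFS configuration.
 * @param dataDirs list of volumes to check.
 * @return returns a list of healthy volumes, in the same order as
 *         the input.
 *
 * @throws InterruptedException if the check was interrupted.
 * @throws IOException if the number of failed volumes exceeds the
 *                     maximum allowed or if there are no good
 *                     volumes.
 */
```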
[jira] [Commented] (HDFS-12149) Ozone: RocksDB implementation of ozone metadata store
[ https://issues.apache.org/jira/browse/HDFS-12149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090926#comment-16090926 ] Weiwei Yang commented on HDFS-12149: Hi [~anu] There was a rocksdb build a few hours ago, 5.5.3, but it seems it still uses the old license; we will need to wait a few more days until the license is updated. Do you want me to commit this first and open another JIRA to track the version update (so we can start to play with rocksdb), or do you want me to hold off on this patch until the new version comes out? Thank you. > Ozone: RocksDB implementation of ozone metadata store > - > > Key: HDFS-12149 > URL: https://issues.apache.org/jira/browse/HDFS-12149 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Weiwei Yang >Assignee: Weiwei Yang > Attachments: HDFS-12149-HDFS-7240.001.patch > > > HDFS-12069 added a general interface for ozone metadata store, we already > have a leveldb implementation, this JIRA is to track the work of rocksdb > implementation. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12098) Ozone: Datanode is unable to register with scm if scm starts later
[ https://issues.apache.org/jira/browse/HDFS-12098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16091004#comment-16091004 ] Weiwei Yang commented on HDFS-12098: Hi [~anu] Have you tried to reproduce this issue or apply the test case patch I uploaded to take a look at the issue ? Please let me know, thanks. > Ozone: Datanode is unable to register with scm if scm starts later > -- > > Key: HDFS-12098 > URL: https://issues.apache.org/jira/browse/HDFS-12098 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, ozone, scm >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Critical > Attachments: disabled-scm-test.patch, HDFS-12098-HDFS-7240.001.patch, > HDFS-12098-HDFS-7240.002.patch, HDFS-12098-HDFS-7240.testcase-1.patch, > HDFS-12098-HDFS-7240.testcase.patch, Screen Shot 2017-07-11 at 4.58.08 > PM.png, thread_dump.log > > > Reproducing steps > 1. Start namenode > {{./bin/hdfs --daemon start namenode}} > 2. Start datanode > {{./bin/hdfs datanode}} > will see following connection issues > {noformat} > 17/07/13 21:16:48 INFO ipc.Client: Retrying connect to server: > ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 0 time(s); retry > policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 > SECONDS) > 17/07/13 21:16:49 INFO ipc.Client: Retrying connect to server: > ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 1 time(s); retry > policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 > SECONDS) > 17/07/13 21:16:50 INFO ipc.Client: Retrying connect to server: > ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 2 time(s); retry > policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 > SECONDS) > 17/07/13 21:16:51 INFO ipc.Client: Retrying connect to server: > ozone1.fyre.ibm.com/172.16.165.133:9861. 
Already tried 3 time(s); retry > policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 > SECONDS) > {noformat} > this is expected because scm is not started yet > 3. Start scm > {{./bin/hdfs scm}} > expecting datanode can register to this scm, expecting the log in scm > {noformat} > 17/07/13 21:22:30 INFO node.SCMNodeManager: Data node with ID: > af22862d-aafa-4941-9073-53224ae43e2c Registered. > {noformat} > but did *NOT* see this log. (_I debugged into the code and found the datanode > state was transited SHUTDOWN unexpectedly because the thread leaks, each of > those threads counted to set to next state and they all set to SHUTDOWN > state_) > 4. Create a container from scm CLI > {{./bin/hdfs scm -container -create -c 20170714c0}} > this fails with following exception > {noformat} > Creating container : 20170714c0. > Error executing > command:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ozone.scm.exceptions.SCMException): > Unable to create container while in chill mode > at > org.apache.hadoop.ozone.scm.container.ContainerMapping.allocateContainer(ContainerMapping.java:241) > at > org.apache.hadoop.ozone.scm.StorageContainerManager.allocateContainer(StorageContainerManager.java:392) > at > org.apache.hadoop.ozone.protocolPB.StorageContainerLocationProtocolServerSideTranslatorPB.allocateContainer(StorageContainerLocationProtocolServerSideTranslatorPB.java:73) > {noformat} > datanode was not registered to scm, thus it's still in chill mode. > *Note*, if we start scm first, there is no such issue, I can create container > from CLI without any problem. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-12126) Ozone: Ozone shell: Add more testing for bucket shell commands
[ https://issues.apache.org/jira/browse/HDFS-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16091383#comment-16091383 ] Weiwei Yang edited comment on HDFS-12126 at 7/18/17 10:25 AM: -- Thanks [~linyiqun] for working on this, this is very much needed. I just read your v2 patch, I think overall it looks good, some comments *OzoneUtils* line 149: please add IllegalArgumentException in the method signature. *TestOzoneShell* # testCreateBucket: Can we add a test to create a bucket in a non-exist volume? # line 357: seems bucketInfo can be safely removed # line 359: this could mis-behave if vol.getBucket doesn't throw any exception, it will not reach the assert statement in the catch clause # line 566: it might be over concerned, but there might be slight chance two calls returns same volume name, can we use UUID or add a prefix as argument for {{creatVolume}} to completely avoid that? Thanks was (Author: cheersyang): Thanks [~linyiqun] for working on this, this is very much needed. I just read your v2 patch, I think overall it looks good, some comments for TestOzoneShell # testCreateBucket: Can we add a test to create a bucket in a non-exist volume? # line 357: seems bucketInfo can be safely removed # line 359: this could mis-behave if vol.getBucket doesn't throw any exception, it will not reach the assert statement in the catch clause # line 566: it might be over concerned, but there might be slight chance two calls returns same volume name, can we use UUID or add a prefix as argument for {{creatVolume}} to completely avoid that? 
Thanks > Ozone: Ozone shell: Add more testing for bucket shell commands > -- > > Key: HDFS-12126 > URL: https://issues.apache.org/jira/browse/HDFS-12126 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone, tools >Affects Versions: HDFS-7240 >Reporter: Yiqun Lin >Assignee: Yiqun Lin > Attachments: HDFS-12126-HDFS-7240.001.patch, > HDFS-12126-HDFS-7240.002.patch > > > Adding more unit tests for ozone bucket commands, similar to HDFS-12118. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12147) Ozone: KSM: Add checkBucketAccess
[ https://issues.apache.org/jira/browse/HDFS-12147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16091358#comment-16091358 ] Weiwei Yang commented on HDFS-12147: Hi [~nandakumar131] Please hold off on submitting a new patch; let's route this discussion to [~anu] as he reviewed HDFS-11771 for checkVolumeAccess. Can we revisit these 2 APIs and make them consistent? Ping [~anu], please take a look and let us know your thoughts, thanks. My thought is that if we are going to support ACLs, we need an overall picture of which places will need these checks and make sure they are all addressed. Otherwise it will end up working in some places and not in others. Thank you. > Ozone: KSM: Add checkBucketAccess > - > > Key: HDFS-12147 > URL: https://issues.apache.org/jira/browse/HDFS-12147 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Nandakumar >Assignee: Nandakumar > Attachments: HDFS-12147-HDFS-7240.000.patch, > HDFS-12147-HDFS-7240.001.patch > > > Checks if the caller has access to a given bucket. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12126) Ozone: Ozone shell: Add more testing for bucket shell commands
[ https://issues.apache.org/jira/browse/HDFS-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16091383#comment-16091383 ] Weiwei Yang commented on HDFS-12126: Thanks [~linyiqun] for working on this, this is very much needed. I just read your v2 patch; I think overall it looks good, some comments for TestOzoneShell # testCreateBucket: Can we add a test to create a bucket in a non-existent volume? # line 357: it seems bucketInfo can be safely removed # line 359: this could misbehave if vol.getBucket doesn't throw any exception; it will not reach the assert statement in the catch clause # line 566: it might be overly cautious, but there is a slight chance that two calls return the same volume name; can we use a UUID or add a prefix as an argument for {{creatVolume}} to completely avoid that? Thanks > Ozone: Ozone shell: Add more testing for bucket shell commands > -- > > Key: HDFS-12126 > URL: https://issues.apache.org/jira/browse/HDFS-12126 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone, tools >Affects Versions: HDFS-7240 >Reporter: Yiqun Lin >Assignee: Yiqun Lin > Attachments: HDFS-12126-HDFS-7240.001.patch, > HDFS-12126-HDFS-7240.002.patch > > > Adding more unit tests for ozone bucket commands, similar to HDFS-12118. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
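The collision-free naming asked about in the review above can be sketched with {{java.util.UUID}}. The class and method here are invented for illustration; the actual test helper in the patch may differ.

```java
import java.util.UUID;

// Sketch: derive each test volume name from a random UUID, optionally with
// a caller-supplied prefix, so two calls cannot pick the same name.
public class UniqueVolumeName {
    public static String newVolumeName(String prefix) {
        return prefix + "-" + UUID.randomUUID();
    }

    public static void main(String[] args) {
        String a = newVolumeName("volume");
        String b = newVolumeName("volume");
        System.out.println(a.equals(b)); // false (with overwhelming probability)
    }
}
```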
[jira] [Created] (HDFS-12098) Ozone: Datanode is unable to register with scm if scm starts later
Weiwei Yang created HDFS-12098: -- Summary: Ozone: Datanode is unable to register with scm if scm starts later Key: HDFS-12098 URL: https://issues.apache.org/jira/browse/HDFS-12098 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, ozone, scm Reporter: Weiwei Yang Assignee: Weiwei Yang Priority: Critical Reproducing steps # Start datanode # Wait and see datanode state, it has connection issues, this is expected # Start SCM, expecting datanode could connect to the scm and the state machine could transit to RUNNING. However in actual, its state transits to SHUTDOWN, datanode enters chill mode. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12098) Ozone: Datanode is unable to register with scm if scm starts later
[ https://issues.apache.org/jira/browse/HDFS-12098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16078213#comment-16078213 ] Weiwei Yang commented on HDFS-12098: This is because datanode state machine leaks {{VersionEndpointTask}} thread. In the case scm is not yet started, more and more {{VersionEndpointTask}} threads keep retrying connection with scm, {noformat} INIT - RUNNING \ GETVERSION executor.execute(new VersionEndpointTask()) - retry on getVersion ... ... (HB interval) executor.execute(new VersionEndpointTask()) - retry on getVersion ... ... (HB interval) executor.execute(new VersionEndpointTask()) - retry on getVersion ... ... {noformat} the version endpoint tasks are launched in HB interval (5s on my env), so every 5s there is a new task submitted; the retry policy for each getVersion call is 10 * 1s = 10s, so every 10s a task can be finished. So every 10s there will be ONE thread leak. When scm is up, all pending tasks will be able to connect to scm and getVersion call returns, so each of them will count the state to next, since the state is shared in {{EndpointStateMachine}}, it increments more than 1 so when I review the state changes, it looks like below {noformat} REGISTER HEARTBEAT SHUTDOWN SHUTDOWN SHUTDOWN ... {noformat} > Ozone: Datanode is unable to register with scm if scm starts later > -- > > Key: HDFS-12098 > URL: https://issues.apache.org/jira/browse/HDFS-12098 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, ozone, scm >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Critical > > Reproducing steps > # Start datanode > # Wait and see datanode state, it has connection issues, this is expected > # Start SCM, expecting datanode could connect to the scm and the state > machine could transit to RUNNING. However in actual, its state transits to > SHUTDOWN, datanode enters chill mode. 
-- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12098) Ozone: Datanode is unable to register with scm if scm starts later
[ https://issues.apache.org/jira/browse/HDFS-12098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-12098: --- Attachment: thread_dump.log > Ozone: Datanode is unable to register with scm if scm starts later > -- > > Key: HDFS-12098 > URL: https://issues.apache.org/jira/browse/HDFS-12098 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, ozone, scm >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Critical > Attachments: thread_dump.log > > > Reproducing steps > # Start datanode > # Wait and see datanode state, it has connection issues, this is expected > # Start SCM, expecting datanode could connect to the scm and the state > machine could transit to RUNNING. However in actual, its state transits to > SHUTDOWN, datanode enters chill mode. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-12098) Ozone: Datanode is unable to register with scm if scm starts later
[ https://issues.apache.org/jira/browse/HDFS-12098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16078213#comment-16078213 ] Weiwei Yang edited comment on HDFS-12098 at 7/7/17 3:11 PM:

This is because the datanode state machine leaks {{VersionEndpointTask}} threads. When scm is not yet started, more and more {{VersionEndpointTask}} threads keep retrying the connection to scm:
{noformat}
INIT - RUNNING \ GETVERSION
new VersionEndpointTask submitted - retrying ... ... (HB interval)
new VersionEndpointTask submitted - retrying ... ... (HB interval)
new VersionEndpointTask submitted - retrying ... ...
{noformat}
The version endpoint tasks are launched at the HB interval (5s in my env), so a new task is submitted every 5s; the retry policy for each getVersion call is 10 * 1s = 10s, so a task can only finish every 10s. As a result, ONE thread leaks every 10s. When scm is up, all pending tasks are able to connect to scm and their getVersion calls return, so each of them advances the state to the next one. Since the state is shared in {{EndpointStateMachine}}, it is incremented more than once, and when I review the state changes it looks like this:
{noformat}
REGISTER HEARTBEAT SHUTDOWN SHUTDOWN SHUTDOWN ...
{noformat}

was (Author: cheersyang):
This is because the datanode state machine leaks {{VersionEndpointTask}} threads. When scm is not yet started, more and more {{VersionEndpointTask}} threads keep retrying the connection to scm:
{noformat}
INIT - RUNNING \ GETVERSION
executor.execute(new VersionEndpointTask()) - retry on getVersion ... ... (HB interval)
executor.execute(new VersionEndpointTask()) - retry on getVersion ... ... (HB interval)
executor.execute(new VersionEndpointTask()) - retry on getVersion ... ...
{noformat}
The version endpoint tasks are launched at the HB interval (5s in my env), so a new task is submitted every 5s; the retry policy for each getVersion call is 10 * 1s = 10s, so a task can only finish every 10s. As a result, ONE thread leaks every 10s. When scm is up, all pending tasks are able to connect to scm and their getVersion calls return, so each of them advances the state to the next one. Since the state is shared in {{EndpointStateMachine}}, it is incremented more than once, and when I review the state changes it looks like this:
{noformat}
REGISTER HEARTBEAT SHUTDOWN SHUTDOWN SHUTDOWN ...
{noformat}

> Ozone: Datanode is unable to register with scm if scm starts later > -- > > Key: HDFS-12098 > URL: https://issues.apache.org/jira/browse/HDFS-12098 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, ozone, scm >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Critical > Attachments: thread_dump.log > > > Reproducing steps > # Start datanode > # Wait and see datanode state; it has connection issues, which is expected > # Start SCM, expecting the datanode to connect to scm and the state > machine to transit to RUNNING. In actuality, its state transits to > SHUTDOWN and the datanode enters chill mode.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
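The shared-state overshoot described in the comment above can be sketched with a toy model. All names below are simplified and hypothetical (the real classes are {{EndpointStateMachine}} and {{VersionEndpointTask}} in the Ozone code); the point is only that N pending tasks each advance one shared state once, so the state skips past HEARTBEAT into SHUTDOWN:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (hypothetical names) of the bug: every leaked task that
 * finally gets a getVersion response advances the SAME shared endpoint
 * state once, so several pending tasks overshoot into SHUTDOWN.
 */
public class SharedStateOvershoot {
    enum EndPointState {
        GETVERSION, REGISTER, HEARTBEAT, SHUTDOWN;

        EndPointState next() {
            // SHUTDOWN is terminal; every other state moves one step forward.
            return values()[Math.min(ordinal() + 1, SHUTDOWN.ordinal())];
        }
    }

    /** Simulate pendingTasks leaked tasks completing against one shared state. */
    static List<EndPointState> drain(int pendingTasks) {
        EndPointState state = EndPointState.GETVERSION;
        List<EndPointState> observed = new ArrayList<>();
        for (int t = 0; t < pendingTasks; t++) {
            state = state.next();   // each finishing task increments the state
            observed.add(state);
        }
        return observed;
    }

    public static void main(String[] args) {
        // One pending task: the correct transition GETVERSION -> REGISTER.
        System.out.println(drain(1));
        // Five leaked tasks: the sequence seen in the comment above,
        // REGISTER HEARTBEAT SHUTDOWN SHUTDOWN SHUTDOWN.
        System.out.println(drain(5));
    }
}
```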
[jira] [Comment Edited] (HDFS-12098) Ozone: Datanode is unable to register with scm if scm starts later
[ https://issues.apache.org/jira/browse/HDFS-12098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16078213#comment-16078213 ] Weiwei Yang edited comment on HDFS-12098 at 7/7/17 3:15 PM:

This is because the datanode state machine leaks {{VersionEndpointTask}} threads. When scm is not yet started, more and more {{VersionEndpointTask}} threads keep retrying the connection to scm:
{noformat}
INIT - RUNNING \ GETVERSION
new VersionEndpointTask submitted - retrying ... ... (HB interval)
new VersionEndpointTask submitted - retrying ... ... (HB interval)
new VersionEndpointTask submitted - retrying ... ...
{noformat}
The version endpoint tasks are launched at the HB interval (5s in my env), so a new task is submitted every 5s; the retry policy for each getVersion call is 10 * 1s = 10s, so a task can only finish every 10s. As a result, ONE thread leaks every 10s. Please see [^thread_dump.log]: there are 20 VersionEndpointTask threads in WAITING state, and this number keeps increasing. When scm is up, all pending tasks are able to connect to scm and their getVersion calls return, so each of them advances the state to the next one. Since the state is shared in {{EndpointStateMachine}}, it is incremented more than once, and when I review the state changes it looks like this:
{noformat}
REGISTER HEARTBEAT SHUTDOWN SHUTDOWN SHUTDOWN ...
{noformat}
To fix this, instead of using a central ExecutorService carried in {{DatanodeStateMachine}}, we could initialize a fixed-size thread pool to execute the endpoint tasks, and make sure the thread pool gets shut down before entering the next state (at the end of await).

was (Author: cheersyang):
This is because the datanode state machine leaks {{VersionEndpointTask}} threads. When scm is not yet started, more and more {{VersionEndpointTask}} threads keep retrying the connection to scm:
{noformat}
INIT - RUNNING \ GETVERSION
new VersionEndpointTask submitted - retrying ... ... (HB interval)
new VersionEndpointTask submitted - retrying ... ... (HB interval)
new VersionEndpointTask submitted - retrying ... ...
{noformat}
The version endpoint tasks are launched at the HB interval (5s in my env), so a new task is submitted every 5s; the retry policy for each getVersion call is 10 * 1s = 10s, so a task can only finish every 10s. As a result, ONE thread leaks every 10s. When scm is up, all pending tasks are able to connect to scm and their getVersion calls return, so each of them advances the state to the next one. Since the state is shared in {{EndpointStateMachine}}, it is incremented more than once, and when I review the state changes it looks like this:
{noformat}
REGISTER HEARTBEAT SHUTDOWN SHUTDOWN SHUTDOWN ...
{noformat}

> Ozone: Datanode is unable to register with scm if scm starts later > -- > > Key: HDFS-12098 > URL: https://issues.apache.org/jira/browse/HDFS-12098 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, ozone, scm >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Critical > Attachments: thread_dump.log > > > Reproducing steps > # Start datanode > # Wait and see datanode state; it has connection issues, which is expected > # Start SCM, expecting the datanode to connect to scm and the state > machine to transit to RUNNING. In actuality, its state transits to > SHUTDOWN and the datanode enters chill mode.
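The fix proposed above (a fixed-size pool that is shut down before the state machine moves to the next state) can be sketched with plain {{java.util.concurrent}}. The class and method names below are hypothetical illustrations, not the actual patch:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/**
 * Sketch (hypothetical names): run the endpoint tasks of one state cycle
 * in a bounded pool created for that cycle, then shut the pool down and
 * await termination before moving on, so no task outlives the state
 * transition and leaks.
 */
public class BoundedEndpointExecutor {
    static <T> List<T> runOneCycle(List<Callable<T>> endpointTasks)
            throws Exception {
        ExecutorService pool =
                Executors.newFixedThreadPool(Math.max(1, endpointTasks.size()));
        try {
            List<T> results = new ArrayList<>();
            // invokeAll blocks until every submitted task has completed.
            for (Future<T> f : pool.invokeAll(endpointTasks)) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();                             // stop accepting work
            pool.awaitTermination(10, TimeUnit.SECONDS); // drain before next state
        }
    }

    public static void main(String[] args) throws Exception {
        List<Callable<String>> tasks =
                List.of(() -> "getVersion ok", () -> "getVersion ok");
        System.out.println(runOneCycle(tasks));
    }
}
```

Because the pool's lifetime is bounded by the cycle, retrying tasks from an earlier cycle cannot pile up across heartbeat intervals the way the central executor allows.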
[jira] [Commented] (HDFS-12149) Ozone: RocksDB implementation of ozone metadata store
[ https://issues.apache.org/jira/browse/HDFS-12149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089218#comment-16089218 ] Weiwei Yang commented on HDFS-12149:

Thanks [~anu] for the message, I will work on this.
bq. I am aware that what we have is a generic plugin layer which can use most key value stores, and RocksDB is just a specific instance of it and it is trivial for us to revert it, even if it is committed.
That's correct. We will follow the Legal team's decision, as you mentioned. It is trivial to revert this with a simple switch. Thank you.

> Ozone: RocksDB implementation of ozone metadata store > - > > Key: HDFS-12149 > URL: https://issues.apache.org/jira/browse/HDFS-12149 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Weiwei Yang >Assignee: Weiwei Yang > > HDFS-12069 added a general interface for the ozone metadata store; we already > have a leveldb implementation, and this JIRA tracks the work on the rocksdb > implementation.
[jira] [Updated] (HDFS-12098) Ozone: Datanode is unable to register with scm if scm starts later
[ https://issues.apache.org/jira/browse/HDFS-12098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-12098: --- Attachment: HDFS-12098-HDFS-7240.testcase.patch > Ozone: Datanode is unable to register with scm if scm starts later > -- > > Key: HDFS-12098 > URL: https://issues.apache.org/jira/browse/HDFS-12098 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, ozone, scm >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Critical > Attachments: disabled-scm-test.patch, HDFS-12098-HDFS-7240.001.patch, > HDFS-12098-HDFS-7240.002.patch, HDFS-12098-HDFS-7240.testcase.patch, Screen > Shot 2017-07-11 at 4.58.08 PM.png, thread_dump.log > > > Reproducing steps > 1. Start namenode > {{./bin/hdfs --daemon start namenode}} > 2. Start datanode > {{./bin/hdfs datanode}} > will see following connection issues > {noformat} > 17/07/13 21:16:48 INFO ipc.Client: Retrying connect to server: > ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 0 time(s); retry > policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 > SECONDS) > 17/07/13 21:16:49 INFO ipc.Client: Retrying connect to server: > ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 1 time(s); retry > policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 > SECONDS) > 17/07/13 21:16:50 INFO ipc.Client: Retrying connect to server: > ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 2 time(s); retry > policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 > SECONDS) > 17/07/13 21:16:51 INFO ipc.Client: Retrying connect to server: > ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 3 time(s); retry > policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 > SECONDS) > {noformat} > this is expected because scm is not started yet > 3. 
Start scm > {{./bin/hdfs scm}} > expecting datanode can register to this scm, expecting the log in scm > {noformat} > 17/07/13 21:22:30 INFO node.SCMNodeManager: Data node with ID: > af22862d-aafa-4941-9073-53224ae43e2c Registered. > {noformat} > but did *NOT* see this log. (_I debugged into the code and found the datanode > state was transited SHUTDOWN unexpectedly because the thread leaks, each of > those threads counted to set to next state and they all set to SHUTDOWN > state_) > 4. Create a container from scm CLI > {{./bin/hdfs scm -container -create -c 20170714c0}} > this fails with following exception > {noformat} > Creating container : 20170714c0. > Error executing > command:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ozone.scm.exceptions.SCMException): > Unable to create container while in chill mode > at > org.apache.hadoop.ozone.scm.container.ContainerMapping.allocateContainer(ContainerMapping.java:241) > at > org.apache.hadoop.ozone.scm.StorageContainerManager.allocateContainer(StorageContainerManager.java:392) > at > org.apache.hadoop.ozone.protocolPB.StorageContainerLocationProtocolServerSideTranslatorPB.allocateContainer(StorageContainerLocationProtocolServerSideTranslatorPB.java:73) > {noformat} > datanode was not registered to scm, thus it's still in chill mode. > *Note*, if we start scm first, there is no such issue, I can create container > from CLI without any problem. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12098) Ozone: Datanode is unable to register with scm if scm starts later
[ https://issues.apache.org/jira/browse/HDFS-12098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089283#comment-16089283 ] Weiwei Yang commented on HDFS-12098: Attached a test case patch to reproduce this issue. Please take a look at [^HDFS-12098-HDFS-7240.testcase.patch]. This patch simulates the scenario # Start mini ozone cluster without starting scm # Datanode is unable to register to scm # Start scm, waiting for datanode to register # Wait a while but datanode is still unable to successfully register to scm If you apply this patch, it will fail. You might have noticed the patch changes more code than just adding a test; that is because of the reason I mentioned earlier. I also added a method to check whether a datanode is registered to scm so that we can check the datanode state even when scm is not started. I also have a patch to fix this; with that patch applied, this test will pass. I am ready to share that as well. Thanks > Ozone: Datanode is unable to register with scm if scm starts later > -- > > Key: HDFS-12098 > URL: https://issues.apache.org/jira/browse/HDFS-12098 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, ozone, scm >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Critical > Attachments: disabled-scm-test.patch, HDFS-12098-HDFS-7240.001.patch, > HDFS-12098-HDFS-7240.002.patch, HDFS-12098-HDFS-7240.testcase.patch, Screen > Shot 2017-07-11 at 4.58.08 PM.png, thread_dump.log > > > Reproducing steps > 1. Start namenode > {{./bin/hdfs --daemon start namenode}} > 2. Start datanode > {{./bin/hdfs datanode}} > will see following connection issues > {noformat} > 17/07/13 21:16:48 INFO ipc.Client: Retrying connect to server: > ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 0 time(s); retry > policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 > SECONDS) > 17/07/13 21:16:49 INFO ipc.Client: Retrying connect to server: > ozone1.fyre.ibm.com/172.16.165.133:9861. 
Already tried 1 time(s); retry > policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 > SECONDS) > 17/07/13 21:16:50 INFO ipc.Client: Retrying connect to server: > ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 2 time(s); retry > policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 > SECONDS) > 17/07/13 21:16:51 INFO ipc.Client: Retrying connect to server: > ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 3 time(s); retry > policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 > SECONDS) > {noformat} > this is expected because scm is not started yet > 3. Start scm > {{./bin/hdfs scm}} > expecting datanode can register to this scm, expecting the log in scm > {noformat} > 17/07/13 21:22:30 INFO node.SCMNodeManager: Data node with ID: > af22862d-aafa-4941-9073-53224ae43e2c Registered. > {noformat} > but did *NOT* see this log. (_I debugged into the code and found the datanode > state was transited SHUTDOWN unexpectedly because the thread leaks, each of > those threads counted to set to next state and they all set to SHUTDOWN > state_) > 4. Create a container from scm CLI > {{./bin/hdfs scm -container -create -c 20170714c0}} > this fails with following exception > {noformat} > Creating container : 20170714c0. > Error executing > command:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ozone.scm.exceptions.SCMException): > Unable to create container while in chill mode > at > org.apache.hadoop.ozone.scm.container.ContainerMapping.allocateContainer(ContainerMapping.java:241) > at > org.apache.hadoop.ozone.scm.StorageContainerManager.allocateContainer(StorageContainerManager.java:392) > at > org.apache.hadoop.ozone.protocolPB.StorageContainerLocationProtocolServerSideTranslatorPB.allocateContainer(StorageContainerLocationProtocolServerSideTranslatorPB.java:73) > {noformat} > datanode was not registered to scm, thus it's still in chill mode. 
> *Note*, if we start scm first, there is no such issue, I can create container > from CLI without any problem.
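The test scenario above ("start scm, then wait for the datanode to register") comes down to polling a registration check until a deadline passes. A minimal standalone sketch of such a wait follows; this is a hypothetical helper for illustration, not the actual test-case patch (Hadoop's test utilities provide a similar waitFor helper):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

/**
 * Illustrative sketch (not the actual test code): poll a check, e.g.
 * "is the datanode registered to scm", until it becomes true or the
 * timeout expires.
 */
public class WaitFor {
    static boolean waitFor(BooleanSupplier check, long intervalMs, long timeoutMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!check.getAsBoolean()) {
            if (System.currentTimeMillis() >= deadline) {
                return false;   // e.g. datanode never registered in time
            }
            Thread.sleep(intervalMs);
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated registration that succeeds on the third poll.
        AtomicInteger polls = new AtomicInteger();
        boolean registered =
                waitFor(() -> polls.incrementAndGet() >= 3, 10, 1000);
        System.out.println("registered=" + registered);
    }
}
```

In the buggy scenario such a wait would time out and return false, which is exactly what makes the reproducing test fail before the fix is applied.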
[jira] [Comment Edited] (HDFS-12098) Ozone: Datanode is unable to register with scm if scm starts later
[ https://issues.apache.org/jira/browse/HDFS-12098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089283#comment-16089283 ] Weiwei Yang edited comment on HDFS-12098 at 7/17/17 4:01 AM: -

Attached a test case patch to reproduce this issue. Please take a look at [^HDFS-12098-HDFS-7240.testcase.patch]. This patch simulates the scenario # Start mini ozone cluster without starting scm # Datanode is unable to register to scm # Start scm, waiting for datanode to register # Wait a while but datanode is still unable to successfully register to scm If you apply this patch, it will fail. Some log from step 4 is interesting:
{noformat}
2017-07-17 11:46:02,451 [Datanode State Machine Thread - 0] INFO ipc.Client (Client.java:handleConnectionFailure(933)) - Retrying connect to server: localhost/127.0.0.1:51183. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2017-07-17 11:46:02,467 [Datanode State Machine Thread - 0] INFO endpoint.VersionEndpointTask (VersionEndpointTask.java:call(61)) - Version endpoint task (localhost/127.0.0.1:51183) transited to state REGISTER
2017-07-17 11:46:02,468 [Datanode State Machine Thread - 1] INFO endpoint.VersionEndpointTask (VersionEndpointTask.java:call(61)) - Version endpoint task (localhost/127.0.0.1:51183) transited to state HEARTBEAT
2017-07-17 11:46:02,469 [Datanode State Machine Thread - 2] INFO endpoint.VersionEndpointTask (VersionEndpointTask.java:call(61)) - Version endpoint task (localhost/127.0.0.1:51183) transited to state SHUTDOWN
2017-07-17 11:46:02,471 [Datanode State Machine Thread - 3] INFO endpoint.VersionEndpointTask (VersionEndpointTask.java:call(61)) - Version endpoint task (localhost/127.0.0.1:51183) transited to state SHUTDOWN
{noformat}
Instead of transitioning to state {{HEARTBEAT}}, it transitioned to {{SHUTDOWN}}. 
You might have noticed the patch changes more code than just adding a test; that is because of the reason I mentioned earlier. I also added a method to check whether a datanode is registered to scm so that we can check the datanode state even when scm is not started. I also have a patch to fix this; with that patch applied, this test will pass. I am ready to share that as well. Thanks

was (Author: cheersyang):
Attached a test case patch to reproduce this issue. Please take a look at [^HDFS-12098-HDFS-7240.testcase.patch]. This patch simulates the scenario # Start mini ozone cluster without starting scm # Datanode is unable to register to scm # Start scm, waiting for datanode to register # Wait a while but datanode is still unable to successfully register to scm Step 4 will print log
{noformat}
2017-07-17 11:46:02,451 [Datanode State Machine Thread - 0] INFO ipc.Client (Client.java:handleConnectionFailure(933)) - Retrying connect to server: localhost/127.0.0.1:51183. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2017-07-17 11:46:02,467 [Datanode State Machine Thread - 0] INFO endpoint.VersionEndpointTask (VersionEndpointTask.java:call(61)) - Version endpoint task (localhost/127.0.0.1:51183) transited to state REGISTER
2017-07-17 11:46:02,468 [Datanode State Machine Thread - 1] INFO endpoint.VersionEndpointTask (VersionEndpointTask.java:call(61)) - Version endpoint task (localhost/127.0.0.1:51183) transited to state HEARTBEAT
2017-07-17 11:46:02,469 [Datanode State Machine Thread - 2] INFO endpoint.VersionEndpointTask (VersionEndpointTask.java:call(61)) - Version endpoint task (localhost/127.0.0.1:51183) transited to state SHUTDOWN
2017-07-17 11:46:02,471 [Datanode State Machine Thread - 3] INFO endpoint.VersionEndpointTask (VersionEndpointTask.java:call(61)) - Version endpoint task (localhost/127.0.0.1:51183) transited to state SHUTDOWN
2017-07-17 11:46:03,457 [Datanode State Machine Thread - 0] INFO statemachine.DatanodeStateMachine (DatanodeStateMachine.java:lambda$startDaemon$0(272)) - Ozone container server started.
{noformat}
If you apply this patch, it will fail. You might have noticed the patch changes more code than just adding a test; that is because of the reason I mentioned earlier. I also added a method to check whether a datanode is registered to scm so that we can check the datanode state even when scm is not started. I also have a patch to fix this; with that patch applied, this test will pass. I am ready to share that as well. Thanks

> Ozone: Datanode is unable to register with scm if scm starts later > -- > > Key: HDFS-12098 > URL: https://issues.apache.org/jira/browse/HDFS-12098 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, ozone, scm >
[jira] [Updated] (HDFS-12098) Ozone: Datanode is unable to register with scm if scm starts later
[ https://issues.apache.org/jira/browse/HDFS-12098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-12098: --- Attachment: (was: HDFS-12098-HDFS-7240.testcase.patch) > Ozone: Datanode is unable to register with scm if scm starts later > -- > > Key: HDFS-12098 > URL: https://issues.apache.org/jira/browse/HDFS-12098 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, ozone, scm >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Critical > Attachments: disabled-scm-test.patch, HDFS-12098-HDFS-7240.001.patch, > HDFS-12098-HDFS-7240.002.patch, Screen Shot 2017-07-11 at 4.58.08 PM.png, > thread_dump.log > > > Reproducing steps > 1. Start namenode > {{./bin/hdfs --daemon start namenode}} > 2. Start datanode > {{./bin/hdfs datanode}} > will see following connection issues > {noformat} > 17/07/13 21:16:48 INFO ipc.Client: Retrying connect to server: > ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 0 time(s); retry > policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 > SECONDS) > 17/07/13 21:16:49 INFO ipc.Client: Retrying connect to server: > ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 1 time(s); retry > policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 > SECONDS) > 17/07/13 21:16:50 INFO ipc.Client: Retrying connect to server: > ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 2 time(s); retry > policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 > SECONDS) > 17/07/13 21:16:51 INFO ipc.Client: Retrying connect to server: > ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 3 time(s); retry > policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 > SECONDS) > {noformat} > this is expected because scm is not started yet > 3. 
Start scm > {{./bin/hdfs scm}} > expecting datanode can register to this scm, expecting the log in scm > {noformat} > 17/07/13 21:22:30 INFO node.SCMNodeManager: Data node with ID: > af22862d-aafa-4941-9073-53224ae43e2c Registered. > {noformat} > but did *NOT* see this log. (_I debugged into the code and found the datanode > state was transited SHUTDOWN unexpectedly because the thread leaks, each of > those threads counted to set to next state and they all set to SHUTDOWN > state_) > 4. Create a container from scm CLI > {{./bin/hdfs scm -container -create -c 20170714c0}} > this fails with following exception > {noformat} > Creating container : 20170714c0. > Error executing > command:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ozone.scm.exceptions.SCMException): > Unable to create container while in chill mode > at > org.apache.hadoop.ozone.scm.container.ContainerMapping.allocateContainer(ContainerMapping.java:241) > at > org.apache.hadoop.ozone.scm.StorageContainerManager.allocateContainer(StorageContainerManager.java:392) > at > org.apache.hadoop.ozone.protocolPB.StorageContainerLocationProtocolServerSideTranslatorPB.allocateContainer(StorageContainerLocationProtocolServerSideTranslatorPB.java:73) > {noformat} > datanode was not registered to scm, thus it's still in chill mode. > *Note*, if we start scm first, there is no such issue, I can create container > from CLI without any problem. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12098) Ozone: Datanode is unable to register with scm if scm starts later
[ https://issues.apache.org/jira/browse/HDFS-12098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-12098: --- Status: Patch Available (was: In Progress) > Ozone: Datanode is unable to register with scm if scm starts later > -- > > Key: HDFS-12098 > URL: https://issues.apache.org/jira/browse/HDFS-12098 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, ozone, scm >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Critical > Attachments: disabled-scm-test.patch, HDFS-12098-HDFS-7240.001.patch, > HDFS-12098-HDFS-7240.002.patch, HDFS-12098-HDFS-7240.testcase.patch, Screen > Shot 2017-07-11 at 4.58.08 PM.png, thread_dump.log > > > Reproducing steps > 1. Start namenode > {{./bin/hdfs --daemon start namenode}} > 2. Start datanode > {{./bin/hdfs datanode}} > will see following connection issues > {noformat} > 17/07/13 21:16:48 INFO ipc.Client: Retrying connect to server: > ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 0 time(s); retry > policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 > SECONDS) > 17/07/13 21:16:49 INFO ipc.Client: Retrying connect to server: > ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 1 time(s); retry > policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 > SECONDS) > 17/07/13 21:16:50 INFO ipc.Client: Retrying connect to server: > ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 2 time(s); retry > policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 > SECONDS) > 17/07/13 21:16:51 INFO ipc.Client: Retrying connect to server: > ozone1.fyre.ibm.com/172.16.165.133:9861. Already tried 3 time(s); retry > policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 > SECONDS) > {noformat} > this is expected because scm is not started yet > 3. 
Start scm > {{./bin/hdfs scm}} > expecting datanode can register to this scm, expecting the log in scm > {noformat} > 17/07/13 21:22:30 INFO node.SCMNodeManager: Data node with ID: > af22862d-aafa-4941-9073-53224ae43e2c Registered. > {noformat} > but did *NOT* see this log. (_I debugged into the code and found the datanode > state was transited SHUTDOWN unexpectedly because the thread leaks, each of > those threads counted to set to next state and they all set to SHUTDOWN > state_) > 4. Create a container from scm CLI > {{./bin/hdfs scm -container -create -c 20170714c0}} > this fails with following exception > {noformat} > Creating container : 20170714c0. > Error executing > command:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ozone.scm.exceptions.SCMException): > Unable to create container while in chill mode > at > org.apache.hadoop.ozone.scm.container.ContainerMapping.allocateContainer(ContainerMapping.java:241) > at > org.apache.hadoop.ozone.scm.StorageContainerManager.allocateContainer(StorageContainerManager.java:392) > at > org.apache.hadoop.ozone.protocolPB.StorageContainerLocationProtocolServerSideTranslatorPB.allocateContainer(StorageContainerLocationProtocolServerSideTranslatorPB.java:73) > {noformat} > datanode was not registered to scm, thus it's still in chill mode. > *Note*, if we start scm first, there is no such issue, I can create container > from CLI without any problem. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12069) Ozone: Create a general abstraction for metadata store
[ https://issues.apache.org/jira/browse/HDFS-12069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-12069: --- Attachment: HDFS-12069-HDFS-7240.013.patch > Ozone: Create a general abstraction for metadata store > -- > > Key: HDFS-12069 > URL: https://issues.apache.org/jira/browse/HDFS-12069 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Weiwei Yang >Assignee: Weiwei Yang > Attachments: HDFS-12069-HDFS-7240.001.patch, > HDFS-12069-HDFS-7240.002.patch, HDFS-12069-HDFS-7240.003.patch, > HDFS-12069-HDFS-7240.004.patch, HDFS-12069-HDFS-7240.005.patch, > HDFS-12069-HDFS-7240.006.patch, HDFS-12069-HDFS-7240.007.patch, > HDFS-12069-HDFS-7240.008.patch, HDFS-12069-HDFS-7240.009.patch, > HDFS-12069-HDFS-7240.010.patch, HDFS-12069-HDFS-7240.011.patch, > HDFS-12069-HDFS-7240.012.patch, HDFS-12069-HDFS-7240.013.patch > > > Create a general abstraction for metadata store so that we can plug other key > value store to host ozone metadata. Currently only levelDB is implemented, we > want to support RocksDB as it provides more production ready features. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12145) Ozone: OzoneFileSystem: Ozone & KSM should support "/" delimited key names
[ https://issues.apache.org/jira/browse/HDFS-12145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088651#comment-16088651 ] Weiwei Yang commented on HDFS-12145: Hi [~msingh] Thanks for working on this, the patch overall looks good. Few comments 1. Can you dump the container.db and make sure its database key-values are as expected? I am expecting keys are still raw key names, and values {{KeyData}} should contain a list of correct chunk names. This requires no code change, just want to make sure db info is accurate. 2. In {{TestKeys}}, can you add a key name argument in {{PutHelper}}'s constructor? So that we can parameterize this class to run with different key names. E.g {code} new PutHelper(ozoneRestClient, path, "a"); new PutHelper(ozoneRestClient, path, "a/b/c"); new PutHelper(ozoneRestClient, path, "a//b"); {code} this can be reused in future if we need to test more formats of key names. 3. In {{TestKeys}}, line 168, is it better to create a random file with {{newKeyName}} instead of {{keyNamePart1}} ? Thank you. > Ozone: OzoneFileSystem: Ozone & KSM should support "/" delimited key names > -- > > Key: HDFS-12145 > URL: https://issues.apache.org/jira/browse/HDFS-12145 > Project: Hadoop HDFS > Issue Type: Bug > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh > Fix For: HDFS-7240 > > Attachments: HDFS-12145-HDFS-7240.001.patch, > HDFS-12145-HDFS-7240.002.patch > > > With OzoneFileSystem, key names will be delimited by "/" which is used as the > path separator. > Support should be added in KSM and Ozone to support key name with "/" -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12148) Ozone: TestOzoneConfigurationFields is failing because ozone-default.xml has some missing properties
[ https://issues.apache.org/jira/browse/HDFS-12148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-12148: --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: HDFS-7240 Status: Resolved (was: Patch Available) > Ozone: TestOzoneConfigurationFields is failing because ozone-default.xml has > some missing properties > > > Key: HDFS-12148 > URL: https://issues.apache.org/jira/browse/HDFS-12148 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Minor > Fix For: HDFS-7240 > > Attachments: HDFS-12148-HDFS-7240.001.patch > > > Following properties added by HDFS-11493 is missing in ozone-default.xml > {noformat} > ozone.scm.max.container.report.threads > ozone.scm.container.report.processing.interval.seconds > ozone.scm.container.reports.wait.timeout.seconds > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12149) Ozone: RocksDB implementation of ozone metadata store
[ https://issues.apache.org/jira/browse/HDFS-12149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094349#comment-16094349 ] Weiwei Yang commented on HDFS-12149: Hi [~yuanbo] Thanks for helping to review this. I have fixed the close code as you suggested. bq. line 267: I don't have much experience in RocksDB, what if iterator doesn't have next or prev? In rocksDB, we can use the next() plus isValid() combination instead, slightly different from leveldb. Thank you. > Ozone: RocksDB implementation of ozone metadata store > - > > Key: HDFS-12149 > URL: https://issues.apache.org/jira/browse/HDFS-12149 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Weiwei Yang >Assignee: Weiwei Yang > Attachments: HDFS-12149-HDFS-7240.001.patch, > HDFS-12149-HDFS-7240.002.patch > > > HDFS-12069 added a general interface for ozone metadata store, we already > have a leveldb implementation, this JIRA is to track the work of rocksdb > implementation. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
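For readers comparing the two iterator styles mentioned in the comment above: the RocksDB Java iterator is positional — you call seekToFirst()/seekToLast() and then loop on isValid() with next() or prev() — whereas the leveldb Java binding exposes a java.util-style hasNext()/next(). The sketch below mimics that positional traversal over an in-memory sorted map; it is a stand-in for the real org.rocksdb.RocksIterator, which needs the rocksdbjni dependency and is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Minimal in-memory stand-in for the RocksIterator-style positional API.
// The point is the traversal idiom: position the cursor, then loop on
// isValid() with next()/prev() -- there is no hasNext().
public class PositionalIterator {
    private final List<Map.Entry<String, String>> entries;
    private int pos = -1;

    public PositionalIterator(TreeMap<String, String> db) {
        this.entries = new ArrayList<>(db.entrySet());
    }

    public void seekToFirst() { pos = 0; }
    public void seekToLast()  { pos = entries.size() - 1; }
    public boolean isValid()  { return pos >= 0 && pos < entries.size(); }
    public void next()        { pos++; }
    public void prev()        { pos--; }
    public String key()       { return entries.get(pos).getKey(); }

    public static void main(String[] args) {
        TreeMap<String, String> db = new TreeMap<>();
        db.put("a", "1");
        db.put("b", "2");
        db.put("c", "3");

        // Forward scan: no hasNext(), check isValid() after each next().
        PositionalIterator it = new PositionalIterator(db);
        for (it.seekToFirst(); it.isValid(); it.next()) {
            System.out.print(it.key()); // prints abc
        }
        System.out.println();

        // Backward scan uses seekToLast() + prev() the same way.
        for (it.seekToLast(); it.isValid(); it.prev()) {
            System.out.print(it.key()); // prints cba
        }
        System.out.println();
    }
}
```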
[jira] [Updated] (HDFS-12149) Ozone: RocksDB implementation of ozone metadata store
[ https://issues.apache.org/jira/browse/HDFS-12149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-12149: --- Attachment: HDFS-12149-HDFS-7240.003.patch > Ozone: RocksDB implementation of ozone metadata store > - > > Key: HDFS-12149 > URL: https://issues.apache.org/jira/browse/HDFS-12149 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Weiwei Yang >Assignee: Weiwei Yang > Attachments: HDFS-12149-HDFS-7240.001.patch, > HDFS-12149-HDFS-7240.002.patch, HDFS-12149-HDFS-7240.003.patch > > > HDFS-12069 added a general interface for ozone metadata store, we already > have a leveldb implementation, this JIRA is to track the work of rocksdb > implementation. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12167) Ozone: Intermittent failure TestContainerPersistence#testListKey
[ https://issues.apache.org/jira/browse/HDFS-12167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094728#comment-16094728 ] Weiwei Yang commented on HDFS-12167: See error message {noformat} Error Message Unable to find the container. Name: c0@]=[3/~C"8 Stacktrace org.apache.hadoop.scm.container.common.helpers.StorageContainerException: Unable to find the container. Name: c0@]=[3/~C"8 at org.apache.hadoop.ozone.container.common.impl.ContainerManagerImpl.readContainer(ContainerManagerImpl.java:486) at org.apache.hadoop.ozone.container.common.impl.ChunkManagerImpl.writeChunk(ChunkManagerImpl.java:80) at org.apache.hadoop.ozone.container.common.impl.TestContainerPersistence.writeChunkHelper(TestContainerPersistence.java:373) at org.apache.hadoop.ozone.container.common.impl.TestContainerPersistence.writeKeyHelper(TestContainerPersistence.java:809) at org.apache.hadoop.ozone.container.common.impl.TestContainerPersistence.testListKey(TestContainerPersistence.java:825) Standard Output 2017-07-20 10:02:34,116 [Thread-13] INFO impl.ContainerManagerImpl (ContainerManagerImpl.java:init(149)) - Loading containers under [DISK]file:/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/4/TestContainerPersistence/tmp/ozone/repository 2017-07-20 10:02:34,122 [Thread-13] WARN fs.CachingGetSpaceUsed (DU.java:refresh(55)) - Could not get disk usage information for path /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/4/dfs/data java.io.IOException: Expecting a line not the end of stream at org.apache.hadoop.fs.DU$DUShell.parseExecResult(DU.java:79) at org.apache.hadoop.util.Shell.runCommand(Shell.java:980) at org.apache.hadoop.util.Shell.run(Shell.java:887) at org.apache.hadoop.fs.DU$DUShell.startRefresh(DU.java:62) at org.apache.hadoop.fs.DU.refresh(DU.java:53) at org.apache.hadoop.fs.CachingGetSpaceUsed.init(CachingGetSpaceUsed.java:87) at org.apache.hadoop.fs.GetSpaceUsed$Builder.build(GetSpaceUsed.java:166) at 
org.apache.hadoop.ozone.container.common.impl.ContainerStorageLocation.(ContainerStorageLocation.java:73) at org.apache.hadoop.ozone.container.common.impl.ContainerLocationManagerImpl.(ContainerLocationManagerImpl.java:67) at org.apache.hadoop.ozone.container.common.impl.ContainerManagerImpl.init(ContainerManagerImpl.java:168) at org.apache.hadoop.ozone.container.common.impl.TestContainerPersistence.setupPaths(TestContainerPersistence.java:146) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:168) at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) 2017-07-20 10:02:34,132 [Thread-13] INFO impl.TestContainerPersistence (TestContainerPersistence.java:cleanupDir(152)) - Deletting /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/4/TestContainerPersistence/tmp/ozone {noformat} [https://builds.apache.org/job/PreCommit-HDFS-Build/20346/testReport/org.apache.hadoop.ozone.container.common.impl/TestContainerPersistence/testListKey/] > Ozone: Intermittent failure TestContainerPersistence#testListKey > > > Key: HDFS-12167 > URL: https://issues.apache.org/jira/browse/HDFS-12167 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone, test >Reporter: Weiwei Yang 
>Priority: Minor > > TestContainerPersistence#listKeys seems to fail intermittently. It looks > like it was failing because some unexpected format of container name. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-12167) Ozone: Intermittent failure TestContainerPersistence#testListKey
Weiwei Yang created HDFS-12167: -- Summary: Ozone: Intermittent failure TestContainerPersistence#testListKey Key: HDFS-12167 URL: https://issues.apache.org/jira/browse/HDFS-12167 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone, test Reporter: Weiwei Yang Priority: Minor -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12167) Ozone: Intermittent failure TestContainerPersistence#testListKey
[ https://issues.apache.org/jira/browse/HDFS-12167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-12167: --- Release Note: (was: TestContainerPersistence#listKeys seems to fail intermittently. It looks like it was failing because some unexpected format of container name.) > Ozone: Intermittent failure TestContainerPersistence#testListKey > > > Key: HDFS-12167 > URL: https://issues.apache.org/jira/browse/HDFS-12167 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone, test >Reporter: Weiwei Yang >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12167) Ozone: Intermittent failure TestContainerPersistence#testListKey
[ https://issues.apache.org/jira/browse/HDFS-12167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-12167: --- Description: TestContainerPersistence#listKeys seems to fail intermittently. It looks like it was failing because some unexpected format of container name. > Ozone: Intermittent failure TestContainerPersistence#testListKey > > > Key: HDFS-12167 > URL: https://issues.apache.org/jira/browse/HDFS-12167 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone, test >Reporter: Weiwei Yang >Priority: Minor > > TestContainerPersistence#listKeys seems to fail intermittently. It looks > like it was failing because some unexpected format of container name. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12149) Ozone: RocksDB implementation of ozone metadata store
[ https://issues.apache.org/jira/browse/HDFS-12149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094739#comment-16094739 ] Weiwei Yang commented on HDFS-12149: The UT failure seems not related, I have opened HDFS-12167 to track that issue. > Ozone: RocksDB implementation of ozone metadata store > - > > Key: HDFS-12149 > URL: https://issues.apache.org/jira/browse/HDFS-12149 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Weiwei Yang >Assignee: Weiwei Yang > Attachments: HDFS-12149-HDFS-7240.001.patch, > HDFS-12149-HDFS-7240.002.patch, HDFS-12149-HDFS-7240.003.patch > > > HDFS-12069 added a general interface for ozone metadata store, we already > have a leveldb implementation, this JIRA is to track the work of rocksdb > implementation. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12126) Ozone: Ozone shell: Add more testing for bucket shell commands
[ https://issues.apache.org/jira/browse/HDFS-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16092633#comment-16092633 ] Weiwei Yang commented on HDFS-12126: +1, I will commit this soon. Thanks [~linyiqun]! > Ozone: Ozone shell: Add more testing for bucket shell commands > -- > > Key: HDFS-12126 > URL: https://issues.apache.org/jira/browse/HDFS-12126 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone, tools >Affects Versions: HDFS-7240 >Reporter: Yiqun Lin >Assignee: Yiqun Lin > Attachments: HDFS-12126-HDFS-7240.001.patch, > HDFS-12126-HDFS-7240.002.patch, HDFS-12126-HDFS-7240.003.patch > > > Adding more unit tests for ozone bucket commands, similar to HDFS-12118. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12126) Ozone: Ozone shell: Add more testing for bucket shell commands
[ https://issues.apache.org/jira/browse/HDFS-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-12126: --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: HDFS-7240 Status: Resolved (was: Patch Available) > Ozone: Ozone shell: Add more testing for bucket shell commands > -- > > Key: HDFS-12126 > URL: https://issues.apache.org/jira/browse/HDFS-12126 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone, tools >Affects Versions: HDFS-7240 >Reporter: Yiqun Lin >Assignee: Yiqun Lin > Fix For: HDFS-7240 > > Attachments: HDFS-12126-HDFS-7240.001.patch, > HDFS-12126-HDFS-7240.002.patch, HDFS-12126-HDFS-7240.003.patch > > > Adding more unit tests for ozone bucket commands, similar to HDFS-12118. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12126) Ozone: Ozone shell: Add more testing for bucket shell commands
[ https://issues.apache.org/jira/browse/HDFS-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16092639#comment-16092639 ] Weiwei Yang commented on HDFS-12126: Just committed to the feature branch, thanks [~linyiqun] for the contribution. > Ozone: Ozone shell: Add more testing for bucket shell commands > -- > > Key: HDFS-12126 > URL: https://issues.apache.org/jira/browse/HDFS-12126 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone, tools >Affects Versions: HDFS-7240 >Reporter: Yiqun Lin >Assignee: Yiqun Lin > Fix For: HDFS-7240 > > Attachments: HDFS-12126-HDFS-7240.001.patch, > HDFS-12126-HDFS-7240.002.patch, HDFS-12126-HDFS-7240.003.patch > > > Adding more unit tests for ozone bucket commands, similar to HDFS-12118. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-11984) Ozone: Ensures listKey lists all required key fields
[ https://issues.apache.org/jira/browse/HDFS-11984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang reassigned HDFS-11984: -- Assignee: Weiwei Yang > Ozone: Ensures listKey lists all required key fields > > > Key: HDFS-11984 > URL: https://issues.apache.org/jira/browse/HDFS-11984 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Weiwei Yang >Assignee: Weiwei Yang > > HDFS-11782 implements the listKey operation which only lists the basic key > fields, we need to make sure it return all required fields > # version > # md5hash > # createdOn > # size > # keyName > # dataFileName > this task is depending on the work of HDFS-11886. See more discussion [here | > https://issues.apache.org/jira/browse/HDFS-11782?focusedCommentId=16045562=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16045562]. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12127) Ozone: Ozone shell: Add more testing for key shell commands
[ https://issues.apache.org/jira/browse/HDFS-12127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094774#comment-16094774 ] Weiwei Yang commented on HDFS-12127: Hi [~linyiqun] Thanks a lot for adding the test case and fixing those bugs! The patch looks really good. This helps a lot. Just one minor comment on the v2 patch, might be a bit picky :p *KeyManagerImpl* Line 112 to 116: can we change it to {code} try { ... } catch (KSMException e) { throw e; } catch (IOException e) { ... } {code} Thanks a lot! > Ozone: Ozone shell: Add more testing for key shell commands > --- > > Key: HDFS-12127 > URL: https://issues.apache.org/jira/browse/HDFS-12127 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone, tools >Affects Versions: HDFS-7240 >Reporter: Yiqun Lin >Assignee: Yiqun Lin > Attachments: HDFS-12127-HDFS-7240.001.patch, > HDFS-12127-HDFS-7240.002.patch > > > Adding more unit tests for ozone key commands, similar to HDFS-12118. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
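The pattern suggested in the comment above — rethrow the domain-specific exception untouched and only wrap the generic one — can be sketched independently of the KSM code. Here DomainException is a hypothetical stand-in for KSMException; none of these names come from the actual patch.

```java
import java.io.IOException;

public class RethrowDemo {
    // Hypothetical stand-in for KSMException: a domain-specific
    // IOException subtype whose type callers want to preserve.
    static class DomainException extends IOException {
        DomainException(String m) { super(m); }
    }

    static void handle(boolean domainFailure) throws IOException {
        try {
            if (domainFailure) {
                throw new DomainException("domain error");
            }
            throw new IOException("io error");
        } catch (DomainException e) {
            throw e;  // keep the specific type intact for callers
        } catch (IOException e) {
            // Wrap/translate only the generic failure.
            throw new IOException("wrapped: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        try {
            handle(true);
        } catch (IOException e) {
            System.out.println(e.getClass().getSimpleName()); // DomainException
        }
        try {
            handle(false);
        } catch (IOException e) {
            System.out.println(e.getMessage()); // wrapped: io error
        }
    }
}
```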
[jira] [Assigned] (HDFS-11922) Ozone: KSM: Garbage collect deleted blocks
[ https://issues.apache.org/jira/browse/HDFS-11922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang reassigned HDFS-11922: -- Assignee: Weiwei Yang > Ozone: KSM: Garbage collect deleted blocks > -- > > Key: HDFS-11922 > URL: https://issues.apache.org/jira/browse/HDFS-11922 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Anu Engineer >Assignee: Weiwei Yang >Priority: Critical > > We need to garbage collect deleted blocks from the Datanodes. There are two > cases where we will have orphaned blocks. One is like the classical HDFS, > where someone deletes a key and we need to delete the corresponding blocks. > Another case, is when someone overwrites a key -- an overwrite can be treated > as a delete and a new put -- that means that older blocks need to be GC-ed at > some point of time. > Couple of JIRAs has discussed this in one form or another -- so consolidating > all those discussions in this JIRA. > HDFS-11796 -- needs to fix this issue for some tests to pass > HDFS-11780 -- changed the old overwriting behavior to not supporting this > feature for time being. > HDFS-11920 - Once again runs into this issue when user tries to put an > existing key. > HDFS-11781 - delete key API in KSM only deletes the metadata -- and relies on > GC for Datanodes. > When we solve this issue, we should also consider 2 more aspects. > One, we support versioning in the buckets, tracking which blocks are really > orphaned is something that KSM will do. So delete and overwrite at some point > needs to decide how to handle versioning of buckets. > Two, If a key exists in a closed container, then it is immutable, hence the > strategy of removing the key might be more complex than just talking to an > open container. 
> cc : [~xyao], [~cheersyang], [~vagarychen], [~msingh], [~yuanbo], > [~szetszwo], [~nandakumar131] > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-12176) dfsadmin shows DFS Used%: NaN% if the cluster has zero block.
[ https://issues.apache.org/jira/browse/HDFS-12176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang reassigned HDFS-12176: -- Assignee: Weiwei Yang > dfsadmin shows DFS Used%: NaN% if the cluster has zero block. > - > > Key: HDFS-12176 > URL: https://issues.apache.org/jira/browse/HDFS-12176 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Wei-Chiu Chuang >Assignee: Weiwei Yang >Priority: Trivial > > This is rather a non-issue, but thought I should file it anyway. > I have a test cluster with just NN fsimage, no DN, no blocks, and dfsadmin > shows: > {noformat} > $ hdfs dfsadmin -report > Configured Capacity: 0 (0 B) > Present Capacity: 0 (0 B) > DFS Remaining: 0 (0 B) > DFS Used: 0 (0 B) > DFS Used%: NaN% > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12176) dfsadmin shows DFS Used%: NaN% if the cluster has zero block.
[ https://issues.apache.org/jira/browse/HDFS-12176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095698#comment-16095698 ] Weiwei Yang commented on HDFS-12176: Hi [~jojochuang] I did a quick check. The NaN comes from {{0/(double)0}}; if {{presentCapacity}} is 0, the method should return 0 directly instead of dividing by zero. Let me submit a simple patch for this. > dfsadmin shows DFS Used%: NaN% if the cluster has zero block. > - > > Key: HDFS-12176 > URL: https://issues.apache.org/jira/browse/HDFS-12176 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Wei-Chiu Chuang >Priority: Trivial > > This is rather a non-issue, but thought I should file it anyway. > I have a test cluster with just NN fsimage, no DN, no blocks, and dfsadmin > shows: > {noformat} > $ hdfs dfsadmin -report > Configured Capacity: 0 (0 B) > Present Capacity: 0 (0 B) > DFS Remaining: 0 (0 B) > DFS Used: 0 (0 B) > DFS Used%: NaN% > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
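The guard described in the comment above is straightforward: treat a zero capacity as 0% used rather than evaluating 0/(double)0. The snippet below is a sketch of the idea, not the exact patched dfsadmin method.

```java
public class UsedPercent {
    // DFS Used% with a guard: a zero capacity reports 0% instead of
    // evaluating 0/(double)0, which yields NaN.
    static double usedPercent(long used, long presentCapacity) {
        if (presentCapacity == 0) {
            return 0.0;
        }
        return used * 100.0 / presentCapacity;
    }

    public static void main(String[] args) {
        System.out.println(usedPercent(0, 0));    // 0.0 (was NaN before the guard)
        System.out.println(usedPercent(50, 200)); // 25.0
    }
}
```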
[jira] [Updated] (HDFS-12176) dfsadmin shows DFS Used%: NaN% if the cluster has zero block.
[ https://issues.apache.org/jira/browse/HDFS-12176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-12176: --- Attachment: HDFS-12176.001.patch > dfsadmin shows DFS Used%: NaN% if the cluster has zero block. > - > > Key: HDFS-12176 > URL: https://issues.apache.org/jira/browse/HDFS-12176 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Wei-Chiu Chuang >Assignee: Weiwei Yang >Priority: Trivial > Attachments: HDFS-12176.001.patch > > > This is rather a non-issue, but thought I should file it anyway. > I have a test cluster with just NN fsimage, no DN, no blocks, and dfsadmin > shows: > {noformat} > $ hdfs dfsadmin -report > Configured Capacity: 0 (0 B) > Present Capacity: 0 (0 B) > DFS Remaining: 0 (0 B) > DFS Used: 0 (0 B) > DFS Used%: NaN% > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12176) dfsadmin shows DFS Used%: NaN% if the cluster has zero block.
[ https://issues.apache.org/jira/browse/HDFS-12176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-12176: --- Status: Patch Available (was: Open) > dfsadmin shows DFS Used%: NaN% if the cluster has zero block. > - > > Key: HDFS-12176 > URL: https://issues.apache.org/jira/browse/HDFS-12176 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Wei-Chiu Chuang >Assignee: Weiwei Yang >Priority: Trivial > Attachments: HDFS-12176.001.patch > > > This is rather a non-issue, but thought I should file it anyway. > I have a test cluster with just NN fsimage, no DN, no blocks, and dfsadmin > shows: > {noformat} > $ hdfs dfsadmin -report > Configured Capacity: 0 (0 B) > Present Capacity: 0 (0 B) > DFS Remaining: 0 (0 B) > DFS Used: 0 (0 B) > DFS Used%: NaN% > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12149) Ozone: RocksDB implementation of ozone metadata store
[ https://issues.apache.org/jira/browse/HDFS-12149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095702#comment-16095702 ] Weiwei Yang commented on HDFS-12149: Thanks [~anu] ! > Ozone: RocksDB implementation of ozone metadata store > - > > Key: HDFS-12149 > URL: https://issues.apache.org/jira/browse/HDFS-12149 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Weiwei Yang >Assignee: Weiwei Yang > Attachments: HDFS-12149-HDFS-7240.001.patch, > HDFS-12149-HDFS-7240.002.patch, HDFS-12149-HDFS-7240.003.patch > > > HDFS-12069 added a general interface for ozone metadata store, we already > have a leveldb implementation, this JIRA is to track the work of rocksdb > implementation. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12071) Ozone: Corona: Implementation of Corona
[ https://issues.apache.org/jira/browse/HDFS-12071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095703#comment-16095703 ] Weiwei Yang commented on HDFS-12071: Nice work guys, any document how to run corona? I would like to try this. Thanks. > Ozone: Corona: Implementation of Corona > --- > > Key: HDFS-12071 > URL: https://issues.apache.org/jira/browse/HDFS-12071 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Nandakumar >Assignee: Nandakumar > Attachments: HDFS-12071-HDFS-7240.000.patch, > HDFS-12071-HDFS-7240.001.patch, HDFS-12071-HDFS-7240.002.patch > > > Tool to populate ozone with data for testing. > This is not a map-reduce program and this is not for benchmarking Ozone write > throughput. > It supports both online and offline modes. Default mode is offline, {{-mode}} > can be used to change the mode. > > In online mode, active internet connection is required, common crawl data > from AWS will be used. Default source is [CC-MAIN-2017-17/warc.paths.gz | > https://commoncrawl.s3.amazonaws.com/crawl-data/CC-MAIN-2017-17/warc.paths.gz] > (it contains the path to actual data segment), user can override this using > {{-source}}. > The following values are derived from URL of Common Crawl data > * Domain will be used as Volume > * URL will be used as Bucket > * FileName will be used as Key > > In offline mode, the data will be random bytes and size of data will be 10 KB. > * Default number of Volumes 10, {{-numOfVolumes}} can be used to override > * Default number of Buckets per Volume 1000, {{-numOfBuckets}} can be used to > override > * Default number of Keys per Bucket 50, {{-numOfKeys}} can be used to > override -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-12149) Ozone: RocksDB implementation of ozone metadata store
Weiwei Yang created HDFS-12149: -- Summary: Ozone: RocksDB implementation of ozone metadata store Key: HDFS-12149 URL: https://issues.apache.org/jira/browse/HDFS-12149 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Weiwei Yang Assignee: Weiwei Yang HDFS-12069 added a general interface for the ozone metadata store; we already have a LevelDB implementation, and this JIRA tracks the work on the RocksDB implementation.
[jira] [Commented] (HDFS-12148) Ozone: TestOzoneConfigurationFields is failing because ozone-default.xml has some missing properties
[ https://issues.apache.org/jira/browse/HDFS-12148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088774#comment-16088774 ] Weiwei Yang commented on HDFS-12148: +[~anu] I have added the missing properties to ozone-default.xml. Please kindly review the descriptions for accuracy; feel free to modify them. Thanks! > Ozone: TestOzoneConfigurationFields is failing because ozone-default.xml has > some missing properties > > > Key: HDFS-12148 > URL: https://issues.apache.org/jira/browse/HDFS-12148 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Minor > Attachments: HDFS-12148-HDFS-7240.001.patch > > > The following properties added by HDFS-11493 are missing in ozone-default.xml > {noformat} > ozone.scm.max.container.report.threads > ozone.scm.container.report.processing.interval.seconds > ozone.scm.container.reports.wait.timeout.seconds > {noformat}
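For reference, the added entries would take roughly this shape in ozone-default.xml. The property names come from the JIRA above, but the values and descriptions here are illustrative guesses, not the committed defaults:

```xml
<!-- Illustrative sketch only: property names from HDFS-11493,
     values and descriptions are guesses, not the actual defaults. -->
<property>
  <name>ozone.scm.max.container.report.threads</name>
  <value>100</value>
  <description>Maximum number of threads used to process container reports.</description>
</property>
<property>
  <name>ozone.scm.container.report.processing.interval.seconds</name>
  <value>60</value>
  <description>Interval, in seconds, between container report processing runs.</description>
</property>
<property>
  <name>ozone.scm.container.reports.wait.timeout.seconds</name>
  <value>300</value>
  <description>How long, in seconds, to wait for container reports before timing out.</description>
</property>
```

TestOzoneConfigurationFields passes once every configuration key declared in code has a matching `<property>` entry in the default XML file.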
[jira] [Updated] (HDFS-12148) Ozone: TestOzoneConfigurationFields is failing because ozone-default.xml has some missing properties
[ https://issues.apache.org/jira/browse/HDFS-12148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-12148: --- Status: Patch Available (was: Open) > Ozone: TestOzoneConfigurationFields is failing because ozone-default.xml has > some missing properties > > > Key: HDFS-12148 > URL: https://issues.apache.org/jira/browse/HDFS-12148 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Minor > Attachments: HDFS-12148-HDFS-7240.001.patch > > > Following properties added by HDFS-11493 is missing in ozone-default.xml > {noformat} > ozone.scm.max.container.report.threads > ozone.scm.container.report.processing.interval.seconds > ozone.scm.container.reports.wait.timeout.seconds > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12148) Ozone: TestOzoneConfigurationFields is failing because ozone-default.xml has some missing properties
[ https://issues.apache.org/jira/browse/HDFS-12148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-12148: --- Attachment: HDFS-12148-HDFS-7240.001.patch > Ozone: TestOzoneConfigurationFields is failing because ozone-default.xml has > some missing properties > > > Key: HDFS-12148 > URL: https://issues.apache.org/jira/browse/HDFS-12148 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Minor > Attachments: HDFS-12148-HDFS-7240.001.patch > > > Following properties added by HDFS-11493 is missing in ozone-default.xml > {noformat} > ozone.scm.max.container.report.threads > ozone.scm.container.report.processing.interval.seconds > ozone.scm.container.reports.wait.timeout.seconds > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12069) Ozone: Create a general abstraction for metadata store
[ https://issues.apache.org/jira/browse/HDFS-12069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-12069: --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: HDFS-7240 Status: Resolved (was: Patch Available) I have committed this to the feature branch, thanks [~anu], [~xyao], [~yuanbo] and [~msingh] for the reviews. Thanks a lot. > Ozone: Create a general abstraction for metadata store > -- > > Key: HDFS-12069 > URL: https://issues.apache.org/jira/browse/HDFS-12069 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Weiwei Yang >Assignee: Weiwei Yang > Fix For: HDFS-7240 > > Attachments: HDFS-12069-HDFS-7240.001.patch, > HDFS-12069-HDFS-7240.002.patch, HDFS-12069-HDFS-7240.003.patch, > HDFS-12069-HDFS-7240.004.patch, HDFS-12069-HDFS-7240.005.patch, > HDFS-12069-HDFS-7240.006.patch, HDFS-12069-HDFS-7240.007.patch, > HDFS-12069-HDFS-7240.008.patch, HDFS-12069-HDFS-7240.009.patch, > HDFS-12069-HDFS-7240.010.patch, HDFS-12069-HDFS-7240.011.patch, > HDFS-12069-HDFS-7240.012.patch, HDFS-12069-HDFS-7240.013.patch > > > Create a general abstraction for metadata store so that we can plug other key > value store to host ozone metadata. Currently only levelDB is implemented, we > want to support RocksDB as it provides more production ready features. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
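The abstraction resolved above — a generic key-value store that LevelDB or RocksDB can plug into — can be pictured with a small sketch. Everything below is a hypothetical illustration (the names and signatures are not the actual HDFS-12069 API), with an in-memory ordered map standing in for a real backend:

```java
import java.util.Arrays;
import java.util.TreeMap;

/**
 * Illustrative sketch of a pluggable key-value metadata store abstraction,
 * in the spirit of HDFS-12069. Names are hypothetical, not the real API;
 * a TreeMap stands in for a LevelDB/RocksDB backend.
 */
public class MetadataStoreSketch {

    /** Minimal contract every backend (LevelDB, RocksDB, ...) would implement. */
    public interface MetadataStore {
        void put(byte[] key, byte[] value);
        byte[] get(byte[] key);   // returns null when the key is absent
        void delete(byte[] key);
    }

    /** In-memory stand-in backend, ordered by key like the real stores. */
    public static class InMemoryStore implements MetadataStore {
        private final TreeMap<byte[], byte[]> map =
            new TreeMap<>((a, b) -> Arrays.compare(a, b));
        public void put(byte[] key, byte[] value) { map.put(key, value); }
        public byte[] get(byte[] key) { return map.get(key); }
        public void delete(byte[] key) { map.remove(key); }
    }

    public static void main(String[] args) {
        // Callers program against the interface; a RocksDB-backed
        // implementation could be swapped in here without other changes.
        MetadataStore store = new InMemoryStore();
        store.put("volume1".getBytes(), "metadata".getBytes());
        System.out.println(new String(store.get("volume1".getBytes())));
        store.delete("volume1".getBytes());
        System.out.println(store.get("volume1".getBytes()) == null);
    }
}
```

The point of the design is exactly this substitution: HDFS-12149 can add a RocksDB implementation of the same interface without touching the callers.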
[jira] [Commented] (HDFS-12069) Ozone: Create a general abstraction for metadata store
[ https://issues.apache.org/jira/browse/HDFS-12069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088767#comment-16088767 ] Weiwei Yang commented on HDFS-12069: The UT failures are not related. I have tested locally and {{TestDatanodeStateMachine}} seems to work. {{TestOzoneConfigurationFields}} is caused by HDFS-11493; I will create a JIRA to get that fixed. {{TestContainerReplicationManager}} is failing with or without this patch. I am going to commit this soon. > Ozone: Create a general abstraction for metadata store > -- > > Key: HDFS-12069 > URL: https://issues.apache.org/jira/browse/HDFS-12069 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Weiwei Yang >Assignee: Weiwei Yang > Attachments: HDFS-12069-HDFS-7240.001.patch, > HDFS-12069-HDFS-7240.002.patch, HDFS-12069-HDFS-7240.003.patch, > HDFS-12069-HDFS-7240.004.patch, HDFS-12069-HDFS-7240.005.patch, > HDFS-12069-HDFS-7240.006.patch, HDFS-12069-HDFS-7240.007.patch, > HDFS-12069-HDFS-7240.008.patch, HDFS-12069-HDFS-7240.009.patch, > HDFS-12069-HDFS-7240.010.patch, HDFS-12069-HDFS-7240.011.patch, > HDFS-12069-HDFS-7240.012.patch, HDFS-12069-HDFS-7240.013.patch > > > Create a general abstraction for metadata store so that we can plug other key > value store to host ozone metadata. Currently only levelDB is implemented, we > want to support RocksDB as it provides more production ready features.
[jira] [Created] (HDFS-12148) Ozone: TestOzoneConfigurationFields is failing because ozone-default.xml has some missing properties
Weiwei Yang created HDFS-12148: -- Summary: Ozone: TestOzoneConfigurationFields is failing because ozone-default.xml has some missing properties Key: HDFS-12148 URL: https://issues.apache.org/jira/browse/HDFS-12148 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Weiwei Yang Assignee: Weiwei Yang Priority: Minor Following properties added by HDFS-11493 is missing in ozone-default.xml {noformat} ozone.scm.max.container.report.threads ozone.scm.container.report.processing.interval.seconds ozone.scm.container.reports.wait.timeout.seconds {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12071) Ozone: Corona: Implementation of Corona
[ https://issues.apache.org/jira/browse/HDFS-12071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-12071: --- Fix Version/s: HDFS-7240 > Ozone: Corona: Implementation of Corona > --- > > Key: HDFS-12071 > URL: https://issues.apache.org/jira/browse/HDFS-12071 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Nandakumar >Assignee: Nandakumar > Fix For: HDFS-7240 > > Attachments: HDFS-12071-HDFS-7240.000.patch, > HDFS-12071-HDFS-7240.001.patch, HDFS-12071-HDFS-7240.002.patch > > > Tool to populate ozone with data for testing. > This is not a map-reduce program and this is not for benchmarking Ozone write > throughput. > It supports both online and offline modes. Default mode is offline, {{-mode}} > can be used to change the mode. > > In online mode, active internet connection is required, common crawl data > from AWS will be used. Default source is [CC-MAIN-2017-17/warc.paths.gz | > https://commoncrawl.s3.amazonaws.com/crawl-data/CC-MAIN-2017-17/warc.paths.gz] > (it contains the path to actual data segment), user can override this using > {{-source}}. > The following values are derived from URL of Common Crawl data > * Domain will be used as Volume > * URL will be used as Bucket > * FileName will be used as Key > > In offline mode, the data will be random bytes and size of data will be 10 KB. > * Default number of Volumes 10, {{-numOfVolumes}} can be used to override > * Default number of Buckets per Volume 1000, {{-numOfBuckets}} can be used to > override > * Default number of Keys per Bucket 50, {{-numOfKeys}} can be used to > override -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12127) Ozone: Ozone shell: Add more testing for key shell commands
[ https://issues.apache.org/jira/browse/HDFS-12127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095959#comment-16095959 ] Weiwei Yang commented on HDFS-12127: Hmm, there seem to be a lot more UTs failing; I can't tell whether they are caused by this patch (probably not) or the latest trunk merge (probably yes), could you please confirm? If the trunk merge causes those problems, we will need another JIRA to track them. > Ozone: Ozone shell: Add more testing for key shell commands > --- > > Key: HDFS-12127 > URL: https://issues.apache.org/jira/browse/HDFS-12127 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone, tools >Affects Versions: HDFS-7240 >Reporter: Yiqun Lin >Assignee: Yiqun Lin > Attachments: HDFS-12127-HDFS-7240.001.patch, > HDFS-12127-HDFS-7240.002.patch, HDFS-12127-HDFS-7240.003.patch > > > Adding more unit tests for ozone key commands, similar to HDFS-12118.
[jira] [Updated] (HDFS-11936) Ozone: TestNodeManager times out before it is able to find all nodes
[ https://issues.apache.org/jira/browse/HDFS-11936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-11936: --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: HDFS-7240 Status: Resolved (was: Patch Available) Just committed to the feature branch, thanks for the contribution [~yuanbo] > Ozone: TestNodeManager times out before it is able to find all nodes > > > Key: HDFS-11936 > URL: https://issues.apache.org/jira/browse/HDFS-11936 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Anu Engineer >Assignee: Yuanbo Liu > Fix For: HDFS-7240 > > Attachments: HDFS-11936-HDFS-7240.001.patch, > HDFS-11936-HDFS-7240.002.patch > > > During the pre-commit build of > https://builds.apache.org/job/PreCommit-HDFS-Build/19795/testReport/ > we detected that a test in TestNodeManager is failing. Probably due to the > fact that we need more time to execute this test in jenkins. This might be > related to HDFS-11919 > The test failure report follows. > == > {noformat} > Regression > org.apache.hadoop.ozone.scm.node.TestNodeManager.testScmStatsFromNodeReport > Failing for the past 1 build (Since Failed#19795 ) > Took 0.51 sec. > Error Message > expected:<2> but was:<18000> > Stacktrace > java.lang.AssertionError: expected:<2> but was:<18000> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.ozone.scm.node.TestNodeManager.testScmStatsFromNodeReport(TestNodeManager.java:972) > Standard Output > 2017-06-06 13:45:30,909 [main] INFO - Data node with ID: > 732ebd32-a926-44c5-afbb-c9f87513a67c Registered. > 2017-06-06 13:45:30,937 [main] INFO - Data node with ID: > 6860fd5d-94dc-4ba8-acd0-41cc3fa7232d Registered. 
> 2017-06-06 13:45:30,971 [main] INFO - Data node with ID: > cad7174c-204c-4806-b3af-c874706d4bd9 Registered. > 2017-06-06 13:45:30,996 [main] INFO - Data node with ID: > 0130a672-719d-4b68-9a1e-13046f4281ff Registered. > 2017-06-06 13:45:31,021 [main] INFO - Data node with ID: > 8d9ea5d4-6752-48d4-9bf0-adb0bd1a651a Registered. > 2017-06-06 13:45:31,046 [main] INFO - Data node with ID: > f122e372-5a38-476b-97dc-5ae449190485 Registered. > 2017-06-06 13:45:31,071 [main] INFO - Data node with ID: > 5750eb03-c1ac-4b3a-bc59-c4d9481e245b Registered. > 2017-06-06 13:45:31,097 [main] INFO - Data node with ID: > aa2d90a1-9e85-41f8-a4e5-35c7d2ed7299 Registered. > 2017-06-06 13:45:31,122 [main] INFO - Data node with ID: > 5e52bf5c-7050-4fc9-bf10-0e52650229ee Registered. > 2017-06-06 13:45:31,147 [main] INFO - Data node with ID: > eaac7b8f-a556-4afc-9163-7309f7ccea18 Registered. > 2017-06-06 13:45:31,224 [SCM Heartbeat Processing Thread - 0] INFO - > Current Thread is interrupted, shutting down HB processing thread for Node > Manager. > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12115) Ozone: SCM: Add queryNode RPC Call
[ https://issues.apache.org/jira/browse/HDFS-12115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095915#comment-16095915 ] Weiwei Yang commented on HDFS-12115: Hi [~anu], the UT failure {{testCapacityPlacementYieldsBetterDataDistribution}} seems related; can we get that fixed before committing this? Also, there seems to be a blank line at EOF at line 201 of your v7 patch; could you please remove that as well? Thanks > Ozone: SCM: Add queryNode RPC Call > -- > > Key: HDFS-12115 > URL: https://issues.apache.org/jira/browse/HDFS-12115 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Anu Engineer >Assignee: Anu Engineer > Fix For: HDFS-7240 > > Attachments: HDFS-12115-HDFS-7240.001.patch, > HDFS-12115-HDFS-7240.002.patch, HDFS-12115-HDFS-7240.003.patch, > HDFS-12115-HDFS-7240.004.patch, HDFS-12115-HDFS-7240.005.patch, > HDFS-12115-HDFS-7240.006.patch, HDFS-12115-HDFS-7240.007.patch > > > Add queryNode RPC to Storage container location protocol. This allows > applications like SCM CLI to get the list of nodes in various states, like > Healthy, live or Dead.
[jira] [Updated] (HDFS-12071) Ozone: Corona: Implementation of Corona
[ https://issues.apache.org/jira/browse/HDFS-12071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-12071: --- Labels: tool (was: ) > Ozone: Corona: Implementation of Corona > --- > > Key: HDFS-12071 > URL: https://issues.apache.org/jira/browse/HDFS-12071 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Nandakumar >Assignee: Nandakumar > Labels: tool > Fix For: HDFS-7240 > > Attachments: HDFS-12071-HDFS-7240.000.patch, > HDFS-12071-HDFS-7240.001.patch, HDFS-12071-HDFS-7240.002.patch > > > Tool to populate ozone with data for testing. > This is not a map-reduce program and this is not for benchmarking Ozone write > throughput. > It supports both online and offline modes. Default mode is offline, {{-mode}} > can be used to change the mode. > > In online mode, active internet connection is required, common crawl data > from AWS will be used. Default source is [CC-MAIN-2017-17/warc.paths.gz | > https://commoncrawl.s3.amazonaws.com/crawl-data/CC-MAIN-2017-17/warc.paths.gz] > (it contains the path to actual data segment), user can override this using > {{-source}}. > The following values are derived from URL of Common Crawl data > * Domain will be used as Volume > * URL will be used as Bucket > * FileName will be used as Key > > In offline mode, the data will be random bytes and size of data will be 10 KB. > * Default number of Volumes 10, {{-numOfVolumes}} can be used to override > * Default number of Buckets per Volume 1000, {{-numOfBuckets}} can be used to > override > * Default number of Keys per Bucket 50, {{-numOfKeys}} can be used to > override -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11936) Ozone: TestNodeManager times out before it is able to find all nodes
[ https://issues.apache.org/jira/browse/HDFS-11936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095908#comment-16095908 ] Weiwei Yang commented on HDFS-11936: Makes sense to me; the 100ms HB interval here creates a race condition, and increasing it to 1s fixes that. Thanks [~yuanbo], I am going to commit this soon. > Ozone: TestNodeManager times out before it is able to find all nodes > > > Key: HDFS-11936 > URL: https://issues.apache.org/jira/browse/HDFS-11936 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Anu Engineer >Assignee: Yuanbo Liu > Attachments: HDFS-11936-HDFS-7240.001.patch, > HDFS-11936-HDFS-7240.002.patch > > > During the pre-commit build of > https://builds.apache.org/job/PreCommit-HDFS-Build/19795/testReport/ > we detected that a test in TestNodeManager is failing. Probably due to the > fact that we need more time to execute this test in jenkins. This might be > related to HDFS-11919 > The test failure report follows. > == > {noformat} > Regression > org.apache.hadoop.ozone.scm.node.TestNodeManager.testScmStatsFromNodeReport > Failing for the past 1 build (Since Failed#19795 ) > Took 0.51 sec. > Error Message > expected:<2> but was:<18000> > Stacktrace > java.lang.AssertionError: expected:<2> but was:<18000> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.ozone.scm.node.TestNodeManager.testScmStatsFromNodeReport(TestNodeManager.java:972) > Standard Output > 2017-06-06 13:45:30,909 [main] INFO - Data node with ID: > 732ebd32-a926-44c5-afbb-c9f87513a67c Registered. > 2017-06-06 13:45:30,937 [main] INFO - Data node with ID: > 6860fd5d-94dc-4ba8-acd0-41cc3fa7232d Registered.
> 2017-06-06 13:45:30,971 [main] INFO - Data node with ID: > cad7174c-204c-4806-b3af-c874706d4bd9 Registered. > 2017-06-06 13:45:30,996 [main] INFO - Data node with ID: > 0130a672-719d-4b68-9a1e-13046f4281ff Registered. > 2017-06-06 13:45:31,021 [main] INFO - Data node with ID: > 8d9ea5d4-6752-48d4-9bf0-adb0bd1a651a Registered. > 2017-06-06 13:45:31,046 [main] INFO - Data node with ID: > f122e372-5a38-476b-97dc-5ae449190485 Registered. > 2017-06-06 13:45:31,071 [main] INFO - Data node with ID: > 5750eb03-c1ac-4b3a-bc59-c4d9481e245b Registered. > 2017-06-06 13:45:31,097 [main] INFO - Data node with ID: > aa2d90a1-9e85-41f8-a4e5-35c7d2ed7299 Registered. > 2017-06-06 13:45:31,122 [main] INFO - Data node with ID: > 5e52bf5c-7050-4fc9-bf10-0e52650229ee Registered. > 2017-06-06 13:45:31,147 [main] INFO - Data node with ID: > eaac7b8f-a556-4afc-9163-7309f7ccea18 Registered. > 2017-06-06 13:45:31,224 [SCM Heartbeat Processing Thread - 0] INFO - > Current Thread is interrupted, shutting down HB processing thread for Node > Manager. > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12127) Ozone: Ozone shell: Add more testing for key shell commands
[ https://issues.apache.org/jira/browse/HDFS-12127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-12127: --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: HDFS-7240 Status: Resolved (was: Patch Available) > Ozone: Ozone shell: Add more testing for key shell commands > --- > > Key: HDFS-12127 > URL: https://issues.apache.org/jira/browse/HDFS-12127 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone, tools >Affects Versions: HDFS-7240 >Reporter: Yiqun Lin >Assignee: Yiqun Lin > Fix For: HDFS-7240 > > Attachments: HDFS-12127-HDFS-7240.001.patch, > HDFS-12127-HDFS-7240.002.patch, HDFS-12127-HDFS-7240.003.patch > > > Adding more unit tests for ozone key commands, similar to HDFS-12118. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12127) Ozone: Ozone shell: Add more testing for key shell commands
[ https://issues.apache.org/jira/browse/HDFS-12127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16096190#comment-16096190 ] Weiwei Yang commented on HDFS-12127: Looks good, +1. I am going to commit this shortly. Thanks [~linyiqun] for confirming this! > Ozone: Ozone shell: Add more testing for key shell commands > --- > > Key: HDFS-12127 > URL: https://issues.apache.org/jira/browse/HDFS-12127 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone, tools >Affects Versions: HDFS-7240 >Reporter: Yiqun Lin >Assignee: Yiqun Lin > Attachments: HDFS-12127-HDFS-7240.001.patch, > HDFS-12127-HDFS-7240.002.patch, HDFS-12127-HDFS-7240.003.patch > > > Adding more unit tests for ozone key commands, similar to HDFS-12118.
[jira] [Commented] (HDFS-12187) Ozone : add support to DEBUG CLI for ksm.db
[ https://issues.apache.org/jira/browse/HDFS-12187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097045#comment-16097045 ] Weiwei Yang commented on HDFS-12187: TestKSMSQLCli.java is missing the Apache license header; please get that fixed before committing. > Ozone : add support to DEBUG CLI for ksm.db > --- > > Key: HDFS-12187 > URL: https://issues.apache.org/jira/browse/HDFS-12187 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chen Liang >Assignee: Chen Liang > Attachments: HDFS-12187-HDFS-7240.001.patch > > > This JIRA adds the ability to convert ksm meta data file (ksm.db) into sqlite > db.
[jira] [Commented] (HDFS-12115) Ozone: SCM: Add queryNode RPC Call
[ https://issues.apache.org/jira/browse/HDFS-12115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097043#comment-16097043 ] Weiwei Yang commented on HDFS-12115: Hi [~anu] Guess you need to rebase your latest patch onto the latest code base. :P > Ozone: SCM: Add queryNode RPC Call > -- > > Key: HDFS-12115 > URL: https://issues.apache.org/jira/browse/HDFS-12115 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Anu Engineer >Assignee: Anu Engineer > Fix For: HDFS-7240 > > Attachments: HDFS-12115-HDFS-7240.001.patch, > HDFS-12115-HDFS-7240.002.patch, HDFS-12115-HDFS-7240.003.patch, > HDFS-12115-HDFS-7240.004.patch, HDFS-12115-HDFS-7240.005.patch, > HDFS-12115-HDFS-7240.006.patch, HDFS-12115-HDFS-7240.007.patch, > HDFS-12115-HDFS-7240.008.patch > > > Add queryNode RPC to Storage container location protocol. This allows > applications like SCM CLI to get the list of nodes in various states, like > Healthy, live or Dead.
[jira] [Updated] (HDFS-12163) Ozone: MiniOzoneCluster uses 400+ threads
[ https://issues.apache.org/jira/browse/HDFS-12163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-12163: --- Issue Type: Sub-task (was: Bug) Parent: HDFS-7240 > Ozone: MiniOzoneCluster uses 400+ threads > - > > Key: HDFS-12163 > URL: https://issues.apache.org/jira/browse/HDFS-12163 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone, test >Reporter: Tsz Wo Nicholas Sze >Assignee: Weiwei Yang > Attachments: TestOzoneThreadCount20170719.patch > > > Checked the number of active threads used in MiniOzoneCluster with various > settings: > - Local handlers > - Distributed handlers > - Ratis-Netty > - Ratis-gRPC > The results are similar for all the settings. It uses 400+ threads for an > 1-datanode MiniOzoneCluster. > Moreover, there is a thread leak -- a number of the threads do not shutdown > after the test is finished. Therefore, when tests run consecutively, the > later tests use more threads. > Will post the details in comments. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
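The thread counts and leaks described above can be observed by diffing snapshots of live JVM threads around the suspect code. The sketch below shows that measurement idea with stand-in names; it is not the attached TestOzoneThreadCount patch:

```java
/**
 * Sketch of the measurement technique behind the report above: snapshot the
 * set of live JVM threads before and after some work, then diff the counts.
 * "cluster-worker" stands in for the hundreds of threads a MiniOzoneCluster
 * would actually start.
 */
public class ThreadCountSketch {

    static int liveThreadCount() {
        // The keys of Thread.getAllStackTraces() are all live threads.
        return Thread.getAllStackTraces().size();
    }

    public static void main(String[] args) throws InterruptedException {
        int before = liveThreadCount();
        // Stand-in for "start a MiniOzoneCluster": spawn one worker thread.
        Thread worker = new Thread(() -> {
            try { Thread.sleep(2_000); } catch (InterruptedException ignored) { }
        }, "cluster-worker");
        worker.start();
        int during = liveThreadCount();
        System.out.println("threads before=" + before + ", during=" + during);
        worker.interrupt();
        worker.join();
        // If the count never shrinks back after shutdown, threads are leaking,
        // which is why consecutive tests accumulate more and more threads.
    }
}
```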
[jira] [Updated] (HDFS-12196) Ozone: DeleteKey-2: Implement container recycling service to delete stale blocks at background
[ https://issues.apache.org/jira/browse/HDFS-12196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-12196: --- Description: Implement a recycling service running on datanode to delete stale blocks. The recycling service scans stale blocks for each container and deletes chunks and references periodically. (was: Implement a recycling service running on datanode to delete stale blocks periodically. ) > Ozone: DeleteKey-2: Implement container recycling service to delete stale > blocks at background > -- > > Key: HDFS-12196 > URL: https://issues.apache.org/jira/browse/HDFS-12196 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Weiwei Yang >Assignee: Weiwei Yang > > Implement a recycling service running on datanode to delete stale blocks. > The recycling service scans stale blocks for each container and deletes > chunks and references periodically.
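A periodic scan-and-delete service like the one described can be sketched with a ScheduledExecutorService. This is an illustrative pattern under hypothetical names, not the HDFS-12196 implementation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Sketch of a background block-recycling service: a scheduled task drains a
 * queue of stale block references and "deletes" their chunks. The queue and
 * names are hypothetical stand-ins for per-container stale-block metadata.
 */
public class BlockRecyclingSketch {

    // Stands in for the stale-block references recorded per container.
    static final BlockingQueue<String> staleBlocks = new LinkedBlockingQueue<>();

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService svc = Executors.newSingleThreadScheduledExecutor();
        // Periodic scan: drain whatever is pending and delete it in a batch.
        svc.scheduleWithFixedDelay(() -> {
            List<String> batch = new ArrayList<>();
            staleBlocks.drainTo(batch);
            for (String block : batch) {
                System.out.println("deleting chunks and references for " + block);
            }
        }, 0, 100, TimeUnit.MILLISECONDS);

        staleBlocks.add("container-1/block-7");
        staleBlocks.add("container-2/block-3");
        Thread.sleep(300);  // let the scan run a couple of cycles
        svc.shutdown();
    }
}
```

Batching the deletes per scan cycle keeps the foreground key-delete path cheap, which is the point of doing recycling in the background.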
[jira] [Updated] (HDFS-11922) Ozone: KSM: Garbage collect deleted blocks
[ https://issues.apache.org/jira/browse/HDFS-11922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-11922: --- Attachment: Async delete keys.pdf > Ozone: KSM: Garbage collect deleted blocks > -- > > Key: HDFS-11922 > URL: https://issues.apache.org/jira/browse/HDFS-11922 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Anu Engineer >Assignee: Weiwei Yang >Priority: Critical > Attachments: Async delete keys.pdf > > > We need to garbage collect deleted blocks from the Datanodes. There are two > cases where we will have orphaned blocks. One is like the classical HDFS, > where someone deletes a key and we need to delete the corresponding blocks. > Another case, is when someone overwrites a key -- an overwrite can be treated > as a delete and a new put -- that means that older blocks need to be GC-ed at > some point of time. > Couple of JIRAs has discussed this in one form or another -- so consolidating > all those discussions in this JIRA. > HDFS-11796 -- needs to fix this issue for some tests to pass > HDFS-11780 -- changed the old overwriting behavior to not supporting this > feature for time being. > HDFS-11920 - Once again runs into this issue when user tries to put an > existing key. > HDFS-11781 - delete key API in KSM only deletes the metadata -- and relies on > GC for Datanodes. > When we solve this issue, we should also consider 2 more aspects. > One, we support versioning in the buckets, tracking which blocks are really > orphaned is something that KSM will do. So delete and overwrite at some point > needs to decide how to handle versioning of buckets. > Two, If a key exists in a closed container, then it is immutable, hence the > strategy of removing the key might be more complex than just talking to an > open container. 
> cc : [~xyao], [~cheersyang], [~vagarychen], [~msingh], [~yuanbo], > [~szetszwo], [~nandakumar131] > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11922) Ozone: KSM: Garbage collect deleted blocks
[ https://issues.apache.org/jira/browse/HDFS-11922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099814#comment-16099814 ] Weiwei Yang commented on HDFS-11922: Hi [~anu], [~xyao] and folks on cc, I have uploaded a doc about delete key implementation based on the discussions we had earlier, please help to review. Thanks! > Ozone: KSM: Garbage collect deleted blocks > -- > > Key: HDFS-11922 > URL: https://issues.apache.org/jira/browse/HDFS-11922 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Anu Engineer >Assignee: Weiwei Yang >Priority: Critical > Attachments: Async delete keys.pdf > > > We need to garbage collect deleted blocks from the Datanodes. There are two > cases where we will have orphaned blocks. One is like the classical HDFS, > where someone deletes a key and we need to delete the corresponding blocks. > Another case, is when someone overwrites a key -- an overwrite can be treated > as a delete and a new put -- that means that older blocks need to be GC-ed at > some point of time. > Couple of JIRAs has discussed this in one form or another -- so consolidating > all those discussions in this JIRA. > HDFS-11796 -- needs to fix this issue for some tests to pass > HDFS-11780 -- changed the old overwriting behavior to not supporting this > feature for time being. > HDFS-11920 - Once again runs into this issue when user tries to put an > existing key. > HDFS-11781 - delete key API in KSM only deletes the metadata -- and relies on > GC for Datanodes. > When we solve this issue, we should also consider 2 more aspects. > One, we support versioning in the buckets, tracking which blocks are really > orphaned is something that KSM will do. So delete and overwrite at some point > needs to decide how to handle versioning of buckets. > Two, If a key exists in a closed container, then it is immutable, hence the > strategy of removing the key might be more complex than just talking to an > open container. 
> cc : [~xyao], [~cheersyang], [~vagarychen], [~msingh], [~yuanbo], > [~szetszwo], [~nandakumar131] >
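The two orphaned-block cases described above (an explicit delete, and an overwrite treated as a delete plus a new put) can be sketched as a pending-deletion queue on the metadata side. This is a minimal illustrative sketch under assumed names -- {{KeyBlockIndex}} and its methods are hypothetical, not the actual KSM code:

```java
import java.util.*;

// Hypothetical sketch: an overwrite is treated as delete + put, so the old
// blocks are queued for background GC instead of being removed inline.
public class KeyBlockIndex {
    private final Map<String, List<String>> keyToBlocks = new HashMap<>();
    private final Deque<String> pendingDeletion = new ArrayDeque<>();

    // Put a key; if it already exists, its old blocks become garbage.
    public void putKey(String key, List<String> blocks) {
        List<String> old = keyToBlocks.put(key, new ArrayList<>(blocks));
        if (old != null) {
            pendingDeletion.addAll(old); // overwrite == delete + new put
        }
    }

    public void deleteKey(String key) {
        List<String> old = keyToBlocks.remove(key);
        if (old != null) {
            pendingDeletion.addAll(old);
        }
    }

    // Drained periodically by a background GC pass.
    public List<String> drainPendingDeletion(int max) {
        List<String> batch = new ArrayList<>();
        while (batch.size() < max && !pendingDeletion.isEmpty()) {
            batch.add(pendingDeletion.poll());
        }
        return batch;
    }
}
```

A background GC pass would periodically call {{drainPendingDeletion}} and issue the actual block deletes to the Datanodes; closed (immutable) containers would need extra handling, as noted above.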
[jira] [Work started] (HDFS-12196) Ozone: DeleteKey-2: Implement container recycling service to delete stale blocks at background
[ https://issues.apache.org/jira/browse/HDFS-12196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-12196 started by Weiwei Yang. -- > Ozone: DeleteKey-2: Implement container recycling service to delete stale > blocks at background > -- > > Key: HDFS-12196 > URL: https://issues.apache.org/jira/browse/HDFS-12196 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Weiwei Yang >Assignee: Weiwei Yang > > Implement a recycling service running on datanode to delete stale blocks > periodically.
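A periodic recycling service like the one proposed here can be sketched with a scheduled task that deletes stale blocks in bounded batches. {{BlockRecyclingService}} and its methods are hypothetical stand-ins for illustration, not the real datanode code:

```java
import java.util.*;
import java.util.concurrent.*;

// Hypothetical sketch of a datanode-side recycling service: a scheduled
// task periodically scans for stale blocks and deletes them in batches.
public class BlockRecyclingService {
    private final Set<String> staleBlocks; // block IDs marked for deletion
    private final int batchSize;           // bound per pass to limit I/O load

    public BlockRecyclingService(Set<String> staleBlocks, int batchSize) {
        this.staleBlocks = staleBlocks;
        this.batchSize = batchSize;
    }

    // One recycling pass: delete up to batchSize stale blocks.
    public int runOnce() {
        Iterator<String> it = staleBlocks.iterator();
        int deleted = 0;
        while (it.hasNext() && deleted < batchSize) {
            it.next();   // a real service would delete the chunk files here
            it.remove();
            deleted++;
        }
        return deleted;
    }

    // Run the pass at a fixed interval, mirroring the proposed background service.
    public ScheduledFuture<?> start(ScheduledExecutorService exec, long intervalSec) {
        return exec.scheduleWithFixedDelay(
            this::runOnce, intervalSec, intervalSec, TimeUnit.SECONDS);
    }
}
```

Bounding each pass by a batch size keeps the background deletes from competing with foreground reads and writes.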
[jira] [Created] (HDFS-12195) Ozone: DeleteKey-1: KSM replies delete key request asynchronously
Weiwei Yang created HDFS-12195: -- Summary: Ozone: DeleteKey-1: KSM replies delete key request asynchronously Key: HDFS-12195 URL: https://issues.apache.org/jira/browse/HDFS-12195 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Affects Versions: HDFS-7240 Reporter: Weiwei Yang Assignee: Yuanbo Liu We will implement delete key in ozone in multiple child tasks; this is one of the child tasks, to implement client to SCM communication. We need to do it in an async manner: once the key state is changed in KSM metadata, KSM is ready to reply to the client with a success message. Actual deletes on other layers will happen some time later.
[jira] [Updated] (HDFS-12195) Ozone: DeleteKey-1: KSM replies delete key request asynchronously
[ https://issues.apache.org/jira/browse/HDFS-12195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-12195: --- Attachment: client-ksm.png > Ozone: DeleteKey-1: KSM replies delete key request asynchronously > - > > Key: HDFS-12195 > URL: https://issues.apache.org/jira/browse/HDFS-12195 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Weiwei Yang >Assignee: Yuanbo Liu > Attachments: client-ksm.png > > > We will implement delete key in ozone in multiple child tasks; this is one of > the child tasks, to implement client to SCM communication. We need to do it in > an async manner: once the key state is changed in KSM metadata, KSM is ready to > reply to the client with a success message. Actual deletes on other layers will > happen some time later.
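The async flow described above -- reply to the client as soon as the key state is flipped in KSM metadata, and do the physical delete some time later -- can be sketched as follows. All names here are illustrative assumptions, not the actual KSM API:

```java
import java.util.*;
import java.util.concurrent.*;

// Hypothetical sketch of the async delete flow: deleteKey only flips the
// key's state in metadata and replies success right away; the physical
// cleanup on other layers is handed to a background executor.
public class AsyncKeyDeleter {
    enum State { ACTIVE, DELETING }

    private final Map<String, State> keyStates = new ConcurrentHashMap<>();
    private final ExecutorService background;

    public AsyncKeyDeleter(ExecutorService background) {
        this.background = background;
    }

    public void putKey(String key) {
        keyStates.put(key, State.ACTIVE);
    }

    // Returns as soon as the metadata state is flipped; the actual cleanup
    // of blocks happens some time later in the background.
    public boolean deleteKey(String key) {
        State prev = keyStates.replace(key, State.DELETING);
        if (prev != State.ACTIVE) {
            return false; // unknown key, or already being deleted
        }
        background.submit(() -> keyStates.remove(key, State.DELETING));
        return true;
    }
}
```

The client-visible latency is just the metadata update; the recycling service on the datanodes (HDFS-12196) would then reclaim the actual blocks.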
[jira] [Created] (HDFS-12196) Ozone: DeleteKey-2: Implement container recycling service to delete stale blocks at background
Weiwei Yang created HDFS-12196: -- Summary: Ozone: DeleteKey-2: Implement container recycling service to delete stale blocks at background Key: HDFS-12196 URL: https://issues.apache.org/jira/browse/HDFS-12196 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Weiwei Yang Assignee: Weiwei Yang Implement a recycling service running on datanode to delete stale blocks periodically.
[jira] [Commented] (HDFS-12187) Ozone : add support to DEBUG CLI for ksm.db
[ https://issues.apache.org/jira/browse/HDFS-12187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16099362#comment-16099362 ] Weiwei Yang commented on HDFS-12187: +1, I am going to commit this shortly. Thanks [~vagarychen]. > Ozone : add support to DEBUG CLI for ksm.db > --- > > Key: HDFS-12187 > URL: https://issues.apache.org/jira/browse/HDFS-12187 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chen Liang >Assignee: Chen Liang > Attachments: HDFS-12187-HDFS-7240.001.patch, > HDFS-12187-HDFS-7240.002.patch > > > This JIRA adds the ability to convert the ksm metadata file (ksm.db) into a sqlite > db.
[jira] [Updated] (HDFS-12145) Ozone: OzoneFileSystem: Ozone & KSM should support "/" delimited key names
[ https://issues.apache.org/jira/browse/HDFS-12145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-12145: --- Attachment: HDFS-12145-HDFS-7240.006.patch Hi [~msingh] Apologies if my earlier comment was not clear; I uploaded a v6 patch based on your v5 patch. Basically I wanted to ensure both non-delimited and delimited keys are covered by the {{TestKeys}} class, please check and let me know if this looks good to you. Thanks a lot. > Ozone: OzoneFileSystem: Ozone & KSM should support "/" delimited key names > -- > > Key: HDFS-12145 > URL: https://issues.apache.org/jira/browse/HDFS-12145 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh > Fix For: HDFS-7240 > > Attachments: HDFS-12145-HDFS-7240.001.patch, > HDFS-12145-HDFS-7240.002.patch, HDFS-12145-HDFS-7240.003.patch, > HDFS-12145-HDFS-7240.004.patch, HDFS-12145-HDFS-7240.005.patch, > HDFS-12145-HDFS-7240.006.patch > > > With OzoneFileSystem, key names will be delimited by "/", which is used as the > path separator. > Support should be added in KSM and Ozone to support key names with "/"
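With "/"-delimited key names, an OzoneFileSystem-style directory listing becomes a prefix scan over the flat key space. A hypothetical sketch of that idea ({{DelimitedKeys}} is illustrative, not the actual KSM listing code):

```java
import java.util.*;

public class DelimitedKeys {
    // Return the immediate children of a "/"-delimited prefix, the way a
    // filesystem listStatus over flat keys might. Child "directories" are
    // reported with a trailing "/" to distinguish them from plain keys.
    public static SortedSet<String> listChildren(Collection<String> keys, String dir) {
        String prefix = dir.isEmpty() ? "" : dir + "/";
        SortedSet<String> children = new TreeSet<>();
        for (String key : keys) {
            if (!key.startsWith(prefix) || key.equals(prefix)) continue;
            String rest = key.substring(prefix.length());
            int slash = rest.indexOf('/');
            // No slash left: a direct key; otherwise take the first path segment.
            children.add(slash < 0 ? rest : rest.substring(0, slash) + "/");
        }
        return children;
    }
}
```

The key store itself stays flat; only the listing logic interprets "/" as a separator.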
[jira] [Commented] (HDFS-12155) Ozone : add RocksDB support to DEBUG CLI
[ https://issues.apache.org/jira/browse/HDFS-12155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16099377#comment-16099377 ] Weiwei Yang commented on HDFS-12155: Hi [~vagarychen], I just committed HDFS-12187, could you resume your patch for this one? Thanks > Ozone : add RocksDB support to DEBUG CLI > > > Key: HDFS-12155 > URL: https://issues.apache.org/jira/browse/HDFS-12155 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chen Liang >Assignee: Chen Liang > Attachments: HDFS-12155-HDFS-7240.001.patch, > HDFS-12155-HDFS-7240.002.patch > > > As we are migrating from LevelDB to RocksDB, we should also add > RocksDB support to the debug cli.
[jira] [Updated] (HDFS-12187) Ozone : add support to DEBUG CLI for ksm.db
[ https://issues.apache.org/jira/browse/HDFS-12187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-12187: --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: HDFS-7240 Target Version/s: HDFS-7240 Status: Resolved (was: Patch Available) I just committed this to the feature branch, thanks a lot for the contribution [~vagarychen], and thanks for the review [~anu]. > Ozone : add support to DEBUG CLI for ksm.db > --- > > Key: HDFS-12187 > URL: https://issues.apache.org/jira/browse/HDFS-12187 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chen Liang >Assignee: Chen Liang > Fix For: HDFS-7240 > > Attachments: HDFS-12187-HDFS-7240.001.patch, > HDFS-12187-HDFS-7240.002.patch > > > This JIRA adds the ability to convert the ksm metadata file (ksm.db) into a sqlite > db.
[jira] [Assigned] (HDFS-11984) Ozone: Ensures listKey lists all required key fields
[ https://issues.apache.org/jira/browse/HDFS-11984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang reassigned HDFS-11984: -- Assignee: Yiqun Lin (was: Weiwei Yang) > Ozone: Ensures listKey lists all required key fields > > > Key: HDFS-11984 > URL: https://issues.apache.org/jira/browse/HDFS-11984 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Weiwei Yang >Assignee: Yiqun Lin > > HDFS-11782 implements the listKey operation which only lists the basic key > fields; we need to make sure it returns all required fields > # version > # md5hash > # createdOn > # size > # keyName > this task depends on the work of HDFS-11886. See more discussion [here | > https://issues.apache.org/jira/browse/HDFS-11782?focusedCommentId=16045562&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16045562].
[jira] [Commented] (HDFS-11984) Ozone: Ensures listKey lists all required key fields
[ https://issues.apache.org/jira/browse/HDFS-11984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16099422#comment-16099422 ] Weiwei Yang commented on HDFS-11984: Hi [~linyiqun] Thanks for working on this. You are right, we don't need {{dataFileName}}, let me update the description. I listed this one as depending on HDFS-11886 because I thought this info would be persisted only when we commit the key (phase-2). However, HDFS-12170 implemented this while writing a key (phase-1), so it should be fine for now. We can keep HDFS-11886 open for further improvement on this. Meanwhile I will reassign this JIRA to you so you can work on this end-to-end. Thanks a lot for working on this, again. :) > Ozone: Ensures listKey lists all required key fields > > > Key: HDFS-11984 > URL: https://issues.apache.org/jira/browse/HDFS-11984 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Weiwei Yang >Assignee: Weiwei Yang > > HDFS-11782 implements the listKey operation which only lists the basic key > fields; we need to make sure it returns all required fields > # version > # md5hash > # createdOn > # size > # keyName > # dataFileName > this task depends on the work of HDFS-11886. See more discussion [here | > https://issues.apache.org/jira/browse/HDFS-11782?focusedCommentId=16045562&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16045562].
[jira] [Updated] (HDFS-11984) Ozone: Ensures listKey lists all required key fields
[ https://issues.apache.org/jira/browse/HDFS-11984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-11984: --- Description: HDFS-11782 implements the listKey operation which only lists the basic key fields; we need to make sure it returns all required fields # version # md5hash # createdOn # size # keyName this task depends on the work of HDFS-11886. See more discussion [here | https://issues.apache.org/jira/browse/HDFS-11782?focusedCommentId=16045562&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16045562]. was: HDFS-11782 implements the listKey operation which only lists the basic key fields, we need to make sure it return all required fields # version # md5hash # createdOn # size # keyName # dataFileName this task is depending on the work of HDFS-11886. See more discussion [here | https://issues.apache.org/jira/browse/HDFS-11782?focusedCommentId=16045562&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16045562]. > Ozone: Ensures listKey lists all required key fields > > > Key: HDFS-11984 > URL: https://issues.apache.org/jira/browse/HDFS-11984 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Weiwei Yang >Assignee: Weiwei Yang > > HDFS-11782 implements the listKey operation which only lists the basic key > fields; we need to make sure it returns all required fields > # version > # md5hash > # createdOn > # size > # keyName > this task depends on the work of HDFS-11886. See more discussion [here | > https://issues.apache.org/jira/browse/HDFS-11782?focusedCommentId=16045562&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16045562].
[jira] [Commented] (HDFS-11920) Ozone : add key partition
[ https://issues.apache.org/jira/browse/HDFS-11920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16099438#comment-16099438 ] Weiwei Yang commented on HDFS-11920: Hi [~vagarychen] Thanks for the patch, it looks good to me overall. I have a few comments, please let me know if they make sense to you. 1. *DistributedStorageHandler* line 410: I am wondering why it builds the containerKey as "/volume/bucket/blockID", why not simply use {{BlockID}} here? This seems to be the key that is written to container.db in the container metadata. 2. *ChunkOutputStream* I am wondering whether we really need to let it know about an ozone object key, see line 56. Right now it writes a chunk file like {{ozoneKeyName_stream_streamId_chunk_n}}, why not {{blockId_stream_streamId_chunk_n}} instead? I think we can remove this variable from this class. line 168: it writes the full length of {{b}} to the output stream but the position only moves by 1, which seems incorrect. 3. *TestMultipleContainerReadWrite* In {{TestWriteRead}}, can we check that the number of chunk files for the key actually matches the desired number of splits? 4. Looks like the chunk group input or output stream maintains a list of streams and reads/writes in a linear manner; can we optimize this to do parallel reads/writes, since they are independent chunks? That is, have a thread fetch a certain length of content from each chunk, then merge them together afterwards. It doesn't have to be done in this patch, but I think that might be a good improvement. Thanks > Ozone : add key partition > - > > Key: HDFS-11920 > URL: https://issues.apache.org/jira/browse/HDFS-11920 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chen Liang >Assignee: Chen Liang > Attachments: HDFS-11920-HDFS-7240.001.patch, > HDFS-11920-HDFS-7240.002.patch, HDFS-11920-HDFS-7240.003.patch, > HDFS-11920-HDFS-7240.004.patch > > > Currently, each key corresponds to one single SCM block, and putKey/getKey > writes/reads to this single SCM block. 
This works fine for keys with > reasonably small data size. However if the data is too huge (e.g. it does not > even fit into a single container), then we need to be able to partition the key > data into multiple blocks, each in one container. This JIRA changes the > key-related classes to support this.
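The partitioning described in this issue -- splitting a key's data into multiple blocks, each in its own container -- can be illustrated by computing the per-block sizes. {{KeyPartitioner}} is a hypothetical helper for illustration, not the actual ChunkGroupOutputStream code:

```java
import java.util.*;

public class KeyPartitioner {
    // Split a key's data length into block-sized partitions, each of which
    // would be written to its own SCM block/container. The last partition
    // holds the remainder.
    public static List<Long> partition(long dataSize, long blockSize) {
        List<Long> sizes = new ArrayList<>();
        long remaining = dataSize;
        while (remaining > 0) {
            long n = Math.min(remaining, blockSize);
            sizes.add(n);
            remaining -= n;
        }
        return sizes;
    }
}
```

A read then concatenates the partitions back in order, which is also why the review above suggests the per-chunk reads could run in parallel and be merged afterwards.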
[jira] [Commented] (HDFS-12145) Ozone: OzoneFileSystem: Ozone & KSM should support "/" delimited key names
[ https://issues.apache.org/jira/browse/HDFS-12145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102693#comment-16102693 ] Weiwei Yang commented on HDFS-12145: Thanks [~msingh] for confirming that. +1 to the latest patch, I will commit this shortly. Thanks for the updates. > Ozone: OzoneFileSystem: Ozone & KSM should support "/" delimited key names > -- > > Key: HDFS-12145 > URL: https://issues.apache.org/jira/browse/HDFS-12145 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh > Fix For: HDFS-7240 > > Attachments: HDFS-12145-HDFS-7240.001.patch, > HDFS-12145-HDFS-7240.002.patch, HDFS-12145-HDFS-7240.003.patch, > HDFS-12145-HDFS-7240.004.patch, HDFS-12145-HDFS-7240.005.patch, > HDFS-12145-HDFS-7240.006.patch, HDFS-12145-HDFS-7240.007.patch > > > With OzoneFileSystem, key names will be delimited by "/", which is used as the > path separator. > Support should be added in KSM and Ozone to support key names with "/"