[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.

2018-02-11 Thread He Xiaoqiao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16360400#comment-16360400
 ] 

He Xiaoqiao commented on HDFS-10453:


I checked the failed UTs and ran them locally; they seem to work fine and do 
not appear to be related to this patch. Please double check at your 
convenience, [~arpitagarwal].

> ReplicationMonitor thread could stuck for long time due to the race between 
> replication and delete of same file in a large cluster.
> ---
>
> Key: HDFS-10453
> URL: https://issues.apache.org/jira/browse/HDFS-10453
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Fix For: 2.7.6
>
> Attachments: HDFS-10453-branch-2.001.patch, 
> HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, 
> HDFS-10453-branch-2.7.005.patch, HDFS-10453-branch-2.7.006.patch, 
> HDFS-10453-branch-2.7.007.patch, HDFS-10453-branch-2.7.008.patch, 
> HDFS-10453-branch-2.7.009.patch, HDFS-10453-branch-2.8.001.patch, 
> HDFS-10453-branch-2.8.002.patch, HDFS-10453-branch-2.9.001.patch, 
> HDFS-10453-branch-2.9.002.patch, HDFS-10453-branch-3.0.001.patch, 
> HDFS-10453-branch-3.0.002.patch, HDFS-10453-trunk.001.patch, 
> HDFS-10453-trunk.002.patch, HDFS-10453.001.patch
>
>
> The ReplicationMonitor thread can get stuck for a long time and, with low 
> probability, lose data. Consider the typical scenario:
> (1) create and close a file with the default replication factor (3);
> (2) increase the replication factor of the file to 10;
> (3) delete the file while ReplicationMonitor is scheduling blocks belonging to 
> that file for replication.
> When ReplicationMonitor gets stuck, the NameNode prints logs like:
> {code:xml}
> 2016-04-19 10:20:48,083 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 7 to reach 10 
> (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, 
> newBlock=false) For more information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> ..
> 2016-04-19 10:21:17,184 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 7 to reach 10 
> (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, 
> newBlock=false) For more information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> 2016-04-19 10:21:17,184 WARN 
> org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough 
> replicas: expected size is 7 but only 0 storage types can be selected 
> (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, 
> DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]})
> 2016-04-19 10:21:17,184 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 7 to reach 10 
> (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, 
> newBlock=false) All required storage types are unavailable:  
> unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
> {code}
> This happens because two threads (#NameNodeRpcServer and #ReplicationMonitor) 
> process the same block at the same moment:
> (1) ReplicationMonitor#computeReplicationWorkForBlocks picks blocks to 
> replicate and releases the global lock.
> (2) FSNamesystem#delete is invoked to delete the blocks and clears the 
> references in the blocks map, neededReplications, etc. The block's numBytes is 
> set to NO_ACK (Long.MAX_VALUE), which indicates that the block deletion does 
> not need an explicit ACK from the node.
> (3) ReplicationMonitor#computeReplicationWorkForBlocks continues to 
> chooseTargets for the same blocks, and no node is selected even after 
> traversing the whole cluster, because no candidate satisfies the goodness 
> criteria (the remaining space must cover the required size Long.MAX_VALUE).
> During stage (3) ReplicationMonitor is stuck for a long time, especially in a 
> large cluster. invalidateBlocks and neededReplications keep growing without 
> being consumed, and in the worst case data is lost.
> This can mostly be avoided by skipping chooseTarget for
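
For illustration, a minimal self-contained sketch of the guard described above (the class and field names below are simplified stand-ins for the NameNode internals such as BlockCommand.NO_ACK and neededReplications, not the actual patch):

{code:java}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class SkipDeletedBlocksSketch {
  // Marker value set on a block's numBytes when the block has been deleted.
  static final long NO_ACK = Long.MAX_VALUE;

  static class BlockInfo {
    long numBytes;
    BlockInfo(long numBytes) { this.numBytes = numBytes; }
  }

  /** Mimics computeReplicationWorkForBlocks: drop blocks deleted concurrently. */
  static List<BlockInfo> selectBlocksToReplicate(List<BlockInfo> neededReplications) {
    List<BlockInfo> work = new ArrayList<>();
    for (Iterator<BlockInfo> it = neededReplications.iterator(); it.hasNext();) {
      BlockInfo b = it.next();
      if (b.numBytes == NO_ACK) {
        // The block was deleted while we were outside the lock: remove it
        // instead of calling chooseTarget, which could never find a node with
        // Long.MAX_VALUE of remaining space.
        it.remove();
        continue;
      }
      work.add(b); // the real code would go on to chooseTarget(...) here
    }
    return work;
  }

  public static void main(String[] args) {
    List<BlockInfo> needed = new ArrayList<>();
    needed.add(new BlockInfo(128L * 1024 * 1024)); // normal block
    needed.add(new BlockInfo(NO_ACK));             // deleted concurrently
    System.out.println(selectBlocksToReplicate(needed).size()); // prints 1
  }
}
{code}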

[jira] [Commented] (HDFS-13133) Ozone: OzoneFileSystem: Calling delete with non-existing path shouldn't be logged on ERROR level

2018-02-11 Thread Mukul Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16360382#comment-16360382
 ] 

Mukul Kumar Singh commented on HDFS-13133:
--

Thanks for working on this [~elek].

+1, v1 patch looks good to me. I will commit this shortly.

> Ozone: OzoneFileSystem: Calling delete with non-existing path shouldn't be 
> logged on ERROR level
> 
>
> Key: HDFS-13133
> URL: https://issues.apache.org/jira/browse/HDFS-13133
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Fix For: HDFS-7240
>
> Attachments: HDFS-13133-HDFS-7240.001.patch
>
>
> During the test of OzoneFileSystem with spark I noticed ERROR messages 
> multiple times:
> Something like this:
> {code}
> 2018-02-11 15:54:54 ERROR OzoneFileSystem:409 - Couldn't delete 
> o3://bucket1.test/user/hadoop/.sparkStaging/application_1518349702045_0008 - 
> does not exist
> {code}
> I checked the other implementations, and they use the DEBUG level. I think it's 
> expected that the path sometimes points to a non-existing dir/file.
> To be consistent with the other implementations I propose to lower the log 
> level to debug.
> Examples from other file systems:
> S3AFileSystem:
> {code}
> } catch (FileNotFoundException e) {
>   LOG.debug("Couldn't delete {} - does not exist", f);
>   instrumentation.errorIgnored();
>   return false;
> } catch (AmazonClientException e) {
>   throw translateException("delete", f, e);
> }
> {code}
> Aliyun:
> {code}
>try {
>   return innerDelete(getFileStatus(path), recursive);
> } catch (FileNotFoundException e) {
>   LOG.debug("Couldn't delete {} - does not exist", path);
>   return false;
> }
> {code}
> SFTP:
> {code}
>} catch (FileNotFoundException e) {
>   // file not found, no need to delete, return true
>   return false;
> }
> {code}
> SwiftNativeFileSystem:
> {code}
> try {
>   return store.delete(path, recursive);
> } catch (FileNotFoundException e) {
>   //base path was not found.
>   return false;
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12636) Ozone: OzoneFileSystem: Implement seek functionality for rpc client

2018-02-11 Thread Mukul Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh updated HDFS-12636:
-
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

> Ozone: OzoneFileSystem: Implement seek functionality for rpc client
> ---
>
> Key: HDFS-12636
> URL: https://issues.apache.org/jira/browse/HDFS-12636
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Mukul Kumar Singh
>Assignee: Lokesh Jain
>Priority: Major
> Fix For: HDFS-7240
>
> Attachments: HDFS-12636-HDFS-7240.001.patch, 
> HDFS-12636-HDFS-7240.002.patch, HDFS-12636-HDFS-7240.003.patch, 
> HDFS-12636-HDFS-7240.004.patch, HDFS-12636-HDFS-7240.005.patch, 
> HDFS-12636-HDFS-7240.006.patch, HDFS-12636-HDFS-7240.007.patch
>
>
> The OzoneClient library provides a method to invoke both RPC-based and 
> REST-based methods to Ozone. This api will help improve both the performance 
> and the interface management in OzoneFileSystem.
> This jira will be used to convert the REST-based calls to use this new 
> unified client.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12636) Ozone: OzoneFileSystem: Implement seek functionality for rpc client

2018-02-11 Thread Mukul Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16360381#comment-16360381
 ] 

Mukul Kumar Singh commented on HDFS-12636:
--

Thanks for the contribution [~ljain], I have committed this to the feature 
branch.

> Ozone: OzoneFileSystem: Implement seek functionality for rpc client
> ---
>
> Key: HDFS-12636
> URL: https://issues.apache.org/jira/browse/HDFS-12636
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Mukul Kumar Singh
>Assignee: Lokesh Jain
>Priority: Major
> Fix For: HDFS-7240
>
> Attachments: HDFS-12636-HDFS-7240.001.patch, 
> HDFS-12636-HDFS-7240.002.patch, HDFS-12636-HDFS-7240.003.patch, 
> HDFS-12636-HDFS-7240.004.patch, HDFS-12636-HDFS-7240.005.patch, 
> HDFS-12636-HDFS-7240.006.patch, HDFS-12636-HDFS-7240.007.patch
>
>
> The OzoneClient library provides a method to invoke both RPC-based and 
> REST-based methods to Ozone. This api will help improve both the performance 
> and the interface management in OzoneFileSystem.
> This jira will be used to convert the REST-based calls to use this new 
> unified client.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13132) Ozone: Handle datanode failures in Storage Container Manager

2018-02-11 Thread Mukul Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh updated HDFS-13132:
-
Attachment: HDFS-13132-HDFS-7240.002.patch

> Ozone: Handle datanode failures in Storage Container Manager
> 
>
> Key: HDFS-13132
> URL: https://issues.apache.org/jira/browse/HDFS-13132
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Fix For: HDFS-7240
>
> Attachments: HDFS-13132-HDFS-7240.001.patch, 
> HDFS-13132-HDFS-7240.002.patch
>
>
> Currently SCM receives heartbeats from the datanodes in the cluster along with 
> container reports. Apart from this, the Ratis leader also receives heartbeats 
> from the nodes in a Raft ring. The Ratis heartbeats are at a smaller interval 
> (500 ms) whereas SCM heartbeats are at 30 s, so it is considered safe to 
> assume that a datanode is really lost when SCM misses heartbeats from such a 
> node.
> The pipeline recovery will follow these steps (a small sketch of steps 1 and 2 
> follows after this description):
> 1) As noted earlier, SCM identifies a dead DN via the heartbeats. The current 
> stale interval is 1.5 minutes. Once a stale node has been identified, SCM 
> finds the list of containers for the pipelines the datanode was part of.
> 2) SCM sends a close container command to the datanodes; note that at this 
> time the Ratis ring has 2 nodes, and consistency can still be guaranteed by 
> Ratis.
> 3) If another node dies before the close container command succeeds, Ratis 
> cannot guarantee consistency of the data being written or of the close 
> container operation. The pipeline is then marked as being in an inconsistent 
> state.
> 4) The closed container will be replicated via the close container replication 
> protocol.
> If the dead datanode comes back, SCM will, as part of the re-register command, 
> ask the datanode to format all of its open containers.
> 5) Return the healthy nodes to the free node pool for the next pipeline 
> allocation.
> 6) Read operations on closed containers will succeed; however, read operations 
> on an open container in a single-node cluster will be disallowed. They will 
> only be allowed under a special flag, the ReadInconsistentData flag.
> This jira will introduce the mechanism to identify and handle datanode 
> failure.
> However, handling of a) 2 nodes failing simultaneously, b) returning the nodes 
> to a healthy state, c) allowing inconsistent data reads, and d) purging open 
> containers on a zombie node will be done as part of separate bugs.
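
For illustration, a minimal self-contained sketch of steps 1) and 2) above (the class, field, and method names are simplified stand-ins, not the actual SCM code):

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DeadNodeHandlingSketch {
  static final long STALE_INTERVAL_MS = 90_000; // 1.5 minutes, as in the description

  // Simplified bookkeeping: pipeline -> member datanodes, pipeline -> containers.
  final Map<String, List<String>> pipelineMembers = new HashMap<>();
  final Map<String, List<String>> pipelineContainers = new HashMap<>();
  final Map<String, Long> lastHeartbeatMs = new HashMap<>();

  boolean isStale(String datanode, long nowMs) {
    Long last = lastHeartbeatMs.get(datanode);
    return last == null || nowMs - last > STALE_INTERVAL_MS;
  }

  /** Steps 1-2: for a stale node, close every container of every pipeline it was in. */
  List<String> handleStaleNode(String datanode, long nowMs) {
    List<String> closeCommands = new ArrayList<>();
    if (!isStale(datanode, nowMs)) {
      return closeCommands;
    }
    for (Map.Entry<String, List<String>> e : pipelineMembers.entrySet()) {
      if (e.getValue().contains(datanode)) {
        for (String container
            : pipelineContainers.getOrDefault(e.getKey(), Collections.emptyList())) {
          // The real flow would send a close-container command to the remaining
          // healthy members of the Ratis ring.
          closeCommands.add("CLOSE " + container);
        }
      }
    }
    return closeCommands;
  }

  public static void main(String[] args) {
    DeadNodeHandlingSketch scm = new DeadNodeHandlingSketch();
    scm.pipelineMembers.put("pipeline-1", Arrays.asList("dn1", "dn2", "dn3"));
    scm.pipelineContainers.put("pipeline-1", Arrays.asList("container-7", "container-9"));
    scm.lastHeartbeatMs.put("dn1", 0L); // last heartbeat long ago
    System.out.println(scm.handleStaleNode("dn1", 200_000L));
  }
}
{code}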



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-8693) refreshNamenodes does not support adding a new standby to a running DN

2018-02-11 Thread lindongdong (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359766#comment-16359766
 ] 

lindongdong edited comment on HDFS-8693 at 2/12/18 3:59 AM:


I ran into some errors with this patch.

The cluster has 3 nodes, A, B and C, and the NNs are on A and B.

When we remove B and install a new SNN on C, all DNs fail to register to the 
new SNN. The error looks like the following:
{code:java}
2018-02-09 19:49:02,728 | WARN | DataNode: 
[[[DISK]file:/_1/b-b_2/bb_3/b_4/b-5/B-2/B-3/B-4/bbb-b/hadoop/data1/dn/]]
 heartbeating to 189-219-255-103/x.x.x.x:25006 | Problem connecting to server: 
189-219-255-103/x.x.x.x:25006 | BPServiceActor.java:197
2018-02-09 19:49:07,731 | WARN | DataNode: 
[[[DISK]file:/_1/b-b_2/bb_3/b_4/b-5/B-2/B-3/B-4/bbb-b/hadoop/data1/dn/]]
 heartbeating to 189-219-255-103/x.x.x.x:25006 | Exception encountered while 
connecting to the server : javax.security.sasl.SaslException: GSS initiate 
failed [Caused by GSSException: No valid credentials provided (Mechanism level: 
Failed to find any Kerberos tgt)] | Client.java:726
{code}


was (Author: lindongdong):
I ran into some errors with this patch.

The cluster has 3 nodes, A, B and C, and the NNs are on A and B.

When we remove B and install a new SNN on C, all DNs fail to register to the 
new SNN. The error looks like the following:
{code:java}
2018-02-09 19:49:02,728 | WARN | DataNode: 
[[[DISK]file:/_1/b-b_2/bb_3/b_4/b-5/B-2/B-3/B-4/bbb-b/hadoop/data1/dn/]]
 heartbeating to 189-219-255-103/10.219.255.103:25006 | Problem connecting to 
server: 189-219-255-103/10.219.255.103:25006 | BPServiceActor.java:197
2018-02-09 19:49:07,731 | WARN | DataNode: 
[[[DISK]file:/_1/b-b_2/bb_3/b_4/b-5/B-2/B-3/B-4/bbb-b/hadoop/data1/dn/]]
 heartbeating to 189-219-255-103/10.219.255.103:25006 | Exception encountered 
while connecting to the server : javax.security.sasl.SaslException: GSS 
initiate failed [Caused by GSSException: No valid credentials provided 
(Mechanism level: Failed to find any Kerberos tgt)] | Client.java:726
{code}

> refreshNamenodes does not support adding a new standby to a running DN
> --
>
> Key: HDFS-8693
> URL: https://issues.apache.org/jira/browse/HDFS-8693
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, ha
>Affects Versions: 2.6.0
>Reporter: Jian Fang
>Assignee: Ajith S
>Priority: Critical
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4
>
> Attachments: HDFS-8693.02.patch, HDFS-8693.03.patch, HDFS-8693.1.patch
>
>
> I tried to run the following command on a Hadoop 2.6.0 cluster with HA 
> support 
> $ hdfs dfsadmin -refreshNamenodes datanode-host:port
> to refresh name nodes on data nodes after I replaced one name node with a new 
> one so that I don't need to restart the data nodes. However, I got the 
> following error:
> refreshNamenodes: HA does not currently support adding a new standby to a 
> running DN. Please do a rolling restart of DNs to reconfigure the list of NNs.
> I checked the 2.6.0 code and the error was thrown by the following code 
> snippet, which led me to this JIRA.
> void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException {
>   Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>   for (BPServiceActor actor : bpServices) {
>     oldAddrs.add(actor.getNNSocketAddress());
>   }
>   Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>   if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>     // Keep things simple for now -- we can implement this at a later date.
>     throw new IOException("HA does not currently support adding a new standby "
>         + "to a running DN. Please do a rolling restart of DNs to reconfigure "
>         + "the list of NNs.");
>   }
> }
> Looks like the refreshNameNodes command is an incomplete feature. 
> Unfortunately, the new name node on a replacement instance is critical for 
> auto-provisioning a Hadoop cluster with HDFS HA support; without this support, 
> the HA feature cannot really be used. I also observed that the new standby 
> name node on the replacement instance can get stuck in safe mode because no 
> data nodes check in with it. Even with a rolling restart, it may take quite 
> some time to restart all data nodes in a big cluster (for example, with 4000 
> data nodes), let alone that restarting DNs is far too intrusive and not a 
> preferable operation in production. It also increases the chance of a double 
> failure because the standby name node is not really ready for a failover in the

[jira] [Comment Edited] (HDFS-8693) refreshNamenodes does not support adding a new standby to a running DN

2018-02-11 Thread lindongdong (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359766#comment-16359766
 ] 

lindongdong edited comment on HDFS-8693 at 2/12/18 3:58 AM:


I ran into some errors with this patch.

The cluster has 3 nodes, A, B and C, and the NNs are on A and B.

When we remove B and install a new SNN on C, all DNs fail to register to the 
new SNN. The error looks like the following:
{code:java}
2018-02-09 19:49:02,728 | WARN | DataNode: 
[[[DISK]file:/_1/b-b_2/bb_3/b_4/b-5/B-2/B-3/B-4/bbb-b/hadoop/data1/dn/]]
 heartbeating to 189-219-255-103/10.219.255.103:25006 | Problem connecting to 
server: 189-219-255-103/10.219.255.103:25006 | BPServiceActor.java:197
2018-02-09 19:49:07,731 | WARN | DataNode: 
[[[DISK]file:/_1/b-b_2/bb_3/b_4/b-5/B-2/B-3/B-4/bbb-b/hadoop/data1/dn/]]
 heartbeating to 189-219-255-103/10.219.255.103:25006 | Exception encountered 
while connecting to the server : javax.security.sasl.SaslException: GSS 
initiate failed [Caused by GSSException: No valid credentials provided 
(Mechanism level: Failed to find any Kerberos tgt)] | Client.java:726
{code}


was (Author: lindongdong):
I ran into some errors with this patch.

The cluster has 3 nodes, A, B and C, and the NNs are on A and B.

When we remove B and install a new SNN on C, all DNs fail to register to the 
new SNN. The error looks like the following:
{code:java}
2018-02-09 19:49:02,728 | WARN | DataNode: 
[[[DISK]file:/_1/b-b_2/bb_3/b_4/b-5/B-2/B-3/B-4/bbb-b/hadoop/data1/dn/]]
 heartbeating to 189-219-255-103/189.219.255.103:25006 | Problem connecting to 
server: 189-219-255-103/189.219.255.103:25006 | BPServiceActor.java:197
2018-02-09 19:49:07,731 | WARN | DataNode: 
[[[DISK]file:/_1/b-b_2/bb_3/b_4/b-5/B-2/B-3/B-4/bbb-b/hadoop/data1/dn/]]
 heartbeating to 189-219-255-103/189.219.255.103:25006 | Exception encountered 
while connecting to the server : javax.security.sasl.SaslException: GSS 
initiate failed [Caused by GSSException: No valid credentials provided 
(Mechanism level: Failed to find any Kerberos tgt)] | Client.java:726
{code}

> refreshNamenodes does not support adding a new standby to a running DN
> --
>
> Key: HDFS-8693
> URL: https://issues.apache.org/jira/browse/HDFS-8693
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, ha
>Affects Versions: 2.6.0
>Reporter: Jian Fang
>Assignee: Ajith S
>Priority: Critical
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4
>
> Attachments: HDFS-8693.02.patch, HDFS-8693.03.patch, HDFS-8693.1.patch
>
>
> I tried to run the following command on a Hadoop 2.6.0 cluster with HA 
> support 
> $ hdfs dfsadmin -refreshNamenodes datanode-host:port
> to refresh name nodes on data nodes after I replaced one name node with a new 
> one so that I don't need to restart the data nodes. However, I got the 
> following error:
> refreshNamenodes: HA does not currently support adding a new standby to a 
> running DN. Please do a rolling restart of DNs to reconfigure the list of NNs.
> I checked the 2.6.0 code and the error was thrown by the following code 
> snippet, which led me to this JIRA.
> void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException {
>   Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>   for (BPServiceActor actor : bpServices) {
>     oldAddrs.add(actor.getNNSocketAddress());
>   }
>   Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>   if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>     // Keep things simple for now -- we can implement this at a later date.
>     throw new IOException("HA does not currently support adding a new standby "
>         + "to a running DN. Please do a rolling restart of DNs to reconfigure "
>         + "the list of NNs.");
>   }
> }
> Looks like the refreshNameNodes command is an incomplete feature. 
> Unfortunately, the new name node on a replacement instance is critical for 
> auto-provisioning a Hadoop cluster with HDFS HA support; without this support, 
> the HA feature cannot really be used. I also observed that the new standby 
> name node on the replacement instance can get stuck in safe mode because no 
> data nodes check in with it. Even with a rolling restart, it may take quite 
> some time to restart all data nodes in a big cluster (for example, with 4000 
> data nodes), let alone that restarting DNs is far too intrusive and not a 
> preferable operation in production. It also increases the chance of a double 
> failure because the standby name node is not really ready

[jira] [Updated] (HDFS-13001) Testcase improvement for DFSAdmin

2018-02-11 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HDFS-13001:

Component/s: test

> Testcase improvement for DFSAdmin
> -
>
> Key: HDFS-13001
> URL: https://issues.apache.org/jira/browse/HDFS-13001
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test, tools
>Affects Versions: 2.9.0, 3.0.0
>Reporter: Jianfei Jiang
>Assignee: Jianfei Jiang
>Priority: Minor
> Attachments: HDFS-13001.001.patch, HDFS-13001.002.patch
>
>
> Testcase improvement for the DFSAdmin command. The commands should be tested 
> under the following environments:
> (1) both NameNodes are up;
> (2) NN1 is down and NN2 is up;
> (3) NN1 is up and NN2 is down;
> (4) both NameNodes are down.
> The testcases can be improved as in the code below.
> {code:java}
>   private void testExecuteDFSAdminCommand(int nnIndex, String[] command,
>   String message) throws Exception {
> setUpHaCluster(false);
> switch (nnIndex) {
>   case 0:
> cluster.getDfsCluster().shutdownNameNode(0);
> cluster.getDfsCluster().transitionToActive(1);
> break;
>   case 1:
> cluster.getDfsCluster().shutdownNameNode(1);
> cluster.getDfsCluster().transitionToActive(0);
> break;
>   case 2:
> cluster.getDfsCluster().shutdownNameNode(0);
> cluster.getDfsCluster().shutdownNameNode(1);
> break;
>   default:
> }
> int exitCode = admin.run(command);
> if (nnIndex != 2) {
>   assertEquals(err.toString().trim(), 0, exitCode);
>   assertOutputMatches(message + newLine);
> } else {
>   assertNotEquals(err.toString().trim(), 0, exitCode);
>   assertOutputNotMatches(message + newLine);
> }
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13001) Testcase improvement for DFSAdmin

2018-02-11 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HDFS-13001:

Issue Type: Improvement  (was: Sub-task)
Parent: (was: HDFS-12935)

> Testcase improvement for DFSAdmin
> -
>
> Key: HDFS-13001
> URL: https://issues.apache.org/jira/browse/HDFS-13001
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.9.0, 3.0.0
>Reporter: Jianfei Jiang
>Assignee: Jianfei Jiang
>Priority: Minor
> Attachments: HDFS-13001.001.patch, HDFS-13001.002.patch
>
>
> Testcase improvement for the DFSAdmin command. The commands should be tested 
> under the following environments:
> (1) both NameNodes are up;
> (2) NN1 is down and NN2 is up;
> (3) NN1 is up and NN2 is down;
> (4) both NameNodes are down.
> The testcases can be improved as in the code below.
> {code:java}
>   private void testExecuteDFSAdminCommand(int nnIndex, String[] command,
>   String message) throws Exception {
> setUpHaCluster(false);
> switch (nnIndex) {
>   case 0:
> cluster.getDfsCluster().shutdownNameNode(0);
> cluster.getDfsCluster().transitionToActive(1);
> break;
>   case 1:
> cluster.getDfsCluster().shutdownNameNode(1);
> cluster.getDfsCluster().transitionToActive(0);
> break;
>   case 2:
> cluster.getDfsCluster().shutdownNameNode(0);
> cluster.getDfsCluster().shutdownNameNode(1);
> break;
>   default:
> }
> int exitCode = admin.run(command);
> if (nnIndex != 2) {
>   assertEquals(err.toString().trim(), 0, exitCode);
>   assertOutputMatches(message + newLine);
> } else {
>   assertNotEquals(err.toString().trim(), 0, exitCode);
>   assertOutputNotMatches(message + newLine);
> }
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica

2018-02-11 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16360202#comment-16360202
 ] 

Wei-Chiu Chuang commented on HDFS-11187:


Regarding rev002 of the branch-2 patch: the code in 
{{FsVolumeImpl#addFinalizedBlock()}} is meant to copy the in-memory checksum of 
the rbw replica to the in-memory checksum of the finalized replica. Your patch 
forces the finalized replica to load the checksum from disk, and that could 
cause extra disk access.

> Optimize disk access for last partial chunk checksum of Finalized replica
> -
>
> Key: HDFS-11187
> URL: https://issues.apache.org/jira/browse/HDFS-11187
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Major
> Fix For: 3.1.0, 3.0.2
>
> Attachments: HDFS-11187-branch-2.001.patch, 
> HDFS-11187-branch-2.002.patch, HDFS-11187.001.patch, HDFS-11187.002.patch, 
> HDFS-11187.003.patch, HDFS-11187.004.patch, HDFS-11187.005.patch
>
>
> The patch at HDFS-11160 ensures BlockSender reads the correct version of the 
> metafile when there are concurrent writers.
> However, the implementation is not optimal, because it must always read the 
> last partial chunk checksum from disk, while holding the FsDatasetImpl lock, 
> for every reader. It is possible to optimize this by keeping an up-to-date 
> version of the last partial checksum in memory and reducing disk access.
> I am separating the optimization into a new jira, because maintaining the 
> state of the in-memory checksum requires a lot more work.
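
For illustration, a minimal self-contained sketch of the in-memory optimization being described (the names below are simplified stand-ins for the DataNode internals, not the actual patch):

{code:java}
import java.util.Arrays;

public class PartialChunkChecksumSketch {
  // Checksum of the last (partial) chunk, kept up to date by the writer path.
  private byte[] lastPartialChunkChecksum;

  /** Writer side: remember the checksum of the last partial chunk just written. */
  synchronized void setLastPartialChunkChecksum(byte[] checksum) {
    this.lastPartialChunkChecksum =
        checksum == null ? null : Arrays.copyOf(checksum, checksum.length);
  }

  /**
   * Reader side: serve the cached checksum when available, so readers do not
   * have to re-read the tail of the meta file (the disk access this jira wants
   * to avoid); fall back to disk only when nothing is cached.
   */
  synchronized byte[] getLastPartialChunkChecksum() {
    if (lastPartialChunkChecksum != null) {
      return Arrays.copyOf(lastPartialChunkChecksum, lastPartialChunkChecksum.length);
    }
    return loadLastPartialChunkChecksumFromDisk();
  }

  private byte[] loadLastPartialChunkChecksumFromDisk() {
    // Placeholder for the expensive on-disk read of the meta file.
    return new byte[0];
  }
}
{code}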



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica

2018-02-11 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16360199#comment-16360199
 ] 

Wei-Chiu Chuang commented on HDFS-11187:


Please move the code in FsVolumeImpl#addFinalizedBlock() in the trunk patch to 
FsDatasetImpl#finalizeReplica() in the branch-2 patch.

Looks like you can remove getLastChecksumAndDataLen from FinalizedReplica; it 
is replaced by getPartialChunkChecksumForFinalized in BlockSender.

> Optimize disk access for last partial chunk checksum of Finalized replica
> -
>
> Key: HDFS-11187
> URL: https://issues.apache.org/jira/browse/HDFS-11187
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Major
> Fix For: 3.1.0, 3.0.2
>
> Attachments: HDFS-11187-branch-2.001.patch, 
> HDFS-11187-branch-2.002.patch, HDFS-11187.001.patch, HDFS-11187.002.patch, 
> HDFS-11187.003.patch, HDFS-11187.004.patch, HDFS-11187.005.patch
>
>
> The patch at HDFS-11160 ensures BlockSender reads the correct version of the 
> metafile when there are concurrent writers.
> However, the implementation is not optimal, because it must always read the 
> last partial chunk checksum from disk, while holding the FsDatasetImpl lock, 
> for every reader. It is possible to optimize this by keeping an up-to-date 
> version of the last partial checksum in memory and reducing disk access.
> I am separating the optimization into a new jira, because maintaining the 
> state of the in-memory checksum requires a lot more work.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13132) Ozone: Handle datanode failures in Storage Container Manager

2018-02-11 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16360072#comment-16360072
 ] 

genericqa commented on HDFS-13132:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
40s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
|| || || || {color:brown} HDFS-7240 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
40s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 35m 
34s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  4m 
28s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
24s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m 
24s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
23m  4s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  7m 
30s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
43s{color} | {color:green} HDFS-7240 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
22s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  4m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  4m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  4m 
21s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 22s{color} | {color:orange} hadoop-hdfs-project: The patch generated 8 new + 
10 unchanged - 0 fixed = 18 total (was 10) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
18m 50s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  8m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
34s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m  
8s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}139m  8s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}267m 11s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.ozone.web.client.TestKeysRatis |
|   | hadoop.ozone.ozShell.TestOzoneShell |
|   | hadoop.ozone.failure.TestRatisFailure |
|   | hadoop.ozone.container.ozoneimpl.TestOzoneContainer |
|   | hadoop.hdfs.TestReadStripedFileWithMissingBlocks |
|   | hadoop.ozone.web.TestLocalOzoneVolumes |
|   | hadoop.ozone.scm.TestXceiverClientManager |
|   | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
|   | hadoop.ozone.client.rpc.TestOzoneRpcClient |
|   | hadoop.ozone.scm.TestSCMMXBean |
|   | hadoop.ozone.container.common.TestDatanodeStateMachine |
|   | hadoop.ozone.TestContainerOperations |
|   | 

[jira] [Commented] (HDFS-13133) Ozone: OzoneFileSystem: Calling delete with non-existing path shouldn't be logged on ERROR level

2018-02-11 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16360022#comment-16360022
 ] 

genericqa commented on HDFS-13133:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
33s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} HDFS-7240 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
22s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
18s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
11s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
20s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 19s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
26s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} HDFS-7240 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 10s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
36s{color} | {color:green} hadoop-ozone in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 45m 16s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d11161b |
| JIRA Issue | HDFS-13133 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12910120/HDFS-13133-HDFS-7240.001.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 84eeae27f171 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | HDFS-7240 / eb5e66a |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/23028/testReport/ |
| Max. process+thread count | 432 (vs. ulimit of 5500) |
| modules | C: hadoop-tools/hadoop-ozone U: hadoop-tools/hadoop-ozone |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/23028/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Ozone: OzoneFileSystem: 

[jira] [Commented] (HDFS-12636) Ozone: OzoneFileSystem: Implement seek functionality for rpc client

2018-02-11 Thread Mukul Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16360020#comment-16360020
 ] 

Mukul Kumar Singh commented on HDFS-12636:
--

Thanks for updating the patch [~ljain].
+1, v7 patch looks good to me.

> Ozone: OzoneFileSystem: Implement seek functionality for rpc client
> ---
>
> Key: HDFS-12636
> URL: https://issues.apache.org/jira/browse/HDFS-12636
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Mukul Kumar Singh
>Assignee: Lokesh Jain
>Priority: Major
> Fix For: HDFS-7240
>
> Attachments: HDFS-12636-HDFS-7240.001.patch, 
> HDFS-12636-HDFS-7240.002.patch, HDFS-12636-HDFS-7240.003.patch, 
> HDFS-12636-HDFS-7240.004.patch, HDFS-12636-HDFS-7240.005.patch, 
> HDFS-12636-HDFS-7240.006.patch, HDFS-12636-HDFS-7240.007.patch
>
>
> The OzoneClient library provides a method to invoke both RPC-based and 
> REST-based methods to Ozone. This api will help improve both the performance 
> and the interface management in OzoneFileSystem.
> This jira will be used to convert the REST-based calls to use this new 
> unified client.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12636) Ozone: OzoneFileSystem: Implement seek functionality for rpc client

2018-02-11 Thread Elek, Marton (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359993#comment-16359993
 ] 

Elek, Marton commented on HDFS-12636:
-

After this patch ObjectStore will use the rpc client, which starts additional 
NIO threads. Without a proper file system close there will be open threads left 
behind in the client application. Unfortunately ObjectStore is not closeable 
(but the proxy field of ObjectStore should be closed).

I propose to create a follow-up jira to make ObjectStore closeable and close it 
from OzoneFileSystem.

What do you think?
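
For illustration, a rough self-contained sketch of that follow-up (the class and field names are assumptions, not the actual Ozone classes):

{code:java}
import java.io.Closeable;
import java.io.IOException;

// Stand-in for the rpc client proxy that owns the extra NIO threads.
class RpcClientProxySketch implements Closeable {
  @Override
  public void close() throws IOException {
    // would shut down the underlying rpc/NIO resources here
  }
}

// Make the store closeable so its proxy can be released deterministically.
class ObjectStoreSketch implements Closeable {
  private final RpcClientProxySketch proxy = new RpcClientProxySketch();

  @Override
  public void close() throws IOException {
    proxy.close();
  }
}

// The file system closes the store as part of its own close().
class OzoneFileSystemSketch implements Closeable {
  private final ObjectStoreSketch objectStore = new ObjectStoreSketch();

  @Override
  public void close() throws IOException {
    objectStore.close();
  }
}
{code}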

> Ozone: OzoneFileSystem: Implement seek functionality for rpc client
> ---
>
> Key: HDFS-12636
> URL: https://issues.apache.org/jira/browse/HDFS-12636
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Mukul Kumar Singh
>Assignee: Lokesh Jain
>Priority: Major
> Fix For: HDFS-7240
>
> Attachments: HDFS-12636-HDFS-7240.001.patch, 
> HDFS-12636-HDFS-7240.002.patch, HDFS-12636-HDFS-7240.003.patch, 
> HDFS-12636-HDFS-7240.004.patch, HDFS-12636-HDFS-7240.005.patch, 
> HDFS-12636-HDFS-7240.006.patch, HDFS-12636-HDFS-7240.007.patch
>
>
> The OzoneClient library provides a method to invoke both RPC-based and 
> REST-based methods to Ozone. This api will help improve both the performance 
> and the interface management in OzoneFileSystem.
> This jira will be used to convert the REST-based calls to use this new 
> unified client.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13133) Ozone: OzoneFileSystem: Calling delete with non-existing path shouldn't be logged on ERROR level

2018-02-11 Thread Elek, Marton (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton updated HDFS-13133:

Status: Patch Available  (was: Open)

> Ozone: OzoneFileSystem: Calling delete with non-existing path shouldn't be 
> logged on ERROR level
> 
>
> Key: HDFS-13133
> URL: https://issues.apache.org/jira/browse/HDFS-13133
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Fix For: HDFS-7240
>
> Attachments: HDFS-13133-HDFS-7240.001.patch
>
>
> During the test of OzoneFileSystem with spark I noticed ERROR messages 
> multiple times:
> Something like this:
> {code}
> 2018-02-11 15:54:54 ERROR OzoneFileSystem:409 - Couldn't delete 
> o3://bucket1.test/user/hadoop/.sparkStaging/application_1518349702045_0008 - 
> does not exist
> {code}
> I checked the other implementations, and they use the DEBUG level. I think it's 
> expected that the path sometimes points to a non-existing dir/file.
> To be consistent with the other implementations I propose to lower the log 
> level to debug.
> Examples from other file systems:
> S3AFileSystem:
> {code}
> } catch (FileNotFoundException e) {
>   LOG.debug("Couldn't delete {} - does not exist", f);
>   instrumentation.errorIgnored();
>   return false;
> } catch (AmazonClientException e) {
>   throw translateException("delete", f, e);
> }
> {code}
> Aliyun:
> {code}
>try {
>   return innerDelete(getFileStatus(path), recursive);
> } catch (FileNotFoundException e) {
>   LOG.debug("Couldn't delete {} - does not exist", path);
>   return false;
> }
> {code}
> SFTP:
> {code}
>} catch (FileNotFoundException e) {
>   // file not found, no need to delete, return true
>   return false;
> }
> {code}
> SwiftNativeFileSystem:
> {code}
> try {
>   return store.delete(path, recursive);
> } catch (FileNotFoundException e) {
>   //base path was not found.
>   return false;
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13133) Ozone: OzoneFileSystem: Calling delete with non-existing path shouldn't be logged on ERROR level

2018-02-11 Thread Elek, Marton (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton updated HDFS-13133:

Summary: Ozone: OzoneFileSystem: Calling delete with non-existing path 
shouldn't be logged on ERROR level  (was: Ozone: OzoneFileSystem: Calling 
delete with non-exsistent path shouldn't cause ERROR log message)

> Ozone: OzoneFileSystem: Calling delete with non-existing path shouldn't be 
> logged on ERROR level
> 
>
> Key: HDFS-13133
> URL: https://issues.apache.org/jira/browse/HDFS-13133
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Fix For: HDFS-7240
>
> Attachments: HDFS-13133-HDFS-7240.001.patch
>
>
> During the test of OzoneFileSystem with spark I noticed ERROR messages 
> multiple times:
> Something like this:
> {code}
> 2018-02-11 15:54:54 ERROR OzoneFileSystem:409 - Couldn't delete 
> o3://bucket1.test/user/hadoop/.sparkStaging/application_1518349702045_0008 - 
> does not exist
> {code}
> I checked the other implementations, and they use the DEBUG level. I think it's 
> expected that the path sometimes points to a non-existing dir/file.
> To be consistent with the other implementations I propose to lower the log 
> level to debug.
> Examples from other file systems:
> S3AFileSystem:
> {code}
> } catch (FileNotFoundException e) {
>   LOG.debug("Couldn't delete {} - does not exist", f);
>   instrumentation.errorIgnored();
>   return false;
> } catch (AmazonClientException e) {
>   throw translateException("delete", f, e);
> }
> {code}
> Aliyun:
> {code}
>try {
>   return innerDelete(getFileStatus(path), recursive);
> } catch (FileNotFoundException e) {
>   LOG.debug("Couldn't delete {} - does not exist", path);
>   return false;
> }
> {code}
> SFTP:
> {code}
>} catch (FileNotFoundException e) {
>   // file not found, no need to delete, return true
>   return false;
> }
> {code}
> SwiftNativeFileSystem:
> {code}
> try {
>   return store.delete(path, recursive);
> } catch (FileNotFoundException e) {
>   //base path was not found.
>   return false;
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13133) Ozone: OzoneFileSystem: Calling delete with non-exsistent path shouldn't cause ERROR log message

2018-02-11 Thread Elek, Marton (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton updated HDFS-13133:

Attachment: HDFS-13133-HDFS-7240.001.patch

> Ozone: OzoneFileSystem: Calling delete with non-exsistent path shouldn't 
> cause ERROR log message
> 
>
> Key: HDFS-13133
> URL: https://issues.apache.org/jira/browse/HDFS-13133
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Fix For: HDFS-7240
>
> Attachments: HDFS-13133-HDFS-7240.001.patch
>
>
> During the test of OzoneFileSystem with spark I noticed ERROR messages 
> multiple times:
> Something like this:
> {code}
> 2018-02-11 15:54:54 ERROR OzoneFileSystem:409 - Couldn't delete 
> o3://bucket1.test/user/hadoop/.sparkStaging/application_1518349702045_0008 - 
> does not exist
> {code}
> I checked the other implementations, and they use the DEBUG level. I think it's 
> expected that the path sometimes points to a non-existing dir/file.
> To be consistent with the other implementations I propose to lower the log 
> level to debug.
> Examples from other file systems:
> S3AFileSystem:
> {code}
> } catch (FileNotFoundException e) {
>   LOG.debug("Couldn't delete {} - does not exist", f);
>   instrumentation.errorIgnored();
>   return false;
> } catch (AmazonClientException e) {
>   throw translateException("delete", f, e);
> }
> {code}
> Aliyun:
> {code}
>try {
>   return innerDelete(getFileStatus(path), recursive);
> } catch (FileNotFoundException e) {
>   LOG.debug("Couldn't delete {} - does not exist", path);
>   return false;
> }
> {code}
> SFTP:
> {code}
>} catch (FileNotFoundException e) {
>   // file not found, no need to delete, return true
>   return false;
> }
> {code}
> SwiftNativeFileSystem:
> {code}
> try {
>   return store.delete(path, recursive);
> } catch (FileNotFoundException e) {
>   //base path was not found.
>   return false;
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-13133) Ozone: OzoneFileSystem: Calling delete with non-exsistent path shouldn't cause ERROR log message

2018-02-11 Thread Elek, Marton (JIRA)
Elek, Marton created HDFS-13133:
---

 Summary: Ozone: OzoneFileSystem: Calling delete with non-exsistent 
path shouldn't cause ERROR log message
 Key: HDFS-13133
 URL: https://issues.apache.org/jira/browse/HDFS-13133
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Affects Versions: HDFS-7240
Reporter: Elek, Marton
Assignee: Elek, Marton
 Fix For: HDFS-7240


During the test of OzoneFileSystem with spark I noticed ERROR messages multiple 
times:

Something like this:

{code}
2018-02-11 15:54:54 ERROR OzoneFileSystem:409 - Couldn't delete 
o3://bucket1.test/user/hadoop/.sparkStaging/application_1518349702045_0008 - 
does not exist
{code}

I checked the other implementations, and they use the DEBUG level. I think it's 
expected that the path sometimes points to a non-existing dir/file.

To be consistent with the other implementations I propose to lower the log level 
to debug.


Examples from other file systems:

S3AFileSystem:

{code}
} catch (FileNotFoundException e) {
  LOG.debug("Couldn't delete {} - does not exist", f);
  instrumentation.errorIgnored();
  return false;
} catch (AmazonClientException e) {
  throw translateException("delete", f, e);
}
{code}


Aliyun:

{code}
   try {
  return innerDelete(getFileStatus(path), recursive);
} catch (FileNotFoundException e) {
  LOG.debug("Couldn't delete {} - does not exist", path);
  return false;
}
{code}


SFTP:

{code}
   } catch (FileNotFoundException e) {
  // file not found, no need to delete, return true
  return false;
}
{code}

SwiftNativeFileSystem:

{code}
try {
  return store.delete(path, recursive);
} catch (FileNotFoundException e) {
  //base path was not found.
  return false;
}
{code}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11699) Ozone:SCM: Add support for close containers in SCM

2018-02-11 Thread Elek, Marton (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359985#comment-16359985
 ] 

Elek, Marton commented on HDFS-11699:
-

Hi [~anu], thanks for creating this patch. At a high level I am happy with the 
approach, but I don't understand all the details of the implementation.

1. In ContainerMapping.java/processContainerReport: I don't understand the 
comments, but my impression is that the two statements should be swapped:

{code}
  containerStore.put(dbKey, newState.toByteArray());

  if (!closed(newState)) {
    LOG.info("Closing the Container: {}", newState.getContainerName());
  }

{code}

With the original order (containerStore.put is after the closed(newState)), the 
original state could overwrite the new changed state. Maybe I missed something, 
but I also tried to add 

{code}
Assert.assertEquals(LifeCycleState.CLOSING, 
mapping.getContainer(info.getContainerName()).getState());
{code}

to the end of TestContainerClose.testClose and it failed.

2. In ContainerMapping.closed:

{code}
  if (containerUsedPercentage >= containerCloseThreshold) {
+   // We will call closer till get to the closed state.
+   closer.close(newState);
+   if (shouldClose(newState)) {
+     // This event moves the Container from Open to Closing State.
+     OzoneProtos.LifeCycleState state = updateContainerState(
+         ContainerInfo.fromProtobuf(newState).getContainerName(),
+         OzoneProtos.LifeCycleEvent.FINALIZE);

{code}

According to my understanding this code will send a close command even if the 
container is in the CLOSED state. IMHO it should be sent only if the container 
is in the OPEN or CLOSING state (a small sketch of that guard follows after 
point 4).

3. It's not clear to me how the CLOSED state will be reached, but maybe that is 
a task for a different jira.

4. The javadoc of ContainerMapping.shouldClose is misleading: it returns false 
if the container is closed.
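
For point 2, a minimal self-contained sketch of the suggested guard (the enum and method names are illustrative, not the SCM code):

{code:java}
public class CloseGuardSketch {
  enum LifeCycleState { OPEN, CLOSING, CLOSED }

  /** Send a close command only for containers that are still OPEN or CLOSING. */
  static boolean shouldSendClose(LifeCycleState state, double usedPercentage,
      double closeThreshold) {
    if (usedPercentage < closeThreshold) {
      return false; // not full enough to close yet
    }
    return state == LifeCycleState.OPEN || state == LifeCycleState.CLOSING;
  }

  public static void main(String[] args) {
    System.out.println(shouldSendClose(LifeCycleState.OPEN, 0.95, 0.9));   // true
    System.out.println(shouldSendClose(LifeCycleState.CLOSED, 0.95, 0.9)); // false
  }
}
{code}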

> Ozone:SCM: Add support for close containers in SCM
> --
>
> Key: HDFS-11699
> URL: https://issues.apache.org/jira/browse/HDFS-11699
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Anu Engineer
>Assignee: Anu Engineer
>Priority: Major
> Attachments: HDFS-11699-HDFS-7240.001.patch
>
>
> Add support for closed containers in SCM. When a container is closed, SCM 
> needs to make a set of decisions like which pool and which machines are 
> expected to have this container. SCM also needs to issue a copyContainer 
> command to the target datanodes so that these nodes can replicate data from 
> the original locations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13022) Block Storage: Kubernetes dynamic persistent volume provisioner

2018-02-11 Thread Mukul Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh updated HDFS-13022:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: HDFS-7240
   Status: Resolved  (was: Patch Available)

Thanks for the contribution [~elek], I have committed this to the feature 
branch.

> Block Storage: Kubernetes dynamic persistent volume provisioner
> ---
>
> Key: HDFS-13022
> URL: https://issues.apache.org/jira/browse/HDFS-13022
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: HDFS-7240
>Affects Versions: HDFS-7240
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Fix For: HDFS-7240
>
> Attachments: HDFS-13022-HDFS-7240.001.patch, 
> HDFS-13022-HDFS-7240.002.patch, HDFS-13022-HDFS-7240.003.patch, 
> HDFS-13022-HDFS-7240.004.patch, HDFS-13022-HDFS-7240.005.patch, 
> HDFS-13022-HDFS-7240.006.patch, HDFS-13022-HDFS-7240.007.patch
>
>
> With HDFS-13017 and HDFS-13018 the cblock/jscsi server could be used in a 
> kubernetes cluster as the backend for iscsi persistent volumes.
> Unfortunately we need to create all the required cblocks manually with 'hdfs 
> cblock -c user volume...' for all the Persistent Volumes.
>  
> But it could be handled with a simple optional component. An additional 
> service could listen on the kubernetes event stream. In case of a new 
> PersistentVolumeClaim (where the storageClassName is cblock) the cblock 
> server could create the cblock in advance AND create the persistent volume.
>  
> The code is very simple, and this additional component could be optional in 
> the cblock server.
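
As a rough illustration of the provisioner idea quoted above, a minimal watcher 
sketch using the fabric8 kubernetes-client (an assumed client library; none of 
this is the committed code, and the cblock call is left as a placeholder):

{code}
import io.fabric8.kubernetes.api.model.PersistentVolumeClaim;
import io.fabric8.kubernetes.client.DefaultKubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientException;
import io.fabric8.kubernetes.client.Watcher;

public class CBlockPvcWatcherSketch {
  public static void main(String[] args) {
    // watch PersistentVolumeClaim events in all namespaces
    KubernetesClient client = new DefaultKubernetesClient();
    client.persistentVolumeClaims().inAnyNamespace()
        .watch(new Watcher<PersistentVolumeClaim>() {
          @Override
          public void eventReceived(Action action, PersistentVolumeClaim pvc) {
            if (action == Action.ADDED
                && "cblock".equals(pvc.getSpec().getStorageClassName())) {
              // here the provisioner would ask the cblock server to create the
              // backing volume and then create a matching PersistentVolume
            }
          }

          @Override
          public void onClose(KubernetesClientException cause) {
            // nothing to clean up in this sketch
          }
        });
  }
}
{code}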



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13132) Ozone: Handle datanode failures in Storage Container Manager

2018-02-11 Thread Mukul Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh updated HDFS-13132:
-
Attachment: HDFS-13132-HDFS-7240.001.patch

> Ozone: Handle datanode failures in Storage Container Manager
> 
>
> Key: HDFS-13132
> URL: https://issues.apache.org/jira/browse/HDFS-13132
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Fix For: HDFS-7240
>
> Attachments: HDFS-13132-HDFS-7240.001.patch
>
>
> Currently SCM receives heartbeats, carrying container reports, from the 
> datanodes in the cluster. Apart from this, the Ratis leader also receives 
> heartbeats from the nodes in a Raft ring. The Ratis heartbeats are at a 
> smaller interval (500 ms) whereas SCM heartbeats are at 30s; it is therefore 
> considered safe to assume that a datanode is really lost when SCM misses 
> heartbeats from such a node.
> Pipeline recovery will follow these steps:
> 1) As noted earlier, SCM will identify a dead DN via the heartbeats. The 
> current stale interval is 1.5m. Once a stale node has been identified, SCM 
> will find the list of containers for the pipelines the datanode was part of.
> 2) SCM sends a close container command to the datanodes. Note that at this 
> time the Ratis ring has 2 remaining nodes and consistency can still be 
> guaranteed by Ratis.
> 3) If another node dies before the close container command succeeds, then 
> Ratis cannot guarantee consistency of the data being written or of the close 
> container operation. The pipeline will then be marked as being in an 
> inconsistent state.
> 4) Closed containers will be replicated via the closed container replication 
> protocol.
> If the dead datanode comes back, SCM will, as part of the re-register 
> command, ask the datanode to format all of its open containers.
> 5) Return the healthy nodes to the free node pool for the next pipeline 
> allocation.
> 6) Read operations on closed containers will succeed; however, read 
> operations on an open container on a single-node cluster will be disallowed. 
> They will only be allowed under a special flag, aka the ReadInconsistentData 
> flag.
> This jira will introduce the mechanism to identify and handle datanode 
> failure.
> However, handling of a) 2 nodes failing simultaneously, b) returning the 
> nodes to a healthy state, c) allowing inconsistent data reads, and d) purging 
> of open containers on a zombie node will be done as part of separate bugs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13132) Ozone: Handle datanode failures in Storage Container Manager

2018-02-11 Thread Mukul Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh updated HDFS-13132:
-
Status: Patch Available  (was: Open)

> Ozone: Handle datanode failures in Storage Container Manager
> 
>
> Key: HDFS-13132
> URL: https://issues.apache.org/jira/browse/HDFS-13132
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Fix For: HDFS-7240
>
> Attachments: HDFS-13132-HDFS-7240.001.patch
>
>
> Currently SCM receives heartbeats, carrying container reports, from the 
> datanodes in the cluster. Apart from this, the Ratis leader also receives 
> heartbeats from the nodes in a Raft ring. The Ratis heartbeats are at a 
> smaller interval (500 ms) whereas SCM heartbeats are at 30s; it is therefore 
> considered safe to assume that a datanode is really lost when SCM misses 
> heartbeats from such a node.
> Pipeline recovery will follow these steps:
> 1) As noted earlier, SCM will identify a dead DN via the heartbeats. The 
> current stale interval is 1.5m. Once a stale node has been identified, SCM 
> will find the list of containers for the pipelines the datanode was part of.
> 2) SCM sends a close container command to the datanodes. Note that at this 
> time the Ratis ring has 2 remaining nodes and consistency can still be 
> guaranteed by Ratis.
> 3) If another node dies before the close container command succeeds, then 
> Ratis cannot guarantee consistency of the data being written or of the close 
> container operation. The pipeline will then be marked as being in an 
> inconsistent state.
> 4) Closed containers will be replicated via the closed container replication 
> protocol.
> If the dead datanode comes back, SCM will, as part of the re-register 
> command, ask the datanode to format all of its open containers.
> 5) Return the healthy nodes to the free node pool for the next pipeline 
> allocation.
> 6) Read operations on closed containers will succeed; however, read 
> operations on an open container on a single-node cluster will be disallowed. 
> They will only be allowed under a special flag, aka the ReadInconsistentData 
> flag.
> This jira will introduce the mechanism to identify and handle datanode 
> failure.
> However, handling of a) 2 nodes failing simultaneously, b) returning the 
> nodes to a healthy state, c) allowing inconsistent data reads, and d) purging 
> of open containers on a zombie node will be done as part of separate bugs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-13132) Ozone: Handle datanode failures in Storage Container Manager

2018-02-11 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDFS-13132:


 Summary: Ozone: Handle datanode failures in Storage Container 
Manager
 Key: HDFS-13132
 URL: https://issues.apache.org/jira/browse/HDFS-13132
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Affects Versions: HDFS-7240
Reporter: Mukul Kumar Singh
Assignee: Mukul Kumar Singh
 Fix For: HDFS-7240


Currently SCM receives heartbeats, carrying container reports, from the 
datanodes in the cluster. Apart from this, the Ratis leader also receives 
heartbeats from the nodes in a Raft ring. The Ratis heartbeats are at a smaller 
interval (500 ms) whereas SCM heartbeats are at 30s; it is therefore considered 
safe to assume that a datanode is really lost when SCM misses heartbeats from 
such a node.

Pipeline recovery will follow these steps:

1) As noted earlier, SCM will identify a dead DN via the heartbeats. The 
current stale interval is 1.5m. Once a stale node has been identified, SCM will 
find the list of containers for the pipelines the datanode was part of.

2) SCM sends a close container command to the datanodes. Note that at this 
time the Ratis ring has 2 remaining nodes and consistency can still be 
guaranteed by Ratis.

3) If another node dies before the close container command succeeds, then 
Ratis cannot guarantee consistency of the data being written or of the close 
container operation. The pipeline will then be marked as being in an 
inconsistent state.

4) Closed containers will be replicated via the closed container replication 
protocol.
If the dead datanode comes back, SCM will, as part of the re-register command, 
ask the datanode to format all of its open containers.

5) Return the healthy nodes to the free node pool for the next pipeline 
allocation.

6) Read operations on closed containers will succeed; however, read operations 
on an open container on a single-node cluster will be disallowed. They will 
only be allowed under a special flag, aka the ReadInconsistentData flag.


This jira will introduce the mechanism to identify and handle datanode failure.
However, handling of a) 2 nodes failing simultaneously, b) returning the nodes 
to a healthy state, c) allowing inconsistent data reads, and d) purging of open 
containers on a zombie node will be done as part of separate bugs.
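
The control flow described above could look roughly like the sketch below. It 
is only an illustration: plain collections stand in for the real SCM types 
(node manager, pipeline selector, command queue), and all names are made up 
for this example.

{code}
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; the real SCM types are replaced by collections.
public class DeadNodeHandlerSketch {
  // datanode id -> containers of the pipelines it participates in
  private final Map<String, List<String>> containersByNode;
  // container name -> surviving replica datanodes
  private final Map<String, Set<String>> survivorsByContainer;

  DeadNodeHandlerSketch(Map<String, List<String>> containersByNode,
      Map<String, Set<String>> survivorsByContainer) {
    this.containersByNode = containersByNode;
    this.survivorsByContainer = survivorsByContainer;
  }

  void onDatanodeDead(String deadNodeId) {
    // step 1: containers of every pipeline the dead node was part of
    for (String container
        : containersByNode.getOrDefault(deadNodeId, Collections.emptyList())) {
      // step 2: ask the surviving replicas to close the container
      for (String survivor
          : survivorsByContainer.getOrDefault(container, Collections.emptySet())) {
        sendCloseContainerCommand(survivor, container);
      }
    }
    // steps 4-6 (re-replication of closed containers, returning healthy nodes
    // to the free pool, inconsistent reads) are handled elsewhere, as noted in
    // the follow-up items above.
  }

  private void sendCloseContainerCommand(String datanodeId, String container) {
    // placeholder: in SCM this would queue a close container command for the
    // datanode's next heartbeat response
  }
}
{code}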



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13108) Ozone: OzoneFileSystem: Simplified url schema for Ozone File System

2018-02-11 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359914#comment-16359914
 ] 

genericqa commented on HDFS-13108:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  4s{color} 
| {color:red} HDFS-13108 does not apply to HDFS-7240. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-13108 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12910090/HDFS-13108-HDFS-7240.001.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/23026/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Ozone: OzoneFileSystem: Simplified url schema for Ozone File System
> ---
>
> Key: HDFS-13108
> URL: https://issues.apache.org/jira/browse/HDFS-13108
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Attachments: HDFS-13108-HDFS-7240.001.patch
>
>
> A. Current state
>  
> 1. The datanode host / bucket / volume should be defined in the defaultFS 
> (eg. o3://datanode:9864/test/bucket1)
> 2. The root file system points to the bucket (eg. 'dfs -ls /' lists all the 
> keys from bucket1)
> It works very well, but there are some limitations.
> B. Problem one
> The current code doesn't support fully qualified locations. For example 'dfs 
> -ls o3://datanode:9864/test/bucket1/dir1' is not working.
> C. Problem two
> I tried to fix the previous problem, but it's not trivial. The biggest 
> problem is that there is a Path.makeQualified call which could transform an 
> unqualified url into a qualified url. This is part of Path.java, so it's 
> common to all the Hadoop file systems.
> In the current implementation it qualifies a url by keeping the scheme (eg. 
> o3://) and authority (eg. datanode:9864) from the defaultFS and using the 
> relative path as the end of the qualified url. For example:
> makeQualified(defaultUri=o3://datanode:9864/test/bucket1, path=dir1/file) 
> will return o3://datanode:9864/dir1/file, which is obviously wrong (the 
> correct result would be o3://datanode:9864/TEST/BUCKET1/dir1/file). I tried a 
> workaround using a custom makeQualified in the Ozone code; it worked from the 
> command line but did not work with Spark, which uses the Hadoop API and the 
> original makeQualified path.
> D. Solution
> We should support makeQualified calls, so we can use any path in the 
> defaultFS.
>  
> I propose to use a simplified schema such as o3://bucket.volume/
> This is similar to the s3a format, where the pattern is s3a://bucket.region/
> We don't need to set the hostname of the datanode (or KSM in case of service 
> discovery), but it would be configurable with additional hadoop configuration 
> values such as fs.o3.bucket.bucketname.volumename.address=http://datanode:9864 
> (this is how s3a works today, as far as I know).
> We also need to define restrictions for the volume names (in our case they 
> should not include dots any more).
> PS: some spark output:
> 2018-02-03 18:43:04 WARN  Client:66 - Neither spark.yarn.jars nor 
> spark.yarn.archive is set, falling back to uploading libraries under 
> SPARK_HOME.
> 2018-02-03 18:43:05 INFO  Client:54 - Uploading resource 
> file:/tmp/spark-03119be0-9c3d-440c-8e9f-48c692412ab5/__spark_libs__244044896784490.zip
>  -> 
> o3://datanode:9864/user/hadoop/.sparkStaging/application_1517611085375_0001/__spark_libs__244044896784490.zip
> My default fs was o3://datanode:9864/test/bucket1, but Spark qualified the 
> name of the home directory.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13108) Ozone: OzoneFileSystem: Simplified url schema for Ozone File System

2018-02-11 Thread Elek, Marton (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359912#comment-16359912
 ] 

Elek, Marton commented on HDFS-13108:
-

PS: The patch could be applied on the top of HDFS-12735.

> Ozone: OzoneFileSystem: Simplified url schema for Ozone File System
> ---
>
> Key: HDFS-13108
> URL: https://issues.apache.org/jira/browse/HDFS-13108
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Attachments: HDFS-13108-HDFS-7240.001.patch
>
>
> A. Current state
>  
> 1. The datanode host / bucket / volume should be defined in the defaultFS 
> (eg. o3://datanode:9864/test/bucket1)
> 2. The root file system points to the bucket (eg. 'dfs -ls /' lists all the 
> keys from bucket1)
> It works very well, but there are some limitations.
> B. Problem one
> The current code doesn't support fully qualified locations. For example 'dfs 
> -ls o3://datanode:9864/test/bucket1/dir1' is not working.
> C. Problem two
> I tried to fix the previous problem, but it's not trivial. The biggest 
> problem is that there is a Path.makeQualified call which could transform an 
> unqualified url into a qualified url. This is part of Path.java, so it's 
> common to all the Hadoop file systems.
> In the current implementation it qualifies a url by keeping the scheme (eg. 
> o3://) and authority (eg. datanode:9864) from the defaultFS and using the 
> relative path as the end of the qualified url. For example:
> makeQualified(defaultUri=o3://datanode:9864/test/bucket1, path=dir1/file) 
> will return o3://datanode:9864/dir1/file, which is obviously wrong (the 
> correct result would be o3://datanode:9864/TEST/BUCKET1/dir1/file). I tried a 
> workaround using a custom makeQualified in the Ozone code; it worked from the 
> command line but did not work with Spark, which uses the Hadoop API and the 
> original makeQualified path.
> D. Solution
> We should support makeQualified calls, so we can use any path in the 
> defaultFS.
>  
> I propose to use a simplified schema such as o3://bucket.volume/
> This is similar to the s3a format, where the pattern is s3a://bucket.region/
> We don't need to set the hostname of the datanode (or KSM in case of service 
> discovery), but it would be configurable with additional hadoop configuration 
> values such as fs.o3.bucket.bucketname.volumename.address=http://datanode:9864 
> (this is how s3a works today, as far as I know).
> We also need to define restrictions for the volume names (in our case they 
> should not include dots any more).
> PS: some spark output:
> 2018-02-03 18:43:04 WARN  Client:66 - Neither spark.yarn.jars nor 
> spark.yarn.archive is set, falling back to uploading libraries under 
> SPARK_HOME.
> 2018-02-03 18:43:05 INFO  Client:54 - Uploading resource 
> file:/tmp/spark-03119be0-9c3d-440c-8e9f-48c692412ab5/__spark_libs__244044896784490.zip
>  -> 
> o3://datanode:9864/user/hadoop/.sparkStaging/application_1517611085375_0001/__spark_libs__244044896784490.zip
> My default fs was o3://datanode:9864/test/bucket1, but Spark qualified the 
> name of the home directory.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11600) Refactor TestDFSStripedOutputStreamWithFailure test classes

2018-02-11 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359913#comment-16359913
 ] 

genericqa commented on HDFS-11600:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
33s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 26 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 52s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
53s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 51s{color} 
| {color:red} hadoop-hdfs-project_hadoop-hdfs generated 3 new + 388 unchanged - 
3 fixed = 391 total (was 391) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 34s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 11 new + 0 unchanged - 0 fixed = 11 total (was 0) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 15s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}110m 46s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}159m 17s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
|   | hadoop.hdfs.web.TestWebHdfsTimeouts |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure |
|   | hadoop.hdfs.server.namenode.TestNameNodeMXBean |
|   | hadoop.hdfs.TestReadStripedFileWithMissingBlocks |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-11600 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12910082/HDFS-11600.003.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 049e0b378d9c 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / a08c048 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| javac | 

[jira] [Updated] (HDFS-13108) Ozone: OzoneFileSystem: Simplified url schema for Ozone File System

2018-02-11 Thread Elek, Marton (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton updated HDFS-13108:

Status: Patch Available  (was: Open)

I uploaded the patch. The patch itself is very small, but I also modified the 
unit test to run the same tests with and without the defaultFS settings.

Contract tests are also passing and I successfully submitted a simple Spark 
word count using the o3 file system.
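
For reference, a small usage sketch of the proposed addressing (the per-bucket 
address property name here is only an assumption based on the proposal quoted 
below, not necessarily the final key):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class O3SchemaExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // the simplified o3://bucket.volume/ form proposed in this issue
    conf.set("fs.defaultFS", "o3://bucket1.vol1/");
    // assumed per-bucket address property, mirroring the description below
    conf.set("fs.o3.bucket.bucket1.vol1.address", "http://datanode:9864");

    FileSystem fs = FileSystem.get(conf);
    // a relative path should now be qualified against the full bucket root
    Path qualified = fs.makeQualified(new Path("dir1/file"));
    System.out.println(qualified); // e.g. o3://bucket1.vol1/user/<user>/dir1/file
  }
}
{code}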

> Ozone: OzoneFileSystem: Simplified url schema for Ozone File System
> ---
>
> Key: HDFS-13108
> URL: https://issues.apache.org/jira/browse/HDFS-13108
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Attachments: HDFS-13108-HDFS-7240.001.patch
>
>
> A. Current state
>  
> 1. The datanode host / bucket / volume should be defined in the defaultFS 
> (eg. o3://datanode:9864/test/bucket1)
> 2. The root file system points to the bucket (eg. 'dfs -ls /' lists all the 
> keys from bucket1)
> It works very well, but there are some limitations.
> B. Problem one
> The current code doesn't support fully qualified locations. For example 'dfs 
> -ls o3://datanode:9864/test/bucket1/dir1' is not working.
> C. Problem two
> I tried to fix the previous problem, but it's not trivial. The biggest 
> problem is that there is a Path.makeQualified call which could transform an 
> unqualified url into a qualified url. This is part of Path.java, so it's 
> common to all the Hadoop file systems.
> In the current implementation it qualifies a url by keeping the scheme (eg. 
> o3://) and authority (eg. datanode:9864) from the defaultFS and using the 
> relative path as the end of the qualified url. For example:
> makeQualified(defaultUri=o3://datanode:9864/test/bucket1, path=dir1/file) 
> will return o3://datanode:9864/dir1/file, which is obviously wrong (the 
> correct result would be o3://datanode:9864/TEST/BUCKET1/dir1/file). I tried a 
> workaround using a custom makeQualified in the Ozone code; it worked from the 
> command line but did not work with Spark, which uses the Hadoop API and the 
> original makeQualified path.
> D. Solution
> We should support makeQualified calls, so we can use any path in the 
> defaultFS.
>  
> I propose to use a simplified schema such as o3://bucket.volume/
> This is similar to the s3a format, where the pattern is s3a://bucket.region/
> We don't need to set the hostname of the datanode (or KSM in case of service 
> discovery), but it would be configurable with additional hadoop configuration 
> values such as fs.o3.bucket.bucketname.volumename.address=http://datanode:9864 
> (this is how s3a works today, as far as I know).
> We also need to define restrictions for the volume names (in our case they 
> should not include dots any more).
> PS: some spark output:
> 2018-02-03 18:43:04 WARN  Client:66 - Neither spark.yarn.jars nor 
> spark.yarn.archive is set, falling back to uploading libraries under 
> SPARK_HOME.
> 2018-02-03 18:43:05 INFO  Client:54 - Uploading resource 
> file:/tmp/spark-03119be0-9c3d-440c-8e9f-48c692412ab5/__spark_libs__244044896784490.zip
>  -> 
> o3://datanode:9864/user/hadoop/.sparkStaging/application_1517611085375_0001/__spark_libs__244044896784490.zip
> My default fs was o3://datanode:9864/test/bucket1, but Spark qualified the 
> name of the home directory.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13108) Ozone: OzoneFileSystem: Simplified url schema for Ozone File System

2018-02-11 Thread Elek, Marton (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton updated HDFS-13108:

Attachment: HDFS-13108-HDFS-7240.001.patch

> Ozone: OzoneFileSystem: Simplified url schema for Ozone File System
> ---
>
> Key: HDFS-13108
> URL: https://issues.apache.org/jira/browse/HDFS-13108
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Attachments: HDFS-13108-HDFS-7240.001.patch
>
>
> A. Current state
>  
> 1. The datanode host / bucket / volume should be defined in the defaultFS 
> (eg. o3://datanode:9864/test/bucket1)
> 2. The root file system points to the bucket (eg. 'dfs -ls /' lists all the 
> keys from bucket1)
> It works very well, but there are some limitations.
> B. Problem one
> The current code doesn't support fully qualified locations. For example 'dfs 
> -ls o3://datanode:9864/test/bucket1/dir1' is not working.
> C. Problem two
> I tried to fix the previous problem, but it's not trivial. The biggest 
> problem is that there is a Path.makeQualified call which could transform an 
> unqualified url into a qualified url. This is part of Path.java, so it's 
> common to all the Hadoop file systems.
> In the current implementation it qualifies a url by keeping the scheme (eg. 
> o3://) and authority (eg. datanode:9864) from the defaultFS and using the 
> relative path as the end of the qualified url. For example:
> makeQualified(defaultUri=o3://datanode:9864/test/bucket1, path=dir1/file) 
> will return o3://datanode:9864/dir1/file, which is obviously wrong (the 
> correct result would be o3://datanode:9864/TEST/BUCKET1/dir1/file). I tried a 
> workaround using a custom makeQualified in the Ozone code; it worked from the 
> command line but did not work with Spark, which uses the Hadoop API and the 
> original makeQualified path.
> D. Solution
> We should support makeQualified calls, so we can use any path in the 
> defaultFS.
>  
> I propose to use a simplified schema such as o3://bucket.volume/
> This is similar to the s3a format, where the pattern is s3a://bucket.region/
> We don't need to set the hostname of the datanode (or KSM in case of service 
> discovery), but it would be configurable with additional hadoop configuration 
> values such as fs.o3.bucket.bucketname.volumename.address=http://datanode:9864 
> (this is how s3a works today, as far as I know).
> We also need to define restrictions for the volume names (in our case they 
> should not include dots any more).
> PS: some spark output:
> 2018-02-03 18:43:04 WARN  Client:66 - Neither spark.yarn.jars nor 
> spark.yarn.archive is set, falling back to uploading libraries under 
> SPARK_HOME.
> 2018-02-03 18:43:05 INFO  Client:54 - Uploading resource 
> file:/tmp/spark-03119be0-9c3d-440c-8e9f-48c692412ab5/__spark_libs__244044896784490.zip
>  -> 
> o3://datanode:9864/user/hadoop/.sparkStaging/application_1517611085375_0001/__spark_libs__244044896784490.zip
> My default fs was o3://datanode:9864/test/bucket1, but Spark qualified the 
> name of the home directory.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13131) Modifying testcase testEnableAndDisableErasureCodingPolicy

2018-02-11 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359884#comment-16359884
 ] 

genericqa commented on HDFS-13131:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 11s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
53s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 30s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}131m 12s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}186m 31s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
|   | hadoop.hdfs.server.datanode.TestNNHandlesCombinedBlockReport |
|   | hadoop.hdfs.server.datanode.TestDirectoryScanner |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-13131 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12910076/HDFS-13131.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 2e0f6e6b8956 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / a08c048 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/23024/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/23024/testReport/ |
| Max. process+thread count | 3028 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 

[jira] [Commented] (HDFS-13022) Block Storage: Kubernetes dynamic persistent volume provisioner

2018-02-11 Thread Mukul Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359883#comment-16359883
 ] 

Mukul Kumar Singh commented on HDFS-13022:
--

Thanks for updating the patch [~elek].

+1, v7 patch looks good to me.

> Block Storage: Kubernetes dynamic persistent volume provisioner
> ---
>
> Key: HDFS-13022
> URL: https://issues.apache.org/jira/browse/HDFS-13022
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: HDFS-7240
>Affects Versions: HDFS-7240
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Attachments: HDFS-13022-HDFS-7240.001.patch, 
> HDFS-13022-HDFS-7240.002.patch, HDFS-13022-HDFS-7240.003.patch, 
> HDFS-13022-HDFS-7240.004.patch, HDFS-13022-HDFS-7240.005.patch, 
> HDFS-13022-HDFS-7240.006.patch, HDFS-13022-HDFS-7240.007.patch
>
>
> With HDFS-13017 and HDFS-13018 the cblock/jscsi server could be used in a 
> kubernetes cluster as the backend for iscsi persistent volumes.
> Unfortunately we need to create all the required cblocks manually with 'hdfs 
> cblock -c user volume...' for all the Persistent Volumes.
>  
> But it could be handled with a simple optional component. An additional 
> service could listen on the kubernetes event stream. In case of a new 
> PersistentVolumeClaim (where the storageClassName is cblock) the cblock 
> server could create the cblock in advance AND create the persistent volume.
>  
> The code is very simple, and this additional component could be optional in 
> the cblock server.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11600) Refactor TestDFSStripedOutputStreamWithFailure test classes

2018-02-11 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359850#comment-16359850
 ] 

SammiChen commented on HDFS-11600:
--

I uploaded a new patch based on Andrew's 002 patch. The idea is to separate 
{{TestDFSStripedOutputStreamWithFailure}} into 
{{TestDFSStripedOutputStreamWithFailureBase}}, which carries the common routines 
and variable definitions, {{TestDFSStripedOutputStreamWithFailure}}, which 
carries the fixed-parameter test cases, and 
{{TestDFSStripedOutputStreamWithFailureP}}, which carries the parameterized test 
cases. In {{TestDFSStripedOutputStreamWithFailureP}} I refined the current test 
case: each time it will randomly choose 10 file lengths to run the test case 
with. Given that the largest built-in EC policy currently supported is 
RS-10-4-1MB, 10 rounds of the same test case with a random single data node 
failure seem enough. [~andrew.wang], would you take a look at the new patch at 
your convenience?
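
Roughly, the parameterized class looks like the sketch below (the base class 
and the runTestWithSingleFailure helper names here are only placeholders to 
show the idea, not the exact code in the patch):

{code}
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Random;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;

// Hypothetical sketch only: the base class and helper are assumed names.
@RunWith(Parameterized.class)
public class TestDFSStripedOutputStreamWithFailureP
    extends TestDFSStripedOutputStreamWithFailureBase {

  @Parameterized.Parameters(name = "fileLength={0}")
  public static Collection<Object[]> data() {
    // 10 random file lengths per run instead of one subclass per length
    List<Object[]> lengths = new ArrayList<>();
    Random rand = new Random();
    for (int i = 0; i < 10; i++) {
      lengths.add(new Object[] {rand.nextInt(10 * 1024 * 1024) + 1});
    }
    return lengths;
  }

  @Parameterized.Parameter
  public int fileLength;

  @Test
  public void testWithRandomDataNodeFailure() throws Exception {
    // kill one random datanode while writing a striped file of this length
    runTestWithSingleFailure(fileLength);
  }
}
{code}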

> Refactor TestDFSStripedOutputStreamWithFailure test classes
> ---
>
> Key: HDFS-11600
> URL: https://issues.apache.org/jira/browse/HDFS-11600
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 3.0.0-alpha2
>Reporter: Andrew Wang
>Priority: Minor
> Attachments: HDFS-11600-1.patch, HDFS-11600.002.patch, 
> HDFS-11600.003.patch
>
>
> TestDFSStripedOutputStreamWithFailure has a great number of subclasses. The 
> tests are parameterized based on the name of these subclasses.
> Seems like we could parameterize these tests with JUnit and then not need all 
> these separate test classes.
> Another note, the tests will randomly return instead of running the test. 
> Using {{Assume}} instead would make it more clear in the test output that 
> these tests were skipped.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11600) Refactor TestDFSStripedOutputStreamWithFailure test classes

2018-02-11 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HDFS-11600:
-
Attachment: HDFS-11600.003.patch

> Refactor TestDFSStripedOutputStreamWithFailure test classes
> ---
>
> Key: HDFS-11600
> URL: https://issues.apache.org/jira/browse/HDFS-11600
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 3.0.0-alpha2
>Reporter: Andrew Wang
>Priority: Minor
> Attachments: HDFS-11600-1.patch, HDFS-11600.002.patch, 
> HDFS-11600.003.patch
>
>
> TestDFSStripedOutputStreamWithFailure has a great number of subclasses. The 
> tests are parameterized based on the name of these subclasses.
> Seems like we could parameterize these tests with JUnit and then not need all 
> these separate test classes.
> Another note, the tests will randomly return instead of running the test. 
> Using {{Assume}} instead would make it more clear in the test output that 
> these tests were skipped.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11600) Refactor TestDFSStripedOutputStreamWithFailure test classes

2018-02-11 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HDFS-11600:
-
Status: Patch Available  (was: Open)

> Refactor TestDFSStripedOutputStreamWithFailure test classes
> ---
>
> Key: HDFS-11600
> URL: https://issues.apache.org/jira/browse/HDFS-11600
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 3.0.0-alpha2
>Reporter: Andrew Wang
>Priority: Minor
> Attachments: HDFS-11600-1.patch, HDFS-11600.002.patch, 
> HDFS-11600.003.patch
>
>
> TestDFSStripedOutputStreamWithFailure has a great number of subclasses. The 
> tests are parameterized based on the name of these subclasses.
> Seems like we could parameterize these tests with JUnit and then not need all 
> these separate test classes.
> Another note, the tests will randomly return instead of running the test. 
> Using {{Assume}} instead would make it more clear in the test output that 
> these tests were skipped.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org