[jira] [Updated] (HDFS-13008) Ozone: Add DN container open/close state to container report

2018-02-14 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-13008:
--
Attachment: HDFS-13008-HDFS-7240.003.patch

> Ozone: Add DN container open/close state to container report
> 
>
> Key: HDFS-13008
> URL: https://issues.apache.org/jira/browse/HDFS-13008
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-7240
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
> Attachments: HDFS-13008-HDFS-7240.001.patch, 
> HDFS-13008-HDFS-7240.002.patch, HDFS-13008-HDFS-7240.003.patch
>
>
> HDFS-12799 added support to allow SCM to send closeContainerCommand to DNs. 
> This ticket is opened to add the DN container close state to the container 
> report so that the SCM container state manager can update the state from 
> closing to closed when the DN-side container is fully closed.
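> A minimal sketch of the intended lifecycle as described above (enum and 
> method names are illustrative, not taken from the patch):
> {code:java}
> /** Illustrative container lifecycle: SCM marks CLOSED on the DN's report. */
> public enum ContainerLifeCycleState {
>   OPEN, CLOSING, CLOSED;
> 
>   /** CLOSING -> CLOSED once the DN reports the container fully closed. */
>   public ContainerLifeCycleState onDatanodeReportedClosed() {
>     return this == CLOSING ? CLOSED : this;
>   }
> }
> {code}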






[jira] [Commented] (HDFS-13008) Ozone: Add DN container open/close state to container report

2018-02-14 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365207#comment-16365207
 ] 

genericqa commented on HDFS-13008:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
33s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} HDFS-7240 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
17s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
31s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
54s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
44s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
58s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 36s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
53s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
7s{color} | {color:green} HDFS-7240 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
9s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
17s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
41s{color} | {color:red} hadoop-hdfs-project in the patch failed. {color} |
| {color:red}-1{color} | {color:red} cc {color} | {color:red}  0m 41s{color} | 
{color:red} hadoop-hdfs-project in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 41s{color} 
| {color:red} hadoop-hdfs-project in the patch failed. {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 24s{color} | {color:orange} hadoop-hdfs-project: The patch generated 2 new + 
0 unchanged - 1 fixed = 2 total (was 1) {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
19s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  3m 
12s{color} | {color:red} patch has errors when building and testing our client 
artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
17s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
16s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
41s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 18s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 53m 21s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d11161b |
| JIRA Issue | HDFS-13008 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12910685/HDFS-13008-HDFS-7240.002.patch
 |
| Optional Tests |  asflicense  compile  cc  mvnsite  javac  unit  javadoc  
mvninstall  shadedclient  findbugs  checkstyle  |
| uname | Linux c65b45fd6d40 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| 

[jira] [Updated] (HDFS-13130) Log object instance obtained incorrectly in SlowDiskTracker

2018-02-14 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-13130:
-
Target Version/s: 3.2.0  (was: 3.1.0)
   Fix Version/s: (was: 3.1.0)
  3.2.0

Reset the fix version. Trunk is currently being released as 3.2.0.

> Log object instance obtained incorrectly in SlowDiskTracker
> --
>
> Key: HDFS-13130
> URL: https://issues.apache.org/jira/browse/HDFS-13130
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jianfei Jiang
>Assignee: Jianfei Jiang
>Priority: Minor
> Fix For: 3.2.0
>
> Attachments: HDFS-13130.patch
>
>
> In class org.apache.hadoop.hdfs.server.blockmanagement.*SlowDiskTracker*, the 
> LOG incorrectly targets *SlowPeerTracker*.class.
> {code:java}
> public class SlowDiskTracker {
>   public static final Logger LOG =
>       LoggerFactory.getLogger(SlowPeerTracker.class);
> {code}
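> The likely fix is a one-liner, referencing the enclosing class instead (a 
> sketch of the expected change, not necessarily the attached patch):
> {code:java}
> public class SlowDiskTracker {
>   public static final Logger LOG =
>       LoggerFactory.getLogger(SlowDiskTracker.class);
> {code}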
>  
>  
>  






[jira] [Comment Edited] (HDFS-13142) Define and Implement a DiffList Interface to store and manage SnapshotDiffs

2018-02-14 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365195#comment-16365195
 ] 

Tsz Wo Nicholas Sze edited comment on HDFS-13142 at 2/15/18 7:27 AM:
-

There are still 2 checkstyle warnings and many unit test failures.  Please take 
a look, [~shashikant].  Thanks.


was (Author: szetszwo):
There are still 2 checkstyle warnings and many unit tests failure.  Please take 
a look, [~shashikant].  Thanks.

> Define and Implement a DiffList Interface to store and manage SnapshotDiffs
> ---
>
> Key: HDFS-13142
> URL: https://issues.apache.org/jira/browse/HDFS-13142
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-13142.001.patch, HDFS-13142.002.patch
>
>
> The InodeDiffList class contains a generic List to store snapshotDiffs. The 
> generic List interface is bulky; to store and manage snapshotDiffs, we need 
> only a few specific methods. 
> This Jira proposes to define a new interface, DiffList, which will be used to 
> store and manage snapshotDiffs.
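> A rough sketch of what such a narrowed interface could look like (the method 
> set here is illustrative; see the attached patches for the actual API):
> {code:java}
> /** Illustrative subset of List operations needed for snapshot diffs. */
> public interface DiffList<T> extends Iterable<T> {
>   T get(int index);
>   int size();
>   boolean addLast(T diff);   // diffs are appended in snapshot order
>   T removeFirst();           // oldest diff is dropped when a snapshot is deleted
> }
> {code}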






[jira] [Commented] (HDFS-13142) Define and Implement a DiffList Interface to store and manage SnapshotDiffs

2018-02-14 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365195#comment-16365195
 ] 

Tsz Wo Nicholas Sze commented on HDFS-13142:


There are still 2 checkstyle warnings and many unit test failures.  Please take 
a look, [~shashikant].  Thanks.

> Define and Implement a DiffList Interface to store and manage SnapshotDiffs
> ---
>
> Key: HDFS-13142
> URL: https://issues.apache.org/jira/browse/HDFS-13142
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-13142.001.patch, HDFS-13142.002.patch
>
>
> The InodeDiffList class contains a generic List to store snapshotDiffs. The 
> generic List interface is bulky; to store and manage snapshotDiffs, we need 
> only a few specific methods. 
> This Jira proposes to define a new interface, DiffList, which will be used to 
> store and manage snapshotDiffs.






[jira] [Commented] (HDFS-13136) Avoid taking FSN lock while doing group member lookup for FSD permission check

2018-02-14 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365193#comment-16365193
 ] 

Tsz Wo Nicholas Sze commented on HDFS-13136:


Thanks for the update!

+1 on the 002 patch.

> Avoid taking FSN lock while doing group member lookup for FSD permission check
> --
>
> Key: HDFS-13136
> URL: https://issues.apache.org/jira/browse/HDFS-13136
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
> Attachments: HDFS-13136.001.patch, HDFS-13136.002.patch
>
>
> The Namenode has an FSN lock and an FSD lock. Most namenode operations need to 
> take the FSN lock first and then the FSD lock. The permission check is done via 
> FSPermissionChecker at the FSD layer, assuming the FSN lock is taken. 
> The FSPermissionChecker constructor invokes callerUgi.getGroups(), which can 
> sometimes take seconds. There are external cache schemes such as SSSD and 
> internal cache schemes for group lookups. However, the delay can still occur 
> during a cache refresh, which causes severe FSN lock contention and an 
> unresponsive namenode.
> Checking the current code, we found that getBlockLocations(..) does this right, 
> but some methods such as getFileInfo(..) and getContentSummary(..) do it wrong. 
> This ticket is opened to ensure that the group lookup for the permission 
> checker happens outside the FSN lock.
>  
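> A self-contained schematic of the fix (class and method names here are 
> illustrative, not the actual patch):
> {code:java}
> import java.util.concurrent.locks.ReentrantReadWriteLock;
> 
> /** Sketch: do the slow group lookup before taking the namesystem lock. */
> public class LockOrderingSketch {
>   private final ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock();
> 
>   /** Stands in for callerUgi.getGroups(); may take seconds in production. */
>   String[] lookupGroups(String user) {
>     return new String[] { "users" };
>   }
> 
>   void getFileInfoLike(String user) {
>     String[] groups = lookupGroups(user);  // group lookup outside the lock
>     fsnLock.readLock().lock();
>     try {
>       // permission check against 'groups' and metadata read happen here
>     } finally {
>       fsnLock.readLock().unlock();
>     }
>   }
> }
> {code}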






[jira] [Commented] (HDFS-8693) refreshNamenodes does not support adding a new standby to a running DN

2018-02-14 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365188#comment-16365188
 ] 

Vinayakumar B commented on HDFS-8693:
-

bq. Unlikely to be an issue though unless it's constantly refreshed for new NNs.
Yes, you are right. This might not be an issue for the current use case of 
refreshNNs. The maximum number of offer threads alive (created via refreshNN()) 
is the same as the number of NNs the datanode can report to, which is limited.

> refreshNamenodes does not support adding a new standby to a running DN
> --
>
> Key: HDFS-8693
> URL: https://issues.apache.org/jira/browse/HDFS-8693
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, ha
>Affects Versions: 2.6.0
>Reporter: Jian Fang
>Assignee: Ajith S
>Priority: Critical
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4
>
> Attachments: HDFS-8693-03-Addendum-branch-2.patch, 
> HDFS-8693-03-addendum.patch, HDFS-8693.02.patch, HDFS-8693.03.patch, 
> HDFS-8693.1.patch
>
>
> I tried to run the following command on a Hadoop 2.6.0 cluster with HA 
> support 
> $ hdfs dfsadmin -refreshNamenodes datanode-host:port
> to refresh name nodes on data nodes after I replaced one name node with a new 
> one so that I don't need to restart the data nodes. However, I got the 
> following error:
> refreshNamenodes: HA does not currently support adding a new standby to a 
> running DN. Please do a rolling restart of DNs to reconfigure the list of NNs.
> I checked the 2.6.0 code and the error was thrown by the following code 
> snippet, which led me to this JIRA.
> {code:java}
> void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException {
>   Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>   for (BPServiceActor actor : bpServices) {
>     oldAddrs.add(actor.getNNSocketAddress());
>   }
>   Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>   if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>     // Keep things simple for now -- we can implement this at a later date.
>     throw new IOException("HA does not currently support adding a new standby "
>         + "to a running DN. Please do a rolling restart of DNs to reconfigure "
>         + "the list of NNs.");
>   }
> }
> {code}
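> For reference, a small self-contained demo of how the symmetric-difference 
> check above detects any change (addition or removal) in the NN set; it only 
> needs Guava:
> {code:java}
> import com.google.common.collect.Sets;
> import java.util.Set;
> 
> public class SymDiffDemo {
>   public static void main(String[] args) {
>     Set<String> oldAddrs = Sets.newHashSet("nn1:8020", "nn2:8020");
>     Set<String> newAddrs = Sets.newHashSet("nn1:8020", "nn3:8020");
>     // Non-empty result means an NN was added or removed, so refresh is rejected.
>     System.out.println(Sets.symmetricDifference(oldAddrs, newAddrs));
>   }
> }
> {code}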
> Looks like the refreshNamenodes command is an incomplete feature. 
> Unfortunately, picking up the new name node on a replacement instance is 
> critical for auto-provisioning a Hadoop cluster with HDFS HA support; without 
> it, the HA feature cannot really be used. I also observed that the new standby 
> name node on the replacement instance can get stuck in safe mode because no 
> data nodes check in with it. Even with a rolling restart, it may take quite 
> some time to restart all data nodes in a big cluster, for example one with 
> 4000 data nodes; moreover, restarting DNs is far too intrusive and not a 
> preferable operation in production. It also increases the chance of a double 
> failure, because the standby name node is not ready for a failover in the 
> case that the current active name node fails.






[jira] [Updated] (HDFS-13008) Ozone: Add DN container open/close state to container report

2018-02-14 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-13008:
--
Attachment: HDFS-13008-HDFS-7240.002.patch

> Ozone: Add DN container open/close state to container report
> 
>
> Key: HDFS-13008
> URL: https://issues.apache.org/jira/browse/HDFS-13008
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-7240
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
> Attachments: HDFS-13008-HDFS-7240.001.patch, 
> HDFS-13008-HDFS-7240.002.patch
>
>
> HDFS-12799 added support to allow SCM to send closeContainerCommand to DNs. 
> This ticket is opened to add the DN container close state to the container 
> report so that the SCM container state manager can update the state from 
> closing to closed when the DN-side container is fully closed.






[jira] [Commented] (HDFS-9608) Disk IO imbalance in HDFS with heterogeneous storages

2018-02-14 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365171#comment-16365171
 ] 

Brahma Reddy Battula commented on HDFS-9608:


Yes, it's nice to have.

> Disk IO imbalance in HDFS with heterogeneous storages
> -
>
> Key: HDFS-9608
> URL: https://issues.apache.org/jira/browse/HDFS-9608
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Wei Zhou
>Assignee: Wei Zhou
>Priority: Major
> Fix For: 2.9.0, 3.0.0-alpha1
>
> Attachments: HDFS-9608.01.patch, HDFS-9608.02.patch, 
> HDFS-9608.03.patch, HDFS-9608.04.patch, HDFS-9608.05.patch, 
> HDFS-9608.06.patch, HDFS-9608.07.patch
>
>
> Currently RoundRobinVolumeChoosingPolicy uses a shared index to choose volumes 
> in HDFS with heterogeneous storages; this leads to a non-round-robin choosing 
> mode for certain storage types.
> Besides, it uses a shared lock for synchronization, which limits the 
> concurrency of the volume-choosing process. Volume-choosing threads operating 
> on different storage types should be able to run concurrently.
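> A minimal sketch of the per-storage-type round-robin indexing the description 
> calls for (types simplified; this is not the actual patch):
> {code:java}
> import java.util.List;
> import java.util.Map;
> import java.util.concurrent.ConcurrentHashMap;
> import java.util.concurrent.atomic.AtomicInteger;
> 
> /** One independent round-robin cursor per storage type, no shared lock. */
> public class PerTypeRoundRobin<V> {
>   private final Map<String, AtomicInteger> cursors = new ConcurrentHashMap<>();
> 
>   /** Assumes 'volumes' is non-empty for the given storage type. */
>   public V choose(String storageType, List<V> volumes) {
>     AtomicInteger cursor =
>         cursors.computeIfAbsent(storageType, t -> new AtomicInteger());
>     // floorMod keeps the index valid even if the counter wraps around.
>     int i = Math.floorMod(cursor.getAndIncrement(), volumes.size());
>     return volumes.get(i);
>   }
> }
> {code}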






[jira] [Commented] (HDFS-8277) Safemode enter fails when Standby NameNode is down

2018-02-14 Thread Surendra Singh Lilhore (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365168#comment-16365168
 ] 

Surendra Singh Lilhore commented on HDFS-8277:
--

Hi,

[~arpitagarwal], [~vinayrpet] and [~brahmareddy]

We hit the same issue in a production environment. Can we finalize the solution 
for this Jira?

> Safemode enter fails when Standby NameNode is down
> --
>
> Key: HDFS-8277
> URL: https://issues.apache.org/jira/browse/HDFS-8277
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, namenode
>Affects Versions: 2.6.0
> Environment: HDP 2.2.0
>Reporter: Hari Sekhon
>Assignee: Surendra Singh Lilhore
>Priority: Major
> Attachments: HDFS-8277-safemode-edits.patch, HDFS-8277.patch, 
> HDFS-8277_1.patch, HDFS-8277_2.patch, HDFS-8277_3.patch, HDFS-8277_4.patch
>
>
> HDFS fails to enter safemode when the Standby NameNode is down (eg. due to 
> AMBARI-10536).
> {code}hdfs dfsadmin -safemode enter
> safemode: Call From nn2/x.x.x.x to nn1:8020 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused{code}
> This appears to be a bug in that it's not trying both NameNodes like the 
> standard hdfs client code does, and is instead stopping after getting a 
> connection refused from nn1 which is down. I verified normal hadoop fs writes 
> and reads via cli did work at this time, using nn2. I happened to run this 
> command as the hdfs user on nn2 which was the surviving Active NameNode.
> After I re-bootstrapped the Standby NN to fix it, the command worked as 
> expected again.
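> A schematic sketch of the expected behavior, i.e. falling through to the other 
> NN on a connect failure (the client interface here is illustrative, not the 
> real dfsadmin code):
> {code:java}
> import java.io.IOException;
> import java.net.ConnectException;
> import java.util.List;
> 
> public class SafemodeFailoverSketch {
>   interface NameNodeClient { void setSafeMode() throws IOException; }
> 
>   /** Try each configured NN in turn instead of failing on the first refusal. */
>   static void enterSafeMode(List<NameNodeClient> namenodes) throws IOException {
>     IOException last = null;
>     for (NameNodeClient nn : namenodes) {
>       try {
>         nn.setSafeMode();
>         return;                  // succeeded on a live NN
>       } catch (ConnectException e) {
>         last = e;                // this NN is down; fall through to the next
>       }
>     }
>     if (last != null) {
>       throw last;
>     }
>   }
> }
> {code}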






[jira] [Commented] (HDFS-13110) [SPS]: Reduce the number of APIs in NamenodeProtocol used by external satisfier

2018-02-14 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365166#comment-16365166
 ] 

Rakesh R commented on HDFS-13110:
-

Attached a new patch, in which I renamed the following classes, since the code 
now uses {{Type Parameters}} to represent the id and the path string:
- {{FileIdCollector.java => FileCollector.java}} and
- {{ExternalSPSFileIDCollector.java => ExternalSPSFilePathCollector.java}}

> [SPS]: Reduce the number of APIs in NamenodeProtocol used by external 
> satisfier
> ---
>
> Key: HDFS-13110
> URL: https://issues.apache.org/jira/browse/HDFS-13110
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Major
> Attachments: HDFS-13110-HDFS-10285-00.patch, 
> HDFS-13110-HDFS-10285-01.patch, HDFS-13110-HDFS-10285-02.patch, 
> HDFS-13110-HDFS-10285-03.patch, HDFS-13110-HDFS-10285-04.patch, 
> HDFS-13110-HDFS-10285-05.patch
>
>
> This task is to address the following comments from [~daryn]. Please refer to 
> HDFS-10285 for the more detailed discussion.
> *Comment-10)*
> {quote}
> NamenodeProtocolTranslatorPB
> Most of the api changes appear unnecessary.
> IntraSPSNameNodeContext#getFileInfo swallows all IOEs, based on assumption 
> that any and all IOEs means FNF which probably isn’t the intention during rpc 
> exceptions.
> {quote}
>  *Comment-13)*
> {quote}
> StoragePolicySatisfier
>  It appears to make back-to-back calls to hasLowRedundancyBlocks and 
> getFileInfo for every file. Haven’t fully groked the code, but if low 
> redundancy is not the common case, then it shouldn’t be called unless/until 
> needed. It looks like files that are under replicated are re-queued again?
> {quote}






[jira] [Updated] (HDFS-13110) [SPS]: Reduce the number of APIs in NamenodeProtocol used by external satisfier

2018-02-14 Thread Rakesh R (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-13110:

Attachment: HDFS-13110-HDFS-10285-05.patch

> [SPS]: Reduce the number of APIs in NamenodeProtocol used by external 
> satisfier
> ---
>
> Key: HDFS-13110
> URL: https://issues.apache.org/jira/browse/HDFS-13110
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Major
> Attachments: HDFS-13110-HDFS-10285-00.patch, 
> HDFS-13110-HDFS-10285-01.patch, HDFS-13110-HDFS-10285-02.patch, 
> HDFS-13110-HDFS-10285-03.patch, HDFS-13110-HDFS-10285-04.patch, 
> HDFS-13110-HDFS-10285-05.patch
>
>
> This task is to address the following comments from [~daryn]. Please refer to 
> HDFS-10285 for the more detailed discussion.
> *Comment-10)*
> {quote}
> NamenodeProtocolTranslatorPB
> Most of the api changes appear unnecessary.
> IntraSPSNameNodeContext#getFileInfo swallows all IOEs, based on assumption 
> that any and all IOEs means FNF which probably isn’t the intention during rpc 
> exceptions.
> {quote}
>  *Comment-13)*
> {quote}
> StoragePolicySatisfier
>  It appears to make back-to-back calls to hasLowRedundancyBlocks and 
> getFileInfo for every file. Haven’t fully groked the code, but if low 
> redundancy is not the common case, then it shouldn’t be called unless/until 
> needed. It looks like files that are under replicated are re-queued again?
> {quote}






[jira] [Commented] (HDFS-13008) Ozone: Add DN container open/close state to container report

2018-02-14 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365155#comment-16365155
 ] 

genericqa commented on HDFS-13008:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
14s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} HDFS-7240 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  2m 
37s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
11s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
51s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
47s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
55s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 38s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
54s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
10s{color} | {color:green} HDFS-7240 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
9s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  1m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
48s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 39s{color} | {color:orange} hadoop-hdfs-project: The patch generated 6 new + 
1 unchanged - 0 fixed = 7 total (was 1) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 37s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  2m 
15s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0 
unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
0s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
44s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 94m 14s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}166m 42s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-project/hadoop-hdfs |
|  |  
org.apache.hadoop.ozone.container.common.helpers.ContainerData.getContainerID() 
is unsynchronized, 
org.apache.hadoop.ozone.container.common.helpers.ContainerData.setContainerID(Long)
 is synchronized  At ContainerData.java:synchronized  At 
ContainerData.java:[line 247] |
| Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure190 |
|   | hadoop.hdfs.web.TestWebHdfsTimeouts |
|   | hadoop.ozone.web.client.TestKeysRatis |
|   | 

[jira] [Commented] (HDFS-12051) Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly those denoting file/directory names) to save memory

2018-02-14 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365131#comment-16365131
 ] 

genericqa commented on HDFS-12051:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 14s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 42s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 4 new + 1234 unchanged - 19 fixed = 1238 total (was 1253) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 24s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
52s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0 
unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}135m  9s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}184m  4s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-project/hadoop-hdfs |
|  |  Increment of volatile field 
org.apache.hadoop.hdfs.server.namenode.NameCache.size in 
org.apache.hadoop.hdfs.server.namenode.NameCache.put(byte[])  At 
NameCache.java:in org.apache.hadoop.hdfs.server.namenode.NameCache.put(byte[])  
At NameCache.java:[line 125] |
| Failed junit tests | hadoop.hdfs.web.TestWebHdfsTimeouts |
|   | hadoop.hdfs.server.namenode.TestReencryptionWithKMS |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-12051 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12910663/HDFS-12051.12.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  xml  |
| uname | Linux a62106be8c4b 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build 

[jira] [Commented] (HDFS-12749) DN may not send block report to NN after NN restart

2018-02-14 Thread He Xiaoqiao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365105#comment-16365105
 ] 

He Xiaoqiao commented on HDFS-12749:


[~kihwal], [~xkrogen], thanks for your comments.
{quote}the problem is that the NN correctly processed the registration, but the 
DN timed out before receiving the response. Since from NN point of view the 
registration was complete, it did not send another DNA_REGISTER command.{quote}
Thanks for the additional comments, [~xkrogen].
Since the NN considers the DN to have registered successfully, while the DN 
timed out before receiving the response, the DN will not get a DNA_REGISTER 
command from the NN at the next heartbeat interval. Ideally, the DN should 
catch the exception in {{BPServiceActor#register}} and retry until registration 
succeeds, as it currently does for {{SocketTimeoutException}} and 
{{EOFException}}.

One question is why the underlying RPC throws {{IOException}} rather than 
{{SocketTimeoutException}}. After tracing the invocation stack, I find that 
{{NetUtils#wrapException}}, invoked by {{Client#call}} at line 1474, may be the 
main reason.
{code:java}
if (call.error != null) {
  if (call.error instanceof RemoteException) {
    call.error.fillInStackTrace();
    throw call.error;
  } else { // local exception
    InetSocketAddress address = connection.getRemoteAddress();
    throw NetUtils.wrapException(address.getHostName(),
        address.getPort(),
        NetUtils.getHostname(),
        0,
        call.error);
  }
}
{code}
However, I am confused about why {{call.error}} is set to an IOException and 
thrown upward in a way that {{BPServiceActor#register}} does not catch. 
After reviewing HDFS-8995, I think it would not fix this issue.

Anyway, I agree to fix it by catching IOException in 
{{BPServiceActor#register}} and retrying until registration succeeds.
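A minimal self-contained sketch of that retry loop, broadened from 
{{SocketTimeoutException}}/{{EOFException}} to {{IOException}} (the types here 
are illustrative, not the actual {{BPServiceActor}} code):
{code:java}
import java.io.IOException;

/** Sketch only: retry registration on any IOException. */
public class RegisterRetrySketch {
  interface NamenodeRpc { void registerDatanode() throws IOException; }

  static void register(NamenodeRpc nn) throws InterruptedException {
    while (true) {
      try {
        nn.registerDatanode();
        return;                   // registered successfully
      } catch (IOException e) {   // was: only SocketTimeoutException/EOFException
        Thread.sleep(1000);       // back off, then retry
      }
    }
  }
}
{code}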
[~kihwal], [~xkrogen], do you mind having a look?

> DN may not send block report to NN after NN restart
> ---
>
> Key: HDFS-12749
> URL: https://issues.apache.org/jira/browse/HDFS-12749
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: TanYuxin
>Priority: Major
> Attachments: HDFS-12749.001.patch
>
>
> Our cluster now has thousands of DNs and millions of files and blocks. When 
> the NN restarts, its load is very high.
> After the NN restart, the DN calls the BPServiceActor#reRegister method to 
> register, but the register RPC gets an IOException since the NN is busy 
> dealing with block reports. The exception is caught at 
> BPServiceActor#processCommand.
> Below is the caught IOException:
> {code:java}
> WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Error processing 
> datanode Command
> java.io.IOException: Failed on local exception: java.io.IOException: 
> java.net.SocketTimeoutException: 6 millis timeout while waiting for 
> channel to be ready for read. ch : java.nio.channels.SocketChannel[connected 
> local=/DataNode_IP:Port remote=NameNode_Host/IP:Port]; Host Details : local 
> host is: "DataNode_Host/Datanode_IP"; destination host is: 
> "NameNode_Host":Port;
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773)
> at org.apache.hadoop.ipc.Client.call(Client.java:1474)
> at org.apache.hadoop.ipc.Client.call(Client.java:1407)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
> at com.sun.proxy.$Proxy13.registerDatanode(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:126)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:793)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.reRegister(BPServiceActor.java:926)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:604)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:898)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:711)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:864)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> The uncaught IOException breaks BPServiceActor#register, and the block 
> report cannot be sent immediately.
> {code}
>   /**
>* Register one bp with the corresponding NameNode
>* 
>* The bpDatanode needs to register with the namenode on startup in order
>* 1) to report which storage it is serving now and 
>* 2) to receive a registrationID
>*  
>* issued by the namenode to recognize registered datanodes.
>* 
>* @param nsInfo current NamespaceInfo
>* 

[jira] [Commented] (HDFS-12452) TestDataNodeVolumeFailureReporting fails in trunk Jenkins runs

2018-02-14 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365087#comment-16365087
 ] 

Wangda Tan commented on HDFS-12452:
---

[~xyao], if this patch can be done by this week, please commit to branch-3.1 as 
well. Thanks.

> TestDataNodeVolumeFailureReporting fails in trunk Jenkins runs
> --
>
> Key: HDFS-12452
> URL: https://issues.apache.org/jira/browse/HDFS-12452
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Arpit Agarwal
>Assignee: Xiaoyu Yao
>Priority: Critical
>  Labels: flaky-test
> Attachments: HDFS-12452.001.patch, HDFS-12452.002.patch
>
>
> TestDataNodeVolumeFailureReporting#testSuccessiveVolumeFailures fails 
> frequently in Jenkins runs but it passes locally on my dev machine.
> e.g. 
> https://builds.apache.org/job/PreCommit-HDFS-Build/21134/testReport/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailureReporting/testSuccessiveVolumeFailures/
> {code}
> Error Message
> test timed out after 12 milliseconds
> Stacktrace
> java.lang.Exception: test timed out after 12 milliseconds
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:761)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting.testSuccessiveVolumeFailures(TestDataNodeVolumeFailureReporting.java:189)
> {code}






[jira] [Updated] (HDFS-13008) Ozone: Add DN container open/close state to container report

2018-02-14 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-13008:
--
Attachment: HDFS-13008-HDFS-7240.001.patch

> Ozone: Add DN container open/close state to container report
> 
>
> Key: HDFS-13008
> URL: https://issues.apache.org/jira/browse/HDFS-13008
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-7240
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
> Attachments: HDFS-13008-HDFS-7240.001.patch
>
>
> HDFS-12799 added support to allow SCM to send closeContainerCommand to DNs. 
> This ticket is opened to add the DN container close state to the container 
> report so that the SCM container state manager can update the state from 
> closing to closed when the DN-side container is fully closed.






[jira] [Updated] (HDFS-13008) Ozone: Add DN container open/close state to container report

2018-02-14 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-13008:
--
Attachment: (was: HDFS-13008.001.patch)

> Ozone: Add DN container open/close state to container report
> 
>
> Key: HDFS-13008
> URL: https://issues.apache.org/jira/browse/HDFS-13008
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-7240
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
> Attachments: HDFS-13008-HDFS-7240.001.patch
>
>
> HDFS-12799 added support to allow SCM to send closeContainerCommand to DNs. 
> This ticket is opened to add the DN container close state to the container 
> report so that the SCM container state manager can update the state from 
> closing to closed when the DN-side container is fully closed.






[jira] [Commented] (HDFS-13008) Ozone: Add DN container open/close state to container report

2018-02-14 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365050#comment-16365050
 ] 

genericqa commented on HDFS-13008:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  5s{color} 
| {color:red} HDFS-13008 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-13008 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12910670/HDFS-13008.001.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/23075/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Ozone: Add DN container open/close state to container report
> 
>
> Key: HDFS-13008
> URL: https://issues.apache.org/jira/browse/HDFS-13008
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-7240
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
> Attachments: HDFS-13008.001.patch
>
>
> HDFS-12799 added support to allow SCM to send closeContainerCommand to DNs. 
> This ticket is opened to add the DN container close state to the container 
> report so that the SCM container state manager can update the state from 
> closing to closed when the DN-side container is fully closed.






[jira] [Updated] (HDFS-13008) Ozone: Add DN container open/close state to container report

2018-02-14 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-13008:
--
Status: Patch Available  (was: Open)

> Ozone: Add DN container open/close state to container report
> 
>
> Key: HDFS-13008
> URL: https://issues.apache.org/jira/browse/HDFS-13008
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-7240
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
> Attachments: HDFS-13008.001.patch
>
>
> HDFS-12799 added support to allow SCM to send closeContainerCommand to DNs. 
> This ticket is opened to add the DN container close state to the container 
> report so that the SCM container state manager can update the state from 
> closing to closed when the DN-side container is fully closed.






[jira] [Updated] (HDFS-13008) Ozone: Add DN container open/close state to container report

2018-02-14 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-13008:
--
Attachment: HDFS-13008.001.patch

> Ozone: Add DN container open/close state to container report
> 
>
> Key: HDFS-13008
> URL: https://issues.apache.org/jira/browse/HDFS-13008
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-7240
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
> Attachments: HDFS-13008.001.patch
>
>
> HDFS-12799 added support to allow SCM to send closeContainerCommand to DNs. 
> This ticket is opened to add the DN container close state to the container 
> report so that the SCM container state manager can update the state from 
> closing to closed when the DN-side container is fully closed.






[jira] [Commented] (HDFS-13081) Datanode#checkSecureConfig should check HTTPS and SASL encryption

2018-02-14 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365041#comment-16365041
 ] 

genericqa commented on HDFS-13081:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
22s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 33s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
11s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
19s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 12m  
1s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
2m 12s{color} | {color:orange} root: The patch generated 5 new + 163 unchanged 
- 2 fixed = 168 total (was 165) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
14s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 28s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
57s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  8m 
21s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 92m 16s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
44s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}196m 22s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.blockmanagement.TestBlockStatsMXBean |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-13081 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12910640/HDFS-13081.001.patch |
| Optional Tests |  asflicense  mvnsite  compile  javac  javadoc  mvninstall  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux e725a7925929 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 8f66aff |
| maven | version: Apache 

[jira] [Commented] (HDFS-12051) Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly those denoting file/directory names) to save memory

2018-02-14 Thread Misha Dmitriev (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365020#comment-16365020
 ] 

Misha Dmitriev commented on HDFS-12051:
---

[~atm] I've just submitted a patch where I've addressed your comments. I've 
added functionality to completely disable NameCache by specifying 
DFS_NAMENODE_NAME_CACHE_SIZE_RATIO_KEY = 0.0. I've added a test for this, plus 
a stress-test where the cache is exercised by multiple threads and the number 
of unique names exceeds the cache's capacity (this may happen in production). 
As we discussed, so far I cannot find a good way to pass a 
"non-singleton" NameCache instance around to all the code that needs it. On the 
other hand, as I explained, I don't see problems if this singleton is used by 
multiple NameNode instances running within the same JVM.
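
For readers following along, a minimal sketch of the interning idea itself (a 
cache keyed on array content; the real NameCache and its configuration key 
differ):
{code:java}
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;

/** Toy byte[] interner: returns one canonical array per distinct content. */
public class ByteArrayInterner {
  private final ConcurrentHashMap<Wrapper, byte[]> cache = new ConcurrentHashMap<>();

  public byte[] intern(byte[] name) {
    return cache.computeIfAbsent(new Wrapper(name), w -> name);
  }

  /** byte[] lacks content-based equals/hashCode, so wrap it for the map key. */
  private static final class Wrapper {
    final byte[] bytes;
    Wrapper(byte[] bytes) { this.bytes = bytes; }
    @Override public int hashCode() { return Arrays.hashCode(bytes); }
    @Override public boolean equals(Object o) {
      return o instanceof Wrapper && Arrays.equals(bytes, ((Wrapper) o).bytes);
    }
  }
}
{code}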

> Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly 
> those denoting file/directory names) to save memory
> -
>
> Key: HDFS-12051
> URL: https://issues.apache.org/jira/browse/HDFS-12051
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
>Priority: Major
> Attachments: HDFS-12051-NameCache-Rewrite.pdf, HDFS-12051.01.patch, 
> HDFS-12051.02.patch, HDFS-12051.03.patch, HDFS-12051.04.patch, 
> HDFS-12051.05.patch, HDFS-12051.06.patch, HDFS-12051.07.patch, 
> HDFS-12051.08.patch, HDFS-12051.09.patch, HDFS-12051.10.patch, 
> HDFS-12051.11.patch, HDFS-12051.12.patch
>
>
> When a snapshot diff operation is performed on a NameNode that manages several 
> million HDFS files/directories, the NN needs a lot of memory. Analyzing one 
> heap dump with jxray (www.jxray.com), we observed that duplicate byte[] arrays 
> result in 6.5% memory overhead, and most of these arrays are referenced by 
> {{org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy.name}}
>  and {{org.apache.hadoop.hdfs.server.namenode.INodeFile.name}}:
> {code:java}
> 19. DUPLICATE PRIMITIVE ARRAYS
> Types of duplicate objects:
>  Ovhd Num objs  Num unique objs   Class name
> 3,220,272K (6.5%)   104749528  25760871 byte[]
> 
>   1,841,485K (3.7%), 53194037 dup arrays (13158094 unique)
> 3510556 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 2228255 
> of byte[8](48, 48, 48, 48, 48, 48, 95, 48), 357439 of byte[17](112, 97, 114, 
> 116, 45, 109, 45, 48, 48, 48, ...), 237395 of byte[8](48, 48, 48, 48, 48, 49, 
> 95, 48), 227853 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 
> 179193 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 169487 
> of byte[8](48, 48, 48, 48, 48, 50, 95, 48), 145055 of byte[17](112, 97, 114, 
> 116, 45, 109, 45, 48, 48, 48, ...), 128134 of byte[8](48, 48, 48, 48, 48, 51, 
> 95, 48), 108265 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...)
> ... and 45902395 more arrays, of which 13158084 are unique
>  <-- 
> org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy.name 
> <-- org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiff.snapshotINode 
> <--  {j.u.ArrayList} <-- 
> org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiffList.diffs <-- 
> org.apache.hadoop.hdfs.server.namenode.snapshot.FileWithSnapshotFeature.diffs 
> <-- org.apache.hadoop.hdfs.server.namenode.INode$Feature[] <-- 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.features <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.bc <-- ... (1 
> elements) ... <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap$1.entries <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.blocks <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.blocksMap <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.this$0
>  <-- j.l.Thread[] <-- j.l.ThreadGroup.threads <-- j.l.Thread.group <-- Java 
> Static: org.apache.hadoop.fs.FileSystem$Statistics.STATS_DATA_CLEANER
>   409,830K (0.8%), 13482787 dup arrays (13260241 unique)
> 430 of byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 353 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 352 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 350 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 342 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 341 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 341 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 340 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 337 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 334 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...)
> ... and 

[jira] [Updated] (HDFS-12051) Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly those denoting file/directory names) to save memory

2018-02-14 Thread Misha Dmitriev (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Misha Dmitriev updated HDFS-12051:
--
Release Note: Addressed @atm's comments
  Status: Patch Available  (was: In Progress)

> Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly 
> those denoting file/directory names) to save memory
> -
>
> Key: HDFS-12051
> URL: https://issues.apache.org/jira/browse/HDFS-12051
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
>Priority: Major
> Attachments: HDFS-12051-NameCache-Rewrite.pdf, HDFS-12051.01.patch, 
> HDFS-12051.02.patch, HDFS-12051.03.patch, HDFS-12051.04.patch, 
> HDFS-12051.05.patch, HDFS-12051.06.patch, HDFS-12051.07.patch, 
> HDFS-12051.08.patch, HDFS-12051.09.patch, HDFS-12051.10.patch, 
> HDFS-12051.11.patch, HDFS-12051.12.patch
>
>
> When a snapshot diff operation is performed in a NameNode that manages several 
> million HDFS files/directories, the NN needs a lot of memory. Analyzing one heap 
> dump with jxray (www.jxray.com), we observed that duplicate byte[] arrays 
> result in 6.5% memory overhead, and most of these arrays are referenced by 
> {{org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy.name}}
>  and {{org.apache.hadoop.hdfs.server.namenode.INodeFile.name}}:
> {code:java}
> 19. DUPLICATE PRIMITIVE ARRAYS
> Types of duplicate objects:
>  Ovhd Num objs  Num unique objs   Class name
> 3,220,272K (6.5%)   104749528  25760871 byte[]
> 
>   1,841,485K (3.7%), 53194037 dup arrays (13158094 unique)
> 3510556 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 2228255 
> of byte[8](48, 48, 48, 48, 48, 48, 95, 48), 357439 of byte[17](112, 97, 114, 
> 116, 45, 109, 45, 48, 48, 48, ...), 237395 of byte[8](48, 48, 48, 48, 48, 49, 
> 95, 48), 227853 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 
> 179193 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 169487 
> of byte[8](48, 48, 48, 48, 48, 50, 95, 48), 145055 of byte[17](112, 97, 114, 
> 116, 45, 109, 45, 48, 48, 48, ...), 128134 of byte[8](48, 48, 48, 48, 48, 51, 
> 95, 48), 108265 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...)
> ... and 45902395 more arrays, of which 13158084 are unique
>  <-- 
> org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy.name 
> <-- org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiff.snapshotINode 
> <--  {j.u.ArrayList} <-- 
> org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiffList.diffs <-- 
> org.apache.hadoop.hdfs.server.namenode.snapshot.FileWithSnapshotFeature.diffs 
> <-- org.apache.hadoop.hdfs.server.namenode.INode$Feature[] <-- 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.features <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.bc <-- ... (1 
> elements) ... <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap$1.entries <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.blocks <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.blocksMap <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.this$0
>  <-- j.l.Thread[] <-- j.l.ThreadGroup.threads <-- j.l.Thread.group <-- Java 
> Static: org.apache.hadoop.fs.FileSystem$Statistics.STATS_DATA_CLEANER
>   409,830K (0.8%), 13482787 dup arrays (13260241 unique)
> 430 of byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 353 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 352 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 350 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 342 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 341 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 341 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 340 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 337 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 334 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...)
> ... and 13479257 more arrays, of which 13260231 are unique
>  <-- org.apache.hadoop.hdfs.server.namenode.INodeFile.name <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.bc <-- 
> org.apache.hadoop.util.LightWeightGSet$LinkedElement[] <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap$1.entries <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.blocks <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.blocksMap <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.this$0
>  <-- j.l.Thread[] <-- 
> 

[jira] [Updated] (HDFS-12051) Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly those denoting file/directory names) to save memory

2018-02-14 Thread Misha Dmitriev (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Misha Dmitriev updated HDFS-12051:
--
Attachment: HDFS-12051.12.patch

> Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly 
> those denoting file/directory names) to save memory
> -
>
> Key: HDFS-12051
> URL: https://issues.apache.org/jira/browse/HDFS-12051
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
>Priority: Major
> Attachments: HDFS-12051-NameCache-Rewrite.pdf, HDFS-12051.01.patch, 
> HDFS-12051.02.patch, HDFS-12051.03.patch, HDFS-12051.04.patch, 
> HDFS-12051.05.patch, HDFS-12051.06.patch, HDFS-12051.07.patch, 
> HDFS-12051.08.patch, HDFS-12051.09.patch, HDFS-12051.10.patch, 
> HDFS-12051.11.patch, HDFS-12051.12.patch
>
>
> When a snapshot diff operation is performed in a NameNode that manages several 
> million HDFS files/directories, the NN needs a lot of memory. Analyzing one heap 
> dump with jxray (www.jxray.com), we observed that duplicate byte[] arrays 
> result in 6.5% memory overhead, and most of these arrays are referenced by 
> {{org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy.name}}
>  and {{org.apache.hadoop.hdfs.server.namenode.INodeFile.name}}:
> {code:java}
> 19. DUPLICATE PRIMITIVE ARRAYS
> Types of duplicate objects:
>  Ovhd Num objs  Num unique objs   Class name
> 3,220,272K (6.5%)   104749528  25760871 byte[]
> 
>   1,841,485K (3.7%), 53194037 dup arrays (13158094 unique)
> 3510556 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 2228255 
> of byte[8](48, 48, 48, 48, 48, 48, 95, 48), 357439 of byte[17](112, 97, 114, 
> 116, 45, 109, 45, 48, 48, 48, ...), 237395 of byte[8](48, 48, 48, 48, 48, 49, 
> 95, 48), 227853 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 
> 179193 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 169487 
> of byte[8](48, 48, 48, 48, 48, 50, 95, 48), 145055 of byte[17](112, 97, 114, 
> 116, 45, 109, 45, 48, 48, 48, ...), 128134 of byte[8](48, 48, 48, 48, 48, 51, 
> 95, 48), 108265 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...)
> ... and 45902395 more arrays, of which 13158084 are unique
>  <-- 
> org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy.name 
> <-- org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiff.snapshotINode 
> <--  {j.u.ArrayList} <-- 
> org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiffList.diffs <-- 
> org.apache.hadoop.hdfs.server.namenode.snapshot.FileWithSnapshotFeature.diffs 
> <-- org.apache.hadoop.hdfs.server.namenode.INode$Feature[] <-- 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.features <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.bc <-- ... (1 
> elements) ... <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap$1.entries <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.blocks <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.blocksMap <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.this$0
>  <-- j.l.Thread[] <-- j.l.ThreadGroup.threads <-- j.l.Thread.group <-- Java 
> Static: org.apache.hadoop.fs.FileSystem$Statistics.STATS_DATA_CLEANER
>   409,830K (0.8%), 13482787 dup arrays (13260241 unique)
> 430 of byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 353 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 352 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 350 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 342 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 341 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 341 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 340 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 337 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 334 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...)
> ... and 13479257 more arrays, of which 13260231 are unique
>  <-- org.apache.hadoop.hdfs.server.namenode.INodeFile.name <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.bc <-- 
> org.apache.hadoop.util.LightWeightGSet$LinkedElement[] <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap$1.entries <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.blocks <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.blocksMap <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.this$0
>  <-- j.l.Thread[] <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap$1.entries <-- 
> 

[jira] [Updated] (HDFS-12051) Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly those denoting file/directory names) to save memory

2018-02-14 Thread Misha Dmitriev (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Misha Dmitriev updated HDFS-12051:
--
Status: In Progress  (was: Patch Available)

> Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly 
> those denoting file/directory names) to save memory
> -
>
> Key: HDFS-12051
> URL: https://issues.apache.org/jira/browse/HDFS-12051
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
>Priority: Major
> Attachments: HDFS-12051-NameCache-Rewrite.pdf, HDFS-12051.01.patch, 
> HDFS-12051.02.patch, HDFS-12051.03.patch, HDFS-12051.04.patch, 
> HDFS-12051.05.patch, HDFS-12051.06.patch, HDFS-12051.07.patch, 
> HDFS-12051.08.patch, HDFS-12051.09.patch, HDFS-12051.10.patch, 
> HDFS-12051.11.patch, HDFS-12051.12.patch
>
>
> When a snapshot diff operation is performed in a NameNode that manages several 
> million HDFS files/directories, the NN needs a lot of memory. Analyzing one heap 
> dump with jxray (www.jxray.com), we observed that duplicate byte[] arrays 
> result in 6.5% memory overhead, and most of these arrays are referenced by 
> {{org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy.name}}
>  and {{org.apache.hadoop.hdfs.server.namenode.INodeFile.name}}:
> {code:java}
> 19. DUPLICATE PRIMITIVE ARRAYS
> Types of duplicate objects:
>  Ovhd Num objs  Num unique objs   Class name
> 3,220,272K (6.5%)   104749528  25760871 byte[]
> 
>   1,841,485K (3.7%), 53194037 dup arrays (13158094 unique)
> 3510556 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 2228255 
> of byte[8](48, 48, 48, 48, 48, 48, 95, 48), 357439 of byte[17](112, 97, 114, 
> 116, 45, 109, 45, 48, 48, 48, ...), 237395 of byte[8](48, 48, 48, 48, 48, 49, 
> 95, 48), 227853 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 
> 179193 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 169487 
> of byte[8](48, 48, 48, 48, 48, 50, 95, 48), 145055 of byte[17](112, 97, 114, 
> 116, 45, 109, 45, 48, 48, 48, ...), 128134 of byte[8](48, 48, 48, 48, 48, 51, 
> 95, 48), 108265 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...)
> ... and 45902395 more arrays, of which 13158084 are unique
>  <-- 
> org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy.name 
> <-- org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiff.snapshotINode 
> <--  {j.u.ArrayList} <-- 
> org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiffList.diffs <-- 
> org.apache.hadoop.hdfs.server.namenode.snapshot.FileWithSnapshotFeature.diffs 
> <-- org.apache.hadoop.hdfs.server.namenode.INode$Feature[] <-- 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.features <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.bc <-- ... (1 
> elements) ... <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap$1.entries <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.blocks <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.blocksMap <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.this$0
>  <-- j.l.Thread[] <-- j.l.ThreadGroup.threads <-- j.l.Thread.group <-- Java 
> Static: org.apache.hadoop.fs.FileSystem$Statistics.STATS_DATA_CLEANER
>   409,830K (0.8%), 13482787 dup arrays (13260241 unique)
> 430 of byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 353 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 352 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 350 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 342 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 341 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 341 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 340 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 337 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 334 of 
> byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...)
> ... and 13479257 more arrays, of which 13260231 are unique
>  <-- org.apache.hadoop.hdfs.server.namenode.INodeFile.name <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.bc <-- 
> org.apache.hadoop.util.LightWeightGSet$LinkedElement[] <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap$1.entries <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.blocks <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.blocksMap <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.this$0
>  <-- j.l.Thread[] <-- 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap$1.entries <-- 
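A minimal sketch of the interning idea described above (a hypothetical helper
for illustration only, not the actual NameCache rewrite attached to this
issue):

{code:java}
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;

/** Minimal byte[] interner: keeps one canonical copy of each distinct name. */
final class NameInterner {
  private static final ConcurrentHashMap<Key, byte[]> CACHE = new ConcurrentHashMap<>();

  static byte[] intern(byte[] name) {
    Key k = new Key(name);
    byte[] prev = CACHE.putIfAbsent(k, name);
    return prev != null ? prev : name; // reuse the existing copy if present
  }

  /** Wrapper giving byte[] value-based equals/hashCode for use as a map key. */
  private static final class Key {
    private final byte[] bytes;
    Key(byte[] bytes) { this.bytes = bytes; }
    @Override public boolean equals(Object o) {
      return o instanceof Key && Arrays.equals(bytes, ((Key) o).bytes);
    }
    @Override public int hashCode() { return Arrays.hashCode(bytes); }
  }
}
{code}

A real cache would bound its size or age out entries; this only shows the
deduplication mechanics that eliminate the duplicate byte[] overhead above.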

[jira] [Commented] (HDFS-13134) Ozone: Format open containers on datanode restart

2018-02-14 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365007#comment-16365007
 ] 

Anu Engineer commented on HDFS-13134:
-

[~ljain] Meanwhile, you might want to look at the checkstyle, findbugs, and unit 
test failures. Thanks

> Ozone: Format open containers on datanode restart
> -
>
> Key: HDFS-13134
> URL: https://issues.apache.org/jira/browse/HDFS-13134
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: HDFS-13134-HDFS-7240.001.patch
>
>
> Once a datanode is restarted its open containers should be formatted. Only 
> the open containers whose pipeline has a replication factor of three will 
> need to be formatted. The format command is sent by SCM to the datanode after 
> the corresponding containers have been successfully replicated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13134) Ozone: Format open containers on datanode restart

2018-02-14 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364777#comment-16364777
 ] 

Anu Engineer edited comment on HDFS-13134 at 2/15/18 12:34 AM:
---

[~ljain] The change looks good. I am going to take a week to think through this 
and review with the rest of the team before I commit it. The reason is that this 
command provides the ability to destroy data (and rightfully so).
[~msingh], [~nandakumar131], [~elek] and [~xyao] Please take a look at the 
patch when you have a chance. I want to commit this only after I get 
perspectives from others too. Thanks for your time and consideration.


was (Author: anu):
[~ljain] The change looks good, I am going to take a week to think through this 
and review with the rest of the team before I commit this. The reason is that 
command provides the ability to destroy data ( and rightfully so)
[~msingh], [~nandakumar131],[~elek] and [~xyao] Please take a look at the patch 
when you have a chance. I want to commit this only I get perspectives from 
others too. Thanks for your time and consideration.

> Ozone: Format open containers on datanode restart
> -
>
> Key: HDFS-13134
> URL: https://issues.apache.org/jira/browse/HDFS-13134
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: HDFS-13134-HDFS-7240.001.patch
>
>
> Once a datanode is restarted its open containers should be formatted. Only 
> the open containers whose pipeline has a replication factor of three will 
> need to be formatted. The format command is sent by SCM to the datanode after 
> the corresponding containers have been successfully replicated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13134) Ozone: Format open containers on datanode restart

2018-02-14 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365005#comment-16365005
 ] 

genericqa commented on HDFS-13134:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
36s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} HDFS-7240 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
41s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
4s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
40s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
13s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 43s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
4s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
4s{color} | {color:green} HDFS-7240 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 34s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 10 new + 2 unchanged - 0 fixed = 12 total (was 2) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 49s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  2m 
28s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0 
unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}147m  8s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
27s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}203m 32s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-project/hadoop-hdfs |
|  |  Possible null pointer dereference in 
org.apache.hadoop.ozone.container.common.impl.ContainerManagerImpl.init(Configuration,
 List, DatanodeID) due to return value of called method  Dereferenced at 
ContainerManagerImpl.java:org.apache.hadoop.ozone.container.common.impl.ContainerManagerImpl.init(Configuration,
 List, DatanodeID) due to return value of called method  Dereferenced at 
ContainerManagerImpl.java:[line 183] |
| Failed junit tests | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
|   | hadoop.ozone.container.common.TestEndPoint |
|   | hadoop.hdfs.web.TestWebHdfsTimeouts |
|   | hadoop.ozone.web.client.TestKeys |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure |
|   | hadoop.cblock.TestBufferManager |
|   | hadoop.ozone.web.client.TestKeysRatis |
|   | hadoop.cblock.TestCBlockReadWrite |
|   | hadoop.ozone.container.common.TestDatanodeStateMachine |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce 

[jira] [Commented] (HDFS-11699) Ozone:SCM: Add support for close containers in SCM

2018-02-14 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364943#comment-16364943
 ] 

genericqa commented on HDFS-11699:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} HDFS-7240 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
38s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
22s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
44s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
24s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 46s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
47s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
19s{color} | {color:green} HDFS-7240 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 12s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
9s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 94m 15s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}160m 48s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestDirectoryScanner |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d11161b |
| JIRA Issue | HDFS-11699 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12910624/HDFS-11699-HDFS-7240.002.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux c3e0c85336a6 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | HDFS-7240 / f3d07ef |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/23071/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/23071/testReport/ |
| Max. process+thread count | 3982 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/23071/console |
| Powered by | Apache Yetus 

[jira] [Commented] (HDFS-12452) TestDataNodeVolumeFailureReporting fails in trunk Jenkins runs

2018-02-14 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364922#comment-16364922
 ] 

genericqa commented on HDFS-12452:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
39s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 10s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
53s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 35s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 1 new + 18 unchanged - 0 fixed = 19 total (was 18) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 41s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}142m 17s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}201m 36s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestReadStripedFileWithMissingBlocks |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-12452 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12910620/HDFS-12452.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux eeb750a6171a 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 1f20f43 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/23070/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/23070/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/23070/testReport/ |
| Max. 

[jira] [Commented] (HDFS-13149) Ozone: Rename Corona to Freon

2018-02-14 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364911#comment-16364911
 ] 

Anu Engineer commented on HDFS-13149:
-

[~msingh], [~elek] This patch may break internal stress and perf runs of Ozone, 
hence flagging it for your consideration. Please review when you get a chance. 
[~nandakumar131] Please take a look at this patch when you get a chance.

> Ozone: Rename Corona to Freon
> -
>
> Key: HDFS-13149
> URL: https://issues.apache.org/jira/browse/HDFS-13149
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Anu Engineer
>Assignee: Anu Engineer
>Priority: Trivial
> Attachments: HDFS-13149-HDFS-7240.001.patch
>
>
> While reviewing Ozone, [~jghoman], and in a comment on HDFS-12992, 
> [~chris.douglas] both pointed out that Corona is a name used by a YARN 
> project from Facebook.
> This Jira proposes to rename Corona (a chemical process that produces Ozone) 
> to Freon (CFCs), something that stresses Ozone. Thanks to [~arpitagarwal] for 
> coming up with both names.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13149) Ozone: Rename Corona to Freon

2018-02-14 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-13149:

Attachment: HDFS-13149-HDFS-7240.001.patch

> Ozone: Rename Corona to Freon
> -
>
> Key: HDFS-13149
> URL: https://issues.apache.org/jira/browse/HDFS-13149
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Anu Engineer
>Assignee: Anu Engineer
>Priority: Trivial
> Attachments: HDFS-13149-HDFS-7240.001.patch
>
>
> While reviewing Ozone, [~jghoman], and in a comment on HDFS-12992, 
> [~chris.douglas] both pointed out that Corona is a name used by a YARN 
> project from Facebook.
> This Jira proposes to rename Corona (a chemical process that produces Ozone) 
> to Freon (CFCs), something that stresses Ozone. Thanks to [~arpitagarwal] for 
> coming up with both names.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13040) Kerberized inotify client fails despite kinit properly

2018-02-14 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364893#comment-16364893
 ] 

Daryn Sharp commented on HDFS-13040:


So many points to cover.  You're on the right path.

*SPNEGO*

"JDK performed authentication on our behalf". The JDK will always transparently 
do spnego by internally handling the 401 and reissuing the request with a TGS.  
The client never sees a 401.  It may see 403 if authentication failed.  If a 
client does see 401, spnego wasn't possible because of no tgt, no tgs 
available, etc.  The KerberosAuthenticator is going to try spnego again, and 
virtually guaranteed to fail for the same reason.  Pretty much what you are 
seeing.

*Sample Code: TransactionReader*

I cringe when I see "secure" code like this because it makes people think 
security is too hard.  All of the UGI conf/relogin/doAs is unnecessary.  
Seriously, just rip it out.  Pretend you are writing what you think is 
insecure code.  Enable security in core-site, kinit before you run it...  
That's it.  Or, if you prefer, you can leave in just this one line:

UserGroupInformation.loginUserFromKeytab(princ, keytab);
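
For context, a minimal sketch of a reader written that way (the principal,
keytab path, and class name are placeholder assumptions; Kerberos settings are
assumed to come from core-site.xml on the classpath):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class KeytabReaderSketch {
  public static void main(String[] args) throws Exception {
    // Security configuration is picked up from core-site.xml.
    Configuration conf = new Configuration();
    UserGroupInformation.setConfiguration(conf);
    // The single explicit security call: log in once from a keytab.
    UserGroupInformation.loginUserFromKeytab(
        "superuser@EXAMPLE.COM", "/etc/security/keytabs/superuser.keytab");
    // From here on the code looks "insecure"; relogin is handled internally.
    FileSystem fs = FileSystem.get(conf);
    System.out.println(fs.getFileStatus(new Path("/")).getOwner());
  }
}
{code}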

*Issues*
{quote}At NameNode startup, the NameNode acquires a tgt, and it is saved in its 
ticket cache
{quote}
I suspected this might be the case.  Please don't start your NN from a ticket 
cache, let alone share that ticket cache with user tools.  Set the confs for 
keytab and principal.  Forget all the expired ticket cache stuff.  It's a 
distraction and self-inflicted pain.

Let's remove some confusion and describe your NN as running as 
"hdfs/nn1@REALM", with you using "superuser@REALM" to run inotify.  I know 
you are using hdfs, but it'll be clear soon.

As a client, you have credentials for "superuser@REALM".  After authentication, 
the NN's RPC server creates a "superuser" ugi, but it has no credentials, 
just an identity.  The inotify method calls an edit log method that makes an 
external REST call which needs credentials.  But "superuser" has no creds: 
boom, big ugly gssapi stack trace.

With the proposed patch, an explicit doAs as the login user flips you back to 
"hdfs/nn1@REALM", which does have credentials.  It's what you want, but not 
correctly implemented.

The root issue is that a client must have an immutable identity determined when 
created.  The URLLog ctor should save off the current user and use that for the 
doAs.  Since the login user creates the edit log, any use of it, 
regardless of whether it comes from RPC, will retain that identity.
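
A sketch of that "immutable identity" pattern (the class and method names here
are illustrative, not the actual URLLog code):

{code:java}
import java.net.URL;
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;

class EditLogFetcher {
  // Capture the identity once, when the fetcher is constructed.
  private final UserGroupInformation creator;

  EditLogFetcher() throws java.io.IOException {
    this.creator = UserGroupInformation.getCurrentUser();
  }

  byte[] fetch(URL url) throws Exception {
    // Every fetch runs as the creator, regardless of which RPC handler calls it.
    return creator.doAs((PrivilegedExceptionAction<byte[]>) () -> {
      // ... open url and read the edit segment with the creator's credentials ...
      return new byte[0]; // placeholder body
    });
  }
}
{code}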

 

 

 

> Kerberized inotify client fails despite kinit properly
> --
>
> Key: HDFS-13040
> URL: https://issues.apache.org/jira/browse/HDFS-13040
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
> Environment: Kerberized, HA cluster, iNotify client, CDH5.10.2
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Attachments: HDFS-13040.001.patch, HDFS-13040.02.patch, 
> HDFS-13040.03.patch, HDFS-13040.half.test.patch, 
> TestDFSInotifyEventInputStreamKerberized.java, TransactionReader.java
>
>
> This issue is similar to HDFS-10799.
> HDFS-10799 turned out to be a client-side issue where the client is responsible 
> for actively renewing its Kerberos ticket.
> However, we found that in a slightly different setup, even if the client has 
> valid Kerberos credentials, inotify still fails.
> Suppose client uses principal h...@example.com, 
>  namenode 1 uses server principal hdfs/nn1.example@example.com
>  namenode 2 uses server principal hdfs/nn2.example@example.com
> *After Namenodes starts for longer than kerberos ticket lifetime*, the client 
> fails with the following error:
> {noformat}
> 18/01/19 11:23:02 WARN security.UserGroupInformation: 
> PriviledgedActionException as:h...@gce.cloudera.com (auth:KERBEROS) 
> cause:org.apache.hadoop.ipc.RemoteException(java.io.IOException): We 
> encountered an error reading 
> https://nn2.example.com:8481/getJournal?jid=ns1=8662=-60%3A353531113%3A0%3Acluster3,
>  
> https://nn1.example.com:8481/getJournal?jid=ns1=8662=-60%3A353531113%3A0%3Acluster3.
>   During automatic edit log failover, we noticed that all of the remaining 
> edit log streams are shorter than the current one!  The best remaining edit 
> log ends at transaction 8683, but we thought we could read up to transaction 
> 8684.  If you continue, metadata will be lost forever!
> at 
> org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:213)
> at 
> org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.readOp(NameNodeRpcServer.java:1701)
> at 

[jira] [Updated] (HDFS-13149) Ozone: Rename Corona to Freon

2018-02-14 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-13149:

Description: 
While reviewing Ozone, [~jghoman], and in a comment on HDFS-12992, 
[~chris.douglas] both pointed out that Corona is a name used by a YARN project 
from Facebook.

This Jira proposes to rename Corona (a chemical process that produces Ozone) to 
Freon (CFCs), something that stresses Ozone. Thanks to [~arpitagarwal] for coming 
up with both names.

 

  was:
While reviewing Ozone  [~jghoman] and in the a comment  in HDFS-12992 
[~chris.douglas]

both pointed out the Corona is a name used by a YARN project from Facebook.

This Jira proposes to rename Corona(a chemical process that produces Ozone) to 
Freon (CFCs) something that stresses Ozone. Thank to [~arpitagarwal] for coming 
up with both names.

 


> Ozone: Rename Corona to Freon
> -
>
> Key: HDFS-13149
> URL: https://issues.apache.org/jira/browse/HDFS-13149
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Anu Engineer
>Assignee: Anu Engineer
>Priority: Trivial
>
> While reviewing Ozone, [~jghoman], and in a comment on HDFS-12992, 
> [~chris.douglas] both pointed out that Corona is a name used by a YARN 
> project from Facebook.
> This Jira proposes to rename Corona (a chemical process that produces Ozone) 
> to Freon (CFCs), something that stresses Ozone. Thanks to [~arpitagarwal] for 
> coming up with both names.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-13149) Ozone: Rename Corona to Freon

2018-02-14 Thread Anu Engineer (JIRA)
Anu Engineer created HDFS-13149:
---

 Summary: Ozone: Rename Corona to Freon
 Key: HDFS-13149
 URL: https://issues.apache.org/jira/browse/HDFS-13149
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Anu Engineer
Assignee: Anu Engineer


While reviewing Ozone, [~jghoman], and in a comment on HDFS-12992, 
[~chris.douglas] both pointed out that Corona is a name used by a YARN project 
from Facebook.

This Jira proposes to rename Corona (a chemical process that produces Ozone) to 
Freon (CFCs), something that stresses Ozone. Thanks to [~arpitagarwal] for coming 
up with both names.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13081) Datanode#checkSecureConfig should check HTTPS and SASL encryption

2018-02-14 Thread Ajay Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364862#comment-16364862
 ] 

Ajay Kumar commented on HDFS-13081:
---

[~daryn] Thanks for the valuable input. Updated the patch to allow the DN to 
start when SASL is enabled and the HTTP port is privileged.

cc: [~jnp],[~xyao]

> Datanode#checkSecureConfig should check HTTPS and SASL encryption
> -
>
> Key: HDFS-13081
> URL: https://issues.apache.org/jira/browse/HDFS-13081
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, security
>Affects Versions: 3.0.0
>Reporter: Xiaoyu Yao
>Assignee: Ajay Kumar
>Priority: Major
> Attachments: HDFS-13081.000.patch, HDFS-13081.001.patch
>
>
> Datanode#checkSecureConfig currently checks the following to determine if 
> the secure datanode is enabled: 
>  # The server has bound to privileged ports for RPC and HTTP via 
> SecureDataNodeStarter.
>  # The configuration enables SASL on DataTransferProtocol and HTTPS (no plain 
> HTTP) for the HTTP server. The SASL handshake guarantees authentication of 
> the RPC server before a client transmits a secret, such as a block access 
> token. Similarly, SSL guarantees authentication of the
>  HTTP server before a client transmits a secret, such as a delegation token.
> For the 2nd case, HTTPS_ONLY means all the traffic between the REST client and 
> server will be encrypted. However, checking only whether a SASL property 
> resolver is configured does not mean the server requires encrypted RPC. 
> This ticket is opened to further check and ensure the datanode SASL property 
> resolver has a QoP that includes auth-conf (PRIVACY). Note that the SASL QoP 
> (Quality of Protection) negotiation may drop the RPC protection level from 
> auth-conf (PRIVACY) to auth-int (integrity) or auth (authentication) only, 
> which should be fine by design.
>  
> cc: [~cnauroth] , [~daryn], [~jnpandey] for additional feedback.
>  
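
A sketch of the proposed extra check (dfs.data.transfer.protection is the
standard DataTransferProtocol SASL QoP setting; the helper class itself is
illustrative, not the actual patch):

{code:java}
import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;

final class SaslQopCheck {
  /** True only if the configured DataTransferProtocol QoP list includes privacy. */
  static boolean saslRequiresPrivacy(Configuration conf) {
    String qops = conf.getTrimmed("dfs.data.transfer.protection", "");
    return Arrays.stream(qops.split(","))
                 .map(String::trim)
                 .anyMatch("privacy"::equalsIgnoreCase);
  }
}
{code}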



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13081) Datanode#checkSecureConfig should check HTTPS and SASL encryption

2018-02-14 Thread Ajay Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajay Kumar updated HDFS-13081:
--
Attachment: HDFS-13081.001.patch

> Datanode#checkSecureConfig should check HTTPS and SASL encryption
> -
>
> Key: HDFS-13081
> URL: https://issues.apache.org/jira/browse/HDFS-13081
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, security
>Affects Versions: 3.0.0
>Reporter: Xiaoyu Yao
>Assignee: Ajay Kumar
>Priority: Major
> Attachments: HDFS-13081.000.patch, HDFS-13081.001.patch
>
>
> Datanode#checkSecureConfig currently checks the following to determine if 
> the secure datanode is enabled: 
>  # The server has bound to privileged ports for RPC and HTTP via 
> SecureDataNodeStarter.
>  # The configuration enables SASL on DataTransferProtocol and HTTPS (no plain 
> HTTP) for the HTTP server. The SASL handshake guarantees authentication of 
> the RPC server before a client transmits a secret, such as a block access 
> token. Similarly, SSL guarantees authentication of the
>  HTTP server before a client transmits a secret, such as a delegation token.
> For the 2nd case, HTTPS_ONLY means all the traffic between the REST client and 
> server will be encrypted. However, checking only whether a SASL property 
> resolver is configured does not mean the server requires encrypted RPC. 
> This ticket is opened to further check and ensure the datanode SASL property 
> resolver has a QoP that includes auth-conf (PRIVACY). Note that the SASL QoP 
> (Quality of Protection) negotiation may drop the RPC protection level from 
> auth-conf (PRIVACY) to auth-int (integrity) or auth (authentication) only, 
> which should be fine by design.
>  
> cc: [~cnauroth] , [~daryn], [~jnpandey] for additional feedback.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12452) TestDataNodeVolumeFailureReporting fails in trunk Jenkins runs

2018-02-14 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364804#comment-16364804
 ] 

genericqa commented on HDFS-12452:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 18s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
4s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 42s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 1 new + 18 unchanged - 0 fixed = 19 total (was 18) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 15s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
6s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}100m 54s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}161m 13s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks |
|   | hadoop.hdfs.qjournal.server.TestJournalNodeSync |
|   | hadoop.hdfs.TestReadStripedFileWithDecodingDeletedData |
|   | hadoop.hdfs.tools.TestDFSAdminWithHA |
|   | hadoop.hdfs.TestDFSUpgradeFromImage |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure020 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-12452 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12910616/HDFS-12452.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 7b5cf1dfe02b 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 1f20f43 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/23069/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit | 

[jira] [Commented] (HDFS-13134) Ozone: Format open containers on datanode restart

2018-02-14 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364777#comment-16364777
 ] 

Anu Engineer commented on HDFS-13134:
-

[~ljain] The change looks good. I am going to take a week to think through this 
and review with the rest of the team before I commit it. The reason is that this 
command provides the ability to destroy data (and rightfully so).
[~msingh], [~nandakumar131], [~elek] and [~xyao] Please take a look at the patch 
when you have a chance. I want to commit this only after I get perspectives from 
others too. Thanks for your time and consideration.

> Ozone: Format open containers on datanode restart
> -
>
> Key: HDFS-13134
> URL: https://issues.apache.org/jira/browse/HDFS-13134
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: HDFS-13134-HDFS-7240.001.patch
>
>
> Once a datanode is restarted its open containers should be formatted. Only 
> the open containers whose pipeline has a replication factor of three will 
> need to be formatted. The format command is sent by SCM to the datanode after 
> the corresponding containers have been successfully replicated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12983) Block Storage: provide docker-compose file for cblock clusters

2018-02-14 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364773#comment-16364773
 ] 

Anu Engineer commented on HDFS-12983:
-

[~elek] I am +1 on this patch. However, we have 50070 here; just wondering if 
the port numbers are correct, since the Hadoop 3.0 default ports might be 
different. Can you please take a quick look at the port numbers? If you are 
sure they are functional, please let me know and I will commit this patch.

> Block Storage: provide docker-compose file for cblock clusters
> --
>
> Key: HDFS-12983
> URL: https://issues.apache.org/jira/browse/HDFS-12983
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: ozone
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Attachments: HDFS-12983-HDFS-7240.001.patch, 
> HDFS-12983-HDFS-7240.002.patch, HDFS-12983-HDFS-7240.003.patch
>
>
> Since HDFS-12469 we have a docker-compose file at dev-support/compose/ozone 
> which makes it easy to start local ozone clusters with multiple datanodes.
> In this patch I propose a similar config file for the cblock/iscsi servers 
> (jscsi + cblock + scm + namenode + datanode) to make it easier to check the 
> latest state.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13040) Kerberized inotify client fails despite kinit properly

2018-02-14 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364764#comment-16364764
 ] 

Daryn Sharp commented on HDFS-13040:


Let me catch up.

> Kerberized inotify client fails despite kinit properly
> --
>
> Key: HDFS-13040
> URL: https://issues.apache.org/jira/browse/HDFS-13040
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
> Environment: Kerberized, HA cluster, iNotify client, CDH5.10.2
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Attachments: HDFS-13040.001.patch, HDFS-13040.02.patch, 
> HDFS-13040.03.patch, HDFS-13040.half.test.patch, 
> TestDFSInotifyEventInputStreamKerberized.java, TransactionReader.java
>
>
> This issue is similar to HDFS-10799.
> HDFS-10799 turned out to be a client side issue where client is responsible 
> for renewing kerberos ticket actively.
> However, we found that in a slightly different setup, even if the client has 
> valid Kerberos credentials, inotify still fails.
> Suppose client uses principal h...@example.com, 
>  namenode 1 uses server principal hdfs/nn1.example@example.com
>  namenode 2 uses server principal hdfs/nn2.example@example.com
> *After Namenodes starts for longer than kerberos ticket lifetime*, the client 
> fails with the following error:
> {noformat}
> 18/01/19 11:23:02 WARN security.UserGroupInformation: 
> PriviledgedActionException as:h...@gce.cloudera.com (auth:KERBEROS) 
> cause:org.apache.hadoop.ipc.RemoteException(java.io.IOException): We 
> encountered an error reading 
> https://nn2.example.com:8481/getJournal?jid=ns1=8662=-60%3A353531113%3A0%3Acluster3,
>  
> https://nn1.example.com:8481/getJournal?jid=ns1=8662=-60%3A353531113%3A0%3Acluster3.
>   During automatic edit log failover, we noticed that all of the remaining 
> edit log streams are shorter than the current one!  The best remaining edit 
> log ends at transaction 8683, but we thought we could read up to transaction 
> 8684.  If you continue, metadata will be lost forever!
> at 
> org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:213)
> at 
> org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.readOp(NameNodeRpcServer.java:1701)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getEditsFromTxid(NameNodeRpcServer.java:1763)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getEditsFromTxid(AuthorizationProviderProxyClientProtocol.java:1011)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getEditsFromTxid(ClientNamenodeProtocolServerSideTranslatorPB.java:1490)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2216)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2212)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2210)
> {noformat}
> Typically, if the NameNode has an expired Kerberos ticket, the error handling 
> for regular edit log tailing lets the NameNode relogin with its own 
> Kerberos principal. However, when inotify uses the same code path to retrieve 
> edits, the current user is the inotify client's principal, so unless the 
> client uses the same principal as the NameNode, the NameNode can't relogin on 
> behalf of the client.
> Therefore, a more appropriate approach is to use a proxy user so that the 
> NameNode can retrieve edits on behalf of the client (a sketch follows this 
> description).
> I will attach a patch to fix it. This patch has been verified to work on a 
> CDH5.10.2 cluster; however, it seems impossible to craft a unit test for this 
> fix because of the way Hadoop UGI handles Kerberos credentials (I can't have a 
> single process that logs in as two Kerberos principals simultaneously and have 
> them establish a connection).
> A possible workaround is for the inotify client to use the active NameNode's 
> server principal. However, that's not going to work when there's a namenode 
> failover, because then the client's principal will not be consistent with the 
> 
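
As a rough illustration of the proxy-user approach described above (the wiring 
and the readEditsFromJournals helper are hypothetical, not the actual patch), 
the NameNode-side handler could run the edit fetch as a proxy of its own login 
user, so ticket renewal stays under the NameNode's control:

{code:java}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;

// Serve getEditsFromTxid() for the inotify client by proxying the caller
// on top of the NN's own login credentials.
UserGroupInformation loginUser = UserGroupInformation.getLoginUser();
UserGroupInformation proxyUgi = UserGroupInformation.createProxyUser(
    UserGroupInformation.getCurrentUser().getShortUserName(), loginUser);
EventBatch batch = proxyUgi.doAs(
    (PrivilegedExceptionAction<EventBatch>) () -> {
      // Read the requested edits from the JournalNodes here, authenticated
      // with the NameNode's principal rather than the client's.
      return readEditsFromJournals(txid);  // hypothetical helper
    });
{code}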

[jira] [Commented] (HDFS-11699) Ozone:SCM: Add support for close containers in SCM

2018-02-14 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364733#comment-16364733
 ] 

Anu Engineer commented on HDFS-11699:
-

[~elek] Thanks for the code review comments. The patch v2 addresses all 
comments and fixes the checkStyle issues too.

Details below:
bq. In ContainerMapping.java/processContainerReport: I don't understand the 
comments, but my impression is that the two methods should be swapped:
You are absolutely right; thanks for catching that.

bq. According to my understanding this code will send a close command even if 
the container is in CLOSED state. IMHO it should be sent only if the container 
is in OPEN or CLOSING state.

Fixed.

bq.  It's not clear for me how the CLOSED state will be achieved, but maybe 
it's a task of a different jira.
Correct, the client will post that message. I think we have a JIRA in progress 
for that already.

bq. javadoc of ContainerMapping.shouldClose is misleading. It returns false if 
the container is closed
Fixed.
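
To make the agreed behavior concrete, here is the shape of the check (a sketch 
only; the state enum and accessors use illustrative names and may differ from 
the v2 patch):

{code:java}
// Per the review comment: issue a close command only for containers that
// are still OPEN or CLOSING; CLOSED containers are left alone.
private boolean shouldClose(ContainerInfo container) {
  LifeCycleState state = container.getState();  // illustrative enum name
  return state == LifeCycleState.OPEN || state == LifeCycleState.CLOSING;
}
{code}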

> Ozone:SCM: Add support for close containers in SCM
> --
>
> Key: HDFS-11699
> URL: https://issues.apache.org/jira/browse/HDFS-11699
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Anu Engineer
>Assignee: Anu Engineer
>Priority: Major
> Attachments: HDFS-11699-HDFS-7240.001.patch, 
> HDFS-11699-HDFS-7240.002.patch
>
>
> Add support for closed containers in SCM. When a container is closed, SCM 
> needs to make a set of decisions like which pool and which machines are 
> expected to have this container. SCM also needs to issue a copyContainer 
> command to the target datanodes so that these nodes can replicate data from 
> the original locations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11699) Ozone:SCM: Add support for close containers in SCM

2018-02-14 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-11699:

Attachment: HDFS-11699-HDFS-7240.002.patch

> Ozone:SCM: Add support for close containers in SCM
> --
>
> Key: HDFS-11699
> URL: https://issues.apache.org/jira/browse/HDFS-11699
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Anu Engineer
>Assignee: Anu Engineer
>Priority: Major
> Attachments: HDFS-11699-HDFS-7240.001.patch, 
> HDFS-11699-HDFS-7240.002.patch
>
>
> Add support for closed containers in SCM. When a container is closed, SCM 
> needs to make a set of decisions like which pool and which machines are 
> expected to have this container. SCM also needs to issue a copyContainer 
> command to the target datanodes so that these nodes can replicate data from 
> the original locations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12749) DN may not send block report to NN after NN restart

2018-02-14 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364731#comment-16364731
 ] 

Kihwal Lee commented on HDFS-12749:
---

I see. The old registration looks identical to the new one, so the NN still 
accepts it. About catching IOException: if the registration fails with a 
RemoteException that is not a RetriableException, the actor may need to stop 
instead of retrying (see the sketch below). Also, if we choose to blank 
something out to re-trigger registration, we should avoid hitting something 
like HDFS-8995.
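
A hedged sketch of that distinction (RemoteException#unwrapRemoteException and 
RetriableException are existing Hadoop IPC APIs; the surrounding control flow 
is illustrative, not a proposed patch):

{code:java}
try {
  bpos.reRegister();
} catch (RemoteException re) {
  IOException unwrapped = re.unwrapRemoteException(RetriableException.class);
  if (unwrapped instanceof RetriableException) {
    // NN is busy (e.g. still processing block reports); safe to retry later.
    LOG.warn("Re-registration rejected as retriable, will retry", re);
  } else {
    // Any other remote failure suggests the actor should stop rather than
    // retry forever against an NN that keeps refusing it.
    throw re;
  }
}
{code}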

> DN may not send block report to NN after NN restart
> ---
>
> Key: HDFS-12749
> URL: https://issues.apache.org/jira/browse/HDFS-12749
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: TanYuxin
>Priority: Major
> Attachments: HDFS-12749.001.patch
>
>
> Now our cluster has thousands of DNs and millions of files and blocks. When 
> the NN restarts, the NN's load is very high.
> After the NN restarts, the DN will call the BPServiceActor#reRegister method 
> to register. But the register RPC will get an IOException since the NN is 
> busy dealing with Block Reports. The exception is caught at 
> BPServiceActor#processCommand.
> Next is the caught IOException:
> {code:java}
> WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Error processing 
> datanode Command
> java.io.IOException: Failed on local exception: java.io.IOException: 
> java.net.SocketTimeoutException: 6 millis timeout while waiting for 
> channel to be ready for read. ch : java.nio.channels.SocketChannel[connected 
> local=/DataNode_IP:Port remote=NameNode_Host/IP:Port]; Host Details : local 
> host is: "DataNode_Host/Datanode_IP"; destination host is: 
> "NameNode_Host":Port;
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773)
> at org.apache.hadoop.ipc.Client.call(Client.java:1474)
> at org.apache.hadoop.ipc.Client.call(Client.java:1407)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
> at com.sun.proxy.$Proxy13.registerDatanode(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:126)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:793)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.reRegister(BPServiceActor.java:926)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:604)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:898)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:711)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:864)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> The uncaught IOException breaks BPServiceActor#register, and the Block 
> Report cannot be sent immediately. 
> {code}
>   /**
>* Register one bp with the corresponding NameNode
>* 
>* The bpDatanode needs to register with the namenode on startup in order
>* 1) to report which storage it is serving now and 
>* 2) to receive a registrationID
>*  
>* issued by the namenode to recognize registered datanodes.
>* 
>* @param nsInfo current NamespaceInfo
>* @see FSNamesystem#registerDatanode(DatanodeRegistration)
>* @throws IOException
>*/
>   void register(NamespaceInfo nsInfo) throws IOException {
> // The handshake() phase loaded the block pool storage
> // off disk - so update the bpRegistration object from that info
> DatanodeRegistration newBpRegistration = bpos.createRegistration();
> LOG.info(this + " beginning handshake with NN");
> while (shouldRun()) {
>   try {
> // Use returned registration from namenode with updated fields
> newBpRegistration = bpNamenode.registerDatanode(newBpRegistration);
> newBpRegistration.setNamespaceInfo(nsInfo);
> bpRegistration = newBpRegistration;
> break;
>   } catch(EOFException e) {  // namenode might have just restarted
> LOG.info("Problem connecting to server: " + nnAddr + " :"
> + e.getLocalizedMessage());
> sleepAndLogInterrupts(1000, "connecting to server");
>   } catch(SocketTimeoutException e) {  // namenode is busy
> LOG.info("Problem connecting to server: " + nnAddr);
> sleepAndLogInterrupts(1000, "connecting to server");
>   }
> }
> 
> LOG.info("Block pool " + this + " successfully registered with NN");
> bpos.registrationSucceeded(this, bpRegistration);
> // random short delay 

[jira] [Commented] (HDFS-13136) Avoid taking FSN lock while doing group member lookup for FSD permission check

2018-02-14 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364718#comment-16364718
 ] 

genericqa commented on HDFS-13136:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  1s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
56s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
39s{color} | {color:green} hadoop-hdfs-project/hadoop-hdfs: The patch generated 
0 new + 285 unchanged - 1 fixed = 285 total (was 286) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 56s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}124m 14s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}177m 16s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency |
|   | hadoop.hdfs.TestHDFSFileSystemContract |
|   | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-13136 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12910611/HDFS-13136.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 76750bd05515 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / f20dc0d |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/23068/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/23068/testReport/ |
| Max. process+thread 

[jira] [Commented] (HDFS-13114) CryptoAdmin#ReencryptZoneCommand should resolve Namespace info from path

2018-02-14 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364719#comment-16364719
 ] 

Hanisha Koneru commented on HDFS-13114:
---

Thanks for the review, [~jojochuang].

I have created HDFS-13148 to add unit tests for EZ with Federation. Once we 
have that setup, it would be easy to add a test for this change. I will get 
back to this once HDFS-13148 is resolved.

> CryptoAdmin#ReencryptZoneCommand should resolve Namespace info from path
> 
>
> Key: HDFS-13114
> URL: https://issues.apache.org/jira/browse/HDFS-13114
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, hdfs
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: HDFS-13114.001.patch
>
>
> The {{crypto -reencryptZone  -path }} command takes in a path 
> argument. But when creating the {{HdfsAdmin}} object, it uses the defaultFs 
> instead of resolving the filesystem from the path. This causes the following 
> exception if the authority component in the path does not match the authority 
> of the default Fs (a sketch of path-based resolution follows the examples 
> below).
> {code:java}
> $ hdfs crypto -reencryptZone -start -path hdfs://mycluster-node-1:8020/zone1
> IllegalArgumentException: Wrong FS: hdfs://mycluster-node-1:8020/zone1, 
> expected: hdfs://ns1{code}
> {code:java}
> $ hdfs crypto -reencryptZone -start -path hdfs://ns2/zone2
> IllegalArgumentException: Wrong FS: hdfs://ns2/zone2, expected: 
> hdfs://ns1{code}
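
For context, a minimal sketch of the path-based resolution implied by the 
description (illustrative only; the attached patch may wire this differently, 
and getConf() is assumed to come from the command's Configured base class):

{code:java}
// Build HdfsAdmin against the namespace named in the path argument, so
// hdfs://ns2/zone2 resolves to ns2 even when the defaultFs is ns1.
Path zone = new Path(pathArgument);             // e.g. "hdfs://ns2/zone2"
FileSystem fs = zone.getFileSystem(getConf());  // resolves FS from the URI
HdfsAdmin admin = new HdfsAdmin(fs.getUri(), getConf());
admin.reencryptEncryptionZone(zone, HdfsConstants.ReencryptAction.START);
{code}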



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-13148) Unit test for KMS and EZ with Federation

2018-02-14 Thread Hanisha Koneru (JIRA)
Hanisha Koneru created HDFS-13148:
-

 Summary: Unit test for KMS and EZ with Federation
 Key: HDFS-13148
 URL: https://issues.apache.org/jira/browse/HDFS-13148
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Hanisha Koneru
Assignee: Hanisha Koneru


It would be good to have some unit tests covering KMS and EZ on a federated 
cluster. We can start with basic EZ operations; for example, create EZs on two 
namespaces with different keys using one KMS (a rough outline follows).
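
A rough outline of that first test, assuming one KMS shared by both 
namespaces; the KMS address, key names, and paths below are all hypothetical:

{code:java}
Configuration conf = new Configuration();
// Point both namespaces at the same KMS (address is made up).
conf.set("hadoop.security.key.provider.path", "kms://http@localhost:9600/kms");

HdfsAdmin admin1 = new HdfsAdmin(URI.create("hdfs://ns1"), conf);
HdfsAdmin admin2 = new HdfsAdmin(URI.create("hdfs://ns2"), conf);

// One EZ per namespace, each with its own key created in the shared KMS.
admin1.createEncryptionZone(new Path("/zone1"), "key1");
admin2.createEncryptionZone(new Path("/zone2"), "key2");
{code}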



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12452) TestDataNodeVolumeFailureReporting fails in trunk Jenkins runs

2018-02-14 Thread Gabor Bota (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364595#comment-16364595
 ] 

Gabor Bota commented on HDFS-12452:
---

Thanks for the patch, Xiaoyu! While going through the patch I found that you 
have a typo in 
INTERNAL_DFS_BLOCK_SCANNER_SHUTDOWN_WAIT_INTERVAL="internal.dfs.block.scanner.shutdown.wait.intervaL".
I think the string should be internal.dfs.block.scanner.shutdown.wait.interval 
with a lowercase l, as in the sketch below.
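
i.e. something like the following, sketching just the constant:

{code:java}
// Lowercase trailing 'l' so the key matches what the test actually sets.
static final String INTERNAL_DFS_BLOCK_SCANNER_SHUTDOWN_WAIT_INTERVAL =
    "internal.dfs.block.scanner.shutdown.wait.interval";
{code}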

> TestDataNodeVolumeFailureReporting fails in trunk Jenkins runs
> --
>
> Key: HDFS-12452
> URL: https://issues.apache.org/jira/browse/HDFS-12452
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Arpit Agarwal
>Assignee: Xiaoyu Yao
>Priority: Critical
>  Labels: flaky-test
> Attachments: HDFS-12452.001.patch
>
>
> TestDataNodeVolumeFailureReporting#testSuccessiveVolumeFailures fails 
> frequently in Jenkins runs but it passes locally on my dev machine.
> e.g. 
> https://builds.apache.org/job/PreCommit-HDFS-Build/21134/testReport/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailureReporting/testSuccessiveVolumeFailures/
> {code}
> Error Message
> test timed out after 12 milliseconds
> Stacktrace
> java.lang.Exception: test timed out after 12 milliseconds
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:761)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting.testSuccessiveVolumeFailures(TestDataNodeVolumeFailureReporting.java:189)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12452) TestDataNodeVolumeFailureReporting fails in trunk Jenkins runs

2018-02-14 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364677#comment-16364677
 ] 

Xiaoyu Yao commented on HDFS-12452:
---

Good catch. Uploaded v2 patch that fixes the typo.

> TestDataNodeVolumeFailureReporting fails in trunk Jenkins runs
> --
>
> Key: HDFS-12452
> URL: https://issues.apache.org/jira/browse/HDFS-12452
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Arpit Agarwal
>Assignee: Xiaoyu Yao
>Priority: Critical
>  Labels: flaky-test
> Attachments: HDFS-12452.001.patch, HDFS-12452.002.patch
>
>
> TestDataNodeVolumeFailureReporting#testSuccessiveVolumeFailures fails 
> frequently in Jenkins runs but it passes locally on my dev machine.
> e.g. 
> https://builds.apache.org/job/PreCommit-HDFS-Build/21134/testReport/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailureReporting/testSuccessiveVolumeFailures/
> {code}
> Error Message
> test timed out after 12 milliseconds
> Stacktrace
> java.lang.Exception: test timed out after 12 milliseconds
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:761)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting.testSuccessiveVolumeFailures(TestDataNodeVolumeFailureReporting.java:189)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12452) TestDataNodeVolumeFailureReporting fails in trunk Jenkins runs

2018-02-14 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-12452:
--
Attachment: HDFS-12452.002.patch

> TestDataNodeVolumeFailureReporting fails in trunk Jenkins runs
> --
>
> Key: HDFS-12452
> URL: https://issues.apache.org/jira/browse/HDFS-12452
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Arpit Agarwal
>Assignee: Xiaoyu Yao
>Priority: Critical
>  Labels: flaky-test
> Attachments: HDFS-12452.001.patch, HDFS-12452.002.patch
>
>
> TestDataNodeVolumeFailureReporting#testSuccessiveVolumeFailures fails 
> frequently in Jenkins runs but it passes locally on my dev machine.
> e.g. 
> https://builds.apache.org/job/PreCommit-HDFS-Build/21134/testReport/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailureReporting/testSuccessiveVolumeFailures/
> {code}
> Error Message
> test timed out after 12 milliseconds
> Stacktrace
> java.lang.Exception: test timed out after 12 milliseconds
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:761)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting.testSuccessiveVolumeFailures(TestDataNodeVolumeFailureReporting.java:189)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12345) Scale testing HDFS NameNode with real metadata and workloads

2018-02-14 Thread Erik Krogen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364636#comment-16364636
 ] 

Erik Krogen commented on HDFS-12345:


Just want to ping watchers that the description has been updated now that 
Dynamometer is fully open source. We are interested in hearing feedback on 
the possibility of including Dynamometer in Hadoop itself, i.e. as part of 
its tools.

> Scale testing HDFS NameNode with real metadata and workloads
> 
>
> Key: HDFS-12345
> URL: https://issues.apache.org/jira/browse/HDFS-12345
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: namenode, test
>Reporter: Zhe Zhang
>Assignee: Erik Krogen
>Priority: Major
>
> Dynamometer has now been open sourced on our [GitHub 
> page|https://github.com/linkedin/dynamometer]. Read more at our [recent blog 
> post|https://engineering.linkedin.com/blog/2018/02/dynamometer--scale-testing-hdfs-on-minimal-hardware-with-maximum].
> To encourage getting the tool into the open for others to use as quickly as 
> possible, we went through our standard open sourcing process of releasing on 
> GitHub. However we are interested in the possibility of donating this to 
> Apache as part of Hadoop itself and would appreciate feedback on whether or 
> not this is something that would be supported by the community.
> Also of note, previous [discussions on the dev mail 
> lists|http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-dev/201707.mbox/%3c98fceffa-faff-4cf1-a14d-4faab6567...@gmail.com%3e]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12345) Scale testing HDFS NameNode with real metadata and workloads

2018-02-14 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HDFS-12345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364655#comment-16364655
 ] 

Íñigo Goiri commented on HDFS-12345:


I'm personally interested in pushing HDFS as a MapReduce job into Hadoop itself.
I've gone through that part of the code extensively; I'd like a couple of 
things tweaked here and there, but I like the setup.

We already have tools like GridMix, so I think it makes sense to add the 
workload replayer too.
I internally have a couple of MapReduce jobs that do something pretty similar, 
so I could also converge on this.

To summarize, my opinion is that we should work out how to split the work, but 
I think most of Dynamometer (if not all) should go into Hadoop.

> Scale testing HDFS NameNode with real metadata and workloads
> 
>
> Key: HDFS-12345
> URL: https://issues.apache.org/jira/browse/HDFS-12345
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: namenode, test
>Reporter: Zhe Zhang
>Assignee: Erik Krogen
>Priority: Major
>
> Dynamometer has now been open sourced on our [GitHub 
> page|https://github.com/linkedin/dynamometer]. Read more at our [recent blog 
> post|https://engineering.linkedin.com/blog/2018/02/dynamometer--scale-testing-hdfs-on-minimal-hardware-with-maximum].
> To encourage getting the tool into the open for others to use as quickly as 
> possible, we went through our standard open sourcing process of releasing on 
> GitHub. However we are interested in the possibility of donating this to 
> Apache as part of Hadoop itself and would appreciate feedback on whether or 
> not this is something that would be supported by the community.
> Also of note, previous [discussions on the dev mail 
> lists|http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-dev/201707.mbox/%3c98fceffa-faff-4cf1-a14d-4faab6567...@gmail.com%3e]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-12345) Scale testing HDFS NameNode with real metadata and workloads

2018-02-14 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HDFS-12345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364655#comment-16364655
 ] 

Íñigo Goiri edited comment on HDFS-12345 at 2/14/18 7:32 PM:
-

I'm personally interested in pushing HDFS as a YARN job into Hadoop itself.
I've gone through that part of the code extensively; I'd like a couple of 
things tweaked here and there, but I like the setup.

We already have tools like GridMix, so I think it makes sense to add the 
workload replayer too.
I internally have a couple of MapReduce jobs that do something pretty similar, 
so I could also converge on this.

To summarize, my opinion is that we should work out how to split the work, but 
I think most of Dynamometer (if not all) should go into Hadoop.


was (Author: elgoiri):
I'm personally interested in pushing HDFS as a MapReduce job into Hadoop itself.
I've gone through that part of the code extensively; I'd like a couple of 
things tweaked here and there, but I like the setup.

We already have tools like GridMix, so I think it makes sense to add the 
workload replayer too.
I internally have a couple of MapReduce jobs that do something pretty similar, 
so I could also converge on this.

To summarize, my opinion is that we should work out how to split the work, but 
I think most of Dynamometer (if not all) should go into Hadoop.

> Scale testing HDFS NameNode with real metadata and workloads
> 
>
> Key: HDFS-12345
> URL: https://issues.apache.org/jira/browse/HDFS-12345
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: namenode, test
>Reporter: Zhe Zhang
>Assignee: Erik Krogen
>Priority: Major
>
> Dynamometer has now been open sourced on our [GitHub 
> page|https://github.com/linkedin/dynamometer]. Read more at our [recent blog 
> post|https://engineering.linkedin.com/blog/2018/02/dynamometer--scale-testing-hdfs-on-minimal-hardware-with-maximum].
> To encourage getting the tool into the open for others to use as quickly as 
> possible, we went through our standard open sourcing process of releasing on 
> GitHub. However we are interested in the possibility of donating this to 
> Apache as part of Hadoop itself and would appreciate feedback on whether or 
> not this is something that would be supported by the community.
> Also of note, previous [discussions on the dev mail 
> lists|http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-dev/201707.mbox/%3c98fceffa-faff-4cf1-a14d-4faab6567...@gmail.com%3e]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12345) Scale testing HDFS NameNode with real metadata and workloads

2018-02-14 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364651#comment-16364651
 ] 

Anu Engineer commented on HDFS-12345:
-

[~xkrogen] I have looked at the source and read the excellent blog post that 
you wrote. Thank you. I am very impressed by this tool and I am +1 for making 
it part of Hadoop. The most important reason is that we will be able to make 
sure the tool stays current with the code base.

It might lead to adding some simple "sanity check" type unit tests though, 
so that we can detect if we have accidentally broken the tool.

> Scale testing HDFS NameNode with real metadata and workloads
> 
>
> Key: HDFS-12345
> URL: https://issues.apache.org/jira/browse/HDFS-12345
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: namenode, test
>Reporter: Zhe Zhang
>Assignee: Erik Krogen
>Priority: Major
>
> Dynamometer has now been open sourced on our [GitHub 
> page|https://github.com/linkedin/dynamometer]. Read more at our [recent blog 
> post|https://engineering.linkedin.com/blog/2018/02/dynamometer--scale-testing-hdfs-on-minimal-hardware-with-maximum].
> To encourage getting the tool into the open for others to use as quickly as 
> possible, we went through our standard open sourcing process of releasing on 
> GitHub. However we are interested in the possibility of donating this to 
> Apache as part of Hadoop itself and would appreciate feedback on whether or 
> not this is something that would be supported by the community.
> Also of note, previous [discussions on the dev mail 
> lists|http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-dev/201707.mbox/%3c98fceffa-faff-4cf1-a14d-4faab6567...@gmail.com%3e]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica

2018-02-14 Thread Gabor Bota (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Bota updated HDFS-11187:
--
Status: Open  (was: Patch Available)

> Optimize disk access for last partial chunk checksum of Finalized replica
> -
>
> Key: HDFS-11187
> URL: https://issues.apache.org/jira/browse/HDFS-11187
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Major
> Fix For: 3.1.0, 3.0.2
>
> Attachments: HDFS-11187-branch-2.001.patch, 
> HDFS-11187-branch-2.002.patch, HDFS-11187-branch-2.003.patch, 
> HDFS-11187-branch-2.004.patch, HDFS-11187.001.patch, HDFS-11187.002.patch, 
> HDFS-11187.003.patch, HDFS-11187.004.patch, HDFS-11187.005.patch
>
>
> The patch at HDFS-11160 ensures BlockSender reads the correct version of 
> metafile when there are concurrent writers.
> However, the implementation is not optimal, because it must always read the 
> last partial chunk checksum from disk while holding FsDatasetImpl lock for 
> every reader. It is possible to optimize this by keeping an up-to-date 
> version of last partial checksum in-memory and reduce disk access.
> I am separating the optimization into a new jira, because maintaining the 
> state of in-memory checksum requires a lot more work.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-8693) refreshNamenodes does not support adding a new standby to a running DN

2018-02-14 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364637#comment-16364637
 ] 

Daryn Sharp commented on HDFS-8693:
---

The caveat is that I believe this will cause a small memory leak of the RPC 
call's subject. When you invoke doAs, it essentially pushes a new context onto 
the access control stack. After this patch, the stack is likely: 
loginUserSubject, rpcCallSubject, anotherLoginUserSubject. If refreshing the 
NNs creates a new offer thread, it inherits the access control context, so the 
RPC subject will leak until the thread exits (see the sketch below).

Unlikely to be an issue though, unless it's constantly refreshed for new NNs.
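
A toy illustration of the inheritance being described (hypothetical code, not 
the DN's):

{code:java}
UserGroupInformation rpcUser = UserGroupInformation.createRemoteUser("client");
rpcUser.doAs((PrivilegedExceptionAction<Void>) () -> {
  // A thread created inside doAs() captures the current
  // AccessControlContext, which includes rpcUser's Subject...
  Thread offerThread = new Thread(() -> {
    // ...so the Subject stays strongly reachable until this thread exits,
    // even after the RPC call itself has finished.
  });
  offerThread.start();
  return null;
});
{code}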

> refreshNamenodes does not support adding a new standby to a running DN
> --
>
> Key: HDFS-8693
> URL: https://issues.apache.org/jira/browse/HDFS-8693
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, ha
>Affects Versions: 2.6.0
>Reporter: Jian Fang
>Assignee: Ajith S
>Priority: Critical
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4
>
> Attachments: HDFS-8693-03-Addendum-branch-2.patch, 
> HDFS-8693-03-addendum.patch, HDFS-8693.02.patch, HDFS-8693.03.patch, 
> HDFS-8693.1.patch
>
>
> I tried to run the following command on a Hadoop 2.6.0 cluster with HA 
> support 
> $ hdfs dfsadmin -refreshNamenodes datanode-host:port
> to refresh name nodes on data nodes after I replaced one name node with a new 
> one so that I don't need to restart the data nodes. However, I got the 
> following error:
> refreshNamenodes: HA does not currently support adding a new standby to a 
> running DN. Please do a rolling restart of DNs to reconfigure the list of NNs.
> I checked the 2.6.0 code and the error was thrown by the following code 
> snippet, which led me to this JIRA.
> {code:java}
> void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException {
>   Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>   for (BPServiceActor actor : bpServices) {
>     oldAddrs.add(actor.getNNSocketAddress());
>   }
>   Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>   if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>     // Keep things simple for now -- we can implement this at a later date.
>     throw new IOException(
>         "HA does not currently support adding a new standby to a running DN. "
>         + "Please do a rolling restart of DNs to reconfigure the list of NNs.");
>   }
> }
> {code}
> Looks like the refreshNamenodes command is an incomplete feature. 
> Unfortunately, picking up the new name node on a replacement instance is 
> critical for auto-provisioning a hadoop cluster with HDFS HA support. Without 
> this support, the HA feature cannot really be used. I also observed that the 
> new standby name node on the replacement instance could get stuck in safe 
> mode because no data nodes check in with it. Even with a rolling restart, it 
> may take quite some time to restart all data nodes if we have a big cluster, 
> for example, with 4000 data nodes, not to mention that restarting DNs is way 
> too intrusive and not a preferable operation in production. It also increases 
> the chance of a double failure, because the standby name node is not really 
> ready for a failover in case the current active name node fails. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12985) NameNode crashes during restart after an OpenForWrite file present in the Snapshot got deleted

2018-02-14 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364626#comment-16364626
 ] 

Arpit Agarwal commented on HDFS-12985:
--

Hi [~manojg], I was looking at the test case to understand the problem better. 
The test passes for me even without the fix in INodeFile (I ran the new test 
against git hash 2ee0d64aceed876f57f09eb9efe1872b6de98d2e).

Do you see the test fail without the fix?

> NameNode crashes during restart after an OpenForWrite file present in the 
> Snapshot got deleted
> --
>
> Key: HDFS-12985
> URL: https://issues.apache.org/jira/browse/HDFS-12985
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.8.0
>Reporter: Manoj Govindassamy
>Assignee: Manoj Govindassamy
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 3.0.1
>
> Attachments: HDFS-12985.01.patch
>
>
> NameNode crashes repeatedly with an NPE at startup when trying to find the 
> total number of under-construction blocks. This crash happens after an open 
> file, which was also part of a snapshot, gets deleted along with the snapshot 
> (a sketch follows the stack trace below).
> {noformat}
> Failed to start namenode.
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager.getNumUnderConstructionBlocks(LeaseManager.java:146)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getCompleteBlocksTotal(FSNamesystem.java:6537)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startCommonServices(FSNamesystem.java:1232)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.startCommonServices(NameNode.java:706)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:692)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:844)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:823)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1547)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1615)
> {noformat}
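
Purely for illustration, the shape of a guard that would avoid this NPE; every 
name below is hypothetical, and per the discussion the actual fix lives in 
INodeFile:

{code:java}
// Inside getNumUnderConstructionBlocks(): skip leases whose inode no longer
// resolves to an open file, e.g. one deleted together with its snapshot.
for (Long id : inodeIdsWithLease) {          // hypothetical collection
  INodeFile cons = resolveInodeAsFile(id);   // hypothetical helper
  if (cons == null || !cons.isUnderConstruction()) {
    continue;  // nothing to count for a file that is gone
  }
  numUCBlocks += cons.getBlocks().length;    // simplistic stand-in count
}
{code}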



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12452) TestDataNodeVolumeFailureReporting fails in trunk Jenkins runs

2018-02-14 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-12452:
--
Status: Patch Available  (was: Open)

> TestDataNodeVolumeFailureReporting fails in trunk Jenkins runs
> --
>
> Key: HDFS-12452
> URL: https://issues.apache.org/jira/browse/HDFS-12452
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Arpit Agarwal
>Assignee: Xiaoyu Yao
>Priority: Critical
>  Labels: flaky-test
> Attachments: HDFS-12452.001.patch
>
>
> TestDataNodeVolumeFailureReporting#testSuccessiveVolumeFailures fails 
> frequently in Jenkins runs but it passes locally on my dev machine.
> e.g. 
> https://builds.apache.org/job/PreCommit-HDFS-Build/21134/testReport/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailureReporting/testSuccessiveVolumeFailures/
> {code}
> Error Message
> test timed out after 12 milliseconds
> Stacktrace
> java.lang.Exception: test timed out after 12 milliseconds
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:761)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting.testSuccessiveVolumeFailures(TestDataNodeVolumeFailureReporting.java:189)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-12452) TestDataNodeVolumeFailureReporting fails in trunk Jenkins runs

2018-02-14 Thread Gabor Bota (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364595#comment-16364595
 ] 

Gabor Bota edited comment on HDFS-12452 at 2/14/18 6:48 PM:


Thanks for the patch, Xiaoyu! While going through the patch I found that you 
have a typo when defining INTERNAL_DFS_BLOCK_SCANNER_SHUTDOWN_WAIT_INTERVAL = 
"internal.dfs.block.scanner.shutdown.wait.intervaL".
I think the string should be internal.dfs.block.scanner.shutdown.wait.interval 
with a lowercase l.


was (Author: gabor.bota):
Thanks for the patch, Xiaoyu! While going through the patch I found that you 
have a typo in 
INTERNAL_DFS_BLOCK_SCANNER_SHUTDOWN_WAIT_INTERVAL="internal.dfs.block.scanner.shutdown.wait.intervaL".
I think the string should be internal.dfs.block.scanner.shutdown.wait.interval 
with a lowercase l.

> TestDataNodeVolumeFailureReporting fails in trunk Jenkins runs
> --
>
> Key: HDFS-12452
> URL: https://issues.apache.org/jira/browse/HDFS-12452
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Arpit Agarwal
>Assignee: Xiaoyu Yao
>Priority: Critical
>  Labels: flaky-test
> Attachments: HDFS-12452.001.patch
>
>
> TestDataNodeVolumeFailureReporting#testSuccessiveVolumeFailures fails 
> frequently in Jenkins runs but it passes locally on my dev machine.
> e.g. 
> https://builds.apache.org/job/PreCommit-HDFS-Build/21134/testReport/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailureReporting/testSuccessiveVolumeFailures/
> {code}
> Error Message
> test timed out after 12 milliseconds
> Stacktrace
> java.lang.Exception: test timed out after 12 milliseconds
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:761)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting.testSuccessiveVolumeFailures(TestDataNodeVolumeFailureReporting.java:189)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12452) TestDataNodeVolumeFailureReporting fails in trunk Jenkins runs

2018-02-14 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-12452:
--
Attachment: HDFS-12452.001.patch

> TestDataNodeVolumeFailureReporting fails in trunk Jenkins runs
> --
>
> Key: HDFS-12452
> URL: https://issues.apache.org/jira/browse/HDFS-12452
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Arpit Agarwal
>Assignee: Xiaoyu Yao
>Priority: Critical
>  Labels: flaky-test
> Attachments: HDFS-12452.001.patch
>
>
> TestDataNodeVolumeFailureReporting#testSuccessiveVolumeFailures fails 
> frequently in Jenkins runs but it passes locally on my dev machine.
> e.g. 
> https://builds.apache.org/job/PreCommit-HDFS-Build/21134/testReport/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailureReporting/testSuccessiveVolumeFailures/
> {code}
> Error Message
> test timed out after 12 milliseconds
> Stacktrace
> java.lang.Exception: test timed out after 12 milliseconds
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:761)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting.testSuccessiveVolumeFailures(TestDataNodeVolumeFailureReporting.java:189)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13147) Support -c argument for DFS command head and tail

2018-02-14 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364530#comment-16364530
 ] 

genericqa commented on HDFS-13147:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
31s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
19s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 15s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
42s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
15s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 12m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
48s{color} | {color:green} root: The patch generated 0 new + 175 unchanged - 2 
fixed = 175 total (was 177) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
8m 35s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
6s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  7m 50s{color} 
| {color:red} hadoop-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}129m  0s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}219m 37s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.log.TestLogLevel |
|   | hadoop.http.TestHttpServerWithSpengo |
|   | hadoop.security.token.delegation.web.TestWebDelegationToken |
|   | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
|   | hadoop.hdfs.web.TestWebHdfsTimeouts |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure |
|   | hadoop.hdfs.TestSafeModeWithStripedFileWithRandomECPolicy |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-13147 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12910586/HDFS-13147.003.patch |
| Optional Tests |  asflicense  compile  javac 

[jira] [Assigned] (HDFS-12452) TestDataNodeVolumeFailureReporting fails in trunk Jenkins runs

2018-02-14 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao reassigned HDFS-12452:
-

Assignee: Xiaoyu Yao

> TestDataNodeVolumeFailureReporting fails in trunk Jenkins runs
> --
>
> Key: HDFS-12452
> URL: https://issues.apache.org/jira/browse/HDFS-12452
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Arpit Agarwal
>Assignee: Xiaoyu Yao
>Priority: Critical
>  Labels: flaky-test
>
> TestDataNodeVolumeFailureReporting#testSuccessiveVolumeFailures fails 
> frequently in Jenkins runs but it passes locally on my dev machine.
> e.g. 
> https://builds.apache.org/job/PreCommit-HDFS-Build/21134/testReport/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailureReporting/testSuccessiveVolumeFailures/
> {code}
> Error Message
> test timed out after 12 milliseconds
> Stacktrace
> java.lang.Exception: test timed out after 12 milliseconds
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:761)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting.testSuccessiveVolumeFailures(TestDataNodeVolumeFailureReporting.java:189)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-12749) DN may not send block report to NN after NN restart

2018-02-14 Thread Erik Krogen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364520#comment-16364520
 ] 

Erik Krogen edited comment on HDFS-12749 at 2/14/18 6:00 PM:
-

[~kihwal], IIUC, the problem is that the NN correctly processed the 
registration, but the DN timed out before receiving the response. Since, from 
the NN's point of view, the registration was complete, it did not send another 
DNA_REGISTER command. 


was (Author: xkrogen):
[~kihwal], IIUC, the problem is that the NN correctly processed the 
registration, but the DN got an IOException during processing. Since, from 
the NN's point of view, the registration was complete, it did not send another 
DNA_REGISTER command. 

> DN may not send block report to NN after NN restart
> ---
>
> Key: HDFS-12749
> URL: https://issues.apache.org/jira/browse/HDFS-12749
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: TanYuxin
>Priority: Major
> Attachments: HDFS-12749.001.patch
>
>
> Now our cluster has thousands of DNs and millions of files and blocks. When 
> the NN restarts, the NN's load is very high.
> After the NN restarts, the DN will call the BPServiceActor#reRegister method 
> to register. But the register RPC will get an IOException since the NN is 
> busy dealing with Block Reports. The exception is caught at 
> BPServiceActor#processCommand.
> Next is the caught IOException:
> {code:java}
> WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Error processing 
> datanode Command
> java.io.IOException: Failed on local exception: java.io.IOException: 
> java.net.SocketTimeoutException: 6 millis timeout while waiting for 
> channel to be ready for read. ch : java.nio.channels.SocketChannel[connected 
> local=/DataNode_IP:Port remote=NameNode_Host/IP:Port]; Host Details : local 
> host is: "DataNode_Host/Datanode_IP"; destination host is: 
> "NameNode_Host":Port;
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773)
> at org.apache.hadoop.ipc.Client.call(Client.java:1474)
> at org.apache.hadoop.ipc.Client.call(Client.java:1407)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
> at com.sun.proxy.$Proxy13.registerDatanode(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:126)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:793)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.reRegister(BPServiceActor.java:926)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:604)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:898)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:711)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:864)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> The uncaught IOException breaks BPServiceActor#register, and the Block 
> Report cannot be sent immediately. 
> {code}
>   /**
>* Register one bp with the corresponding NameNode
>* 
>* The bpDatanode needs to register with the namenode on startup in order
>* 1) to report which storage it is serving now and 
>* 2) to receive a registrationID
>*  
>* issued by the namenode to recognize registered datanodes.
>* 
>* @param nsInfo current NamespaceInfo
>* @see FSNamesystem#registerDatanode(DatanodeRegistration)
>* @throws IOException
>*/
>   void register(NamespaceInfo nsInfo) throws IOException {
> // The handshake() phase loaded the block pool storage
> // off disk - so update the bpRegistration object from that info
> DatanodeRegistration newBpRegistration = bpos.createRegistration();
> LOG.info(this + " beginning handshake with NN");
> while (shouldRun()) {
>   try {
> // Use returned registration from namenode with updated fields
> newBpRegistration = bpNamenode.registerDatanode(newBpRegistration);
> newBpRegistration.setNamespaceInfo(nsInfo);
> bpRegistration = newBpRegistration;
> break;
>   } catch(EOFException e) {  // namenode might have just restarted
> LOG.info("Problem connecting to server: " + nnAddr + " :"
> + e.getLocalizedMessage());
> sleepAndLogInterrupts(1000, "connecting to server");
>   } catch(SocketTimeoutException e) {  // namenode is busy
> LOG.info("Problem connecting to server: " + nnAddr);
> sleepAndLogInterrupts(1000, "connecting to server");
>   }
> }
>  

[jira] [Commented] (HDFS-12749) DN may not send block report to NN after NN restart

2018-02-14 Thread Erik Krogen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364520#comment-16364520
 ] 

Erik Krogen commented on HDFS-12749:


[~kihwal], IIUC, the problem is that the NN correctly processed the 
registration, but the DN got an IOException during processing. Since, from the 
NN's point of view, the registration was complete, it did not send another 
DNA_REGISTER command. 

> DN may not send block report to NN after NN restart
> ---
>
> Key: HDFS-12749
> URL: https://issues.apache.org/jira/browse/HDFS-12749
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: TanYuxin
>Priority: Major
> Attachments: HDFS-12749.001.patch
>
>
> Our cluster now has thousands of DNs and millions of files and blocks. When 
> the NN restarts, its load is very high.
> After the NN restarts, the DN calls the BPServiceActor#reRegister method to 
> register, but the register RPC gets an IOException because the NN is busy 
> dealing with Block Reports. The exception is caught at 
> BPServiceActor#processCommand.
> The caught IOException:
> {code:java}
> WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Error processing 
> datanode Command
> java.io.IOException: Failed on local exception: java.io.IOException: 
> java.net.SocketTimeoutException: 6 millis timeout while waiting for 
> channel to be ready for read. ch : java.nio.channels.SocketChannel[connected 
> local=/DataNode_IP:Port remote=NameNode_Host/IP:Port]; Host Details : local 
> host is: "DataNode_Host/Datanode_IP"; destination host is: 
> "NameNode_Host":Port;
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773)
> at org.apache.hadoop.ipc.Client.call(Client.java:1474)
> at org.apache.hadoop.ipc.Client.call(Client.java:1407)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
> at com.sun.proxy.$Proxy13.registerDatanode(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:126)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:793)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.reRegister(BPServiceActor.java:926)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:604)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:898)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:711)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:864)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> The uncaught IOException breaks BPServiceActor#register, and the Block 
> Report cannot be sent immediately. 
> {code}
>   /**
>* Register one bp with the corresponding NameNode
>* 
>* The bpDatanode needs to register with the namenode on startup in order
>* 1) to report which storage it is serving now and 
>* 2) to receive a registrationID
>*  
>* issued by the namenode to recognize registered datanodes.
>* 
>* @param nsInfo current NamespaceInfo
>* @see FSNamesystem#registerDatanode(DatanodeRegistration)
>* @throws IOException
>*/
>   void register(NamespaceInfo nsInfo) throws IOException {
> // The handshake() phase loaded the block pool storage
> // off disk - so update the bpRegistration object from that info
> DatanodeRegistration newBpRegistration = bpos.createRegistration();
> LOG.info(this + " beginning handshake with NN");
> while (shouldRun()) {
>   try {
> // Use returned registration from namenode with updated fields
> newBpRegistration = bpNamenode.registerDatanode(newBpRegistration);
> newBpRegistration.setNamespaceInfo(nsInfo);
> bpRegistration = newBpRegistration;
> break;
>   } catch(EOFException e) {  // namenode might have just restarted
> LOG.info("Problem connecting to server: " + nnAddr + " :"
> + e.getLocalizedMessage());
> sleepAndLogInterrupts(1000, "connecting to server");
>   } catch(SocketTimeoutException e) {  // namenode is busy
> LOG.info("Problem connecting to server: " + nnAddr);
> sleepAndLogInterrupts(1000, "connecting to server");
>   }
> }
> 
> LOG.info("Block pool " + this + " successfully registered with NN");
> bpos.registrationSucceeded(this, bpRegistration);
> // random short delay - helps scatter the BR from all DNs
> scheduler.scheduleBlockReport(dnConf.initialBlockReportDelay);
>   }
> {code}
> But NameNode has processed 

[jira] [Commented] (HDFS-12749) DN may not send block report to NN after NN restart

2018-02-14 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364508#comment-16364508
 ] 

Kihwal Lee commented on HDFS-12749:
---

What was the command that failed when you saw "Error processing datanode 
Command"? If the processing of DNA_REGISTER blew up with an IOException, the 
DN would have gotten the command again in the next heartbeat and retried. The 
block token secret is updated (DNA_ACCESSKEYUPDATE) in the first heartbeat 
after registration. There can be other commands along with it, but failure to 
process one command does not abort the whole processing, so that shouldn't 
matter. We need to understand where exactly it went wrong. More detailed 
exception/stack traces, the thread name (so we can tell which actor thread it 
came from), timing, etc. will be helpful.
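
For reference, a rough sketch of the per-command handling being described 
(approximate shape of {{BPServiceActor#processCommand}}; names simplified, not 
the exact 2.7.1 source):
{code:java}
// Sketch: each command gets its own try/catch, so a command that blows up
// with an IOException is logged and skipped; the remaining commands in the
// batch (e.g. DNA_ACCESSKEYUPDATE) are still processed.
private boolean processCommand(DatanodeCommand[] cmds) {
  if (cmds != null) {
    for (DatanodeCommand cmd : cmds) {
      try {
        if (!bpos.processCommandFromActor(cmd, this)) {
          return false;
        }
      } catch (IOException ioe) {
        LOG.warn("Error processing datanode Command", ioe);
      }
    }
  }
  return true;
}
{code}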



 

> DN may not send block report to NN after NN restart
> ---
>
> Key: HDFS-12749
> URL: https://issues.apache.org/jira/browse/HDFS-12749
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: TanYuxin
>Priority: Major
> Attachments: HDFS-12749.001.patch
>
>
> Our cluster now has thousands of DNs and millions of files and blocks. When 
> the NN restarts, its load is very high.
> After the NN restarts, the DN calls the BPServiceActor#reRegister method to 
> register, but the register RPC gets an IOException because the NN is busy 
> dealing with Block Reports. The exception is caught at 
> BPServiceActor#processCommand.
> The caught IOException:
> {code:java}
> WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Error processing 
> datanode Command
> java.io.IOException: Failed on local exception: java.io.IOException: 
> java.net.SocketTimeoutException: 6 millis timeout while waiting for 
> channel to be ready for read. ch : java.nio.channels.SocketChannel[connected 
> local=/DataNode_IP:Port remote=NameNode_Host/IP:Port]; Host Details : local 
> host is: "DataNode_Host/Datanode_IP"; destination host is: 
> "NameNode_Host":Port;
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773)
> at org.apache.hadoop.ipc.Client.call(Client.java:1474)
> at org.apache.hadoop.ipc.Client.call(Client.java:1407)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
> at com.sun.proxy.$Proxy13.registerDatanode(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:126)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:793)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.reRegister(BPServiceActor.java:926)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:604)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:898)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:711)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:864)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> The uncaught IOException breaks BPServiceActor#register, and the Block 
> Report cannot be sent immediately. 
> {code}
>   /**
>* Register one bp with the corresponding NameNode
>* 
>* The bpDatanode needs to register with the namenode on startup in order
>* 1) to report which storage it is serving now and 
>* 2) to receive a registrationID
>*  
>* issued by the namenode to recognize registered datanodes.
>* 
>* @param nsInfo current NamespaceInfo
>* @see FSNamesystem#registerDatanode(DatanodeRegistration)
>* @throws IOException
>*/
>   void register(NamespaceInfo nsInfo) throws IOException {
> // The handshake() phase loaded the block pool storage
> // off disk - so update the bpRegistration object from that info
> DatanodeRegistration newBpRegistration = bpos.createRegistration();
> LOG.info(this + " beginning handshake with NN");
> while (shouldRun()) {
>   try {
> // Use returned registration from namenode with updated fields
> newBpRegistration = bpNamenode.registerDatanode(newBpRegistration);
> newBpRegistration.setNamespaceInfo(nsInfo);
> bpRegistration = newBpRegistration;
> break;
>   } catch(EOFException e) {  // namenode might have just restarted
> LOG.info("Problem connecting to server: " + nnAddr + " :"
> + e.getLocalizedMessage());
> sleepAndLogInterrupts(1000, "connecting to server");
>   } catch(SocketTimeoutException e) {  // namenode is busy
> LOG.info("Problem connecting 

[jira] [Commented] (HDFS-13119) RBF: Manage unavailable clusters

2018-02-14 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HDFS-13119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364497#comment-16364497
 ] 

Íñigo Goiri commented on HDFS-13119:


[~linyiqun], yep, the retry logic kind of breaks the flow.
Anyway, I think we should try to refactor that part of the code and avoid 
repeating this:
{code:java}
if (this.rpcMonitor != null) {
  this.rpcMonitor.proxyOpRetries();
}
return invoke(nsId, ++retryCount, method, obj, params);
{code}
It's minor, but I think we should make an effort to keep this function as easy 
to read as possible.
What about extending {{shouldRetry()}} and checking for the unavailable case 
there? We already use the FAIL case there, but maybe we could just throw the 
exception there.
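
For example, something along these lines (hypothetical helper name, just to 
illustrate the refactor):
{code:java}
// Hypothetical helper: record the retry in the monitor and re-invoke,
// so the bookkeeping is written once instead of at every call site.
private Object retryInvoke(String nsId, int retryCount, Method method,
    Object obj, Object... params) throws IOException {
  if (this.rpcMonitor != null) {
    this.rpcMonitor.proxyOpRetries();
  }
  return invoke(nsId, ++retryCount, method, obj, params);
}
{code}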


> RBF: Manage unavailable clusters
> 
>
> Key: HDFS-13119
> URL: https://issues.apache.org/jira/browse/HDFS-13119
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Yiqun Lin
>Priority: Major
> Attachments: HDFS-13119.001.patch, HDFS-13119.002.patch, 
> HDFS-13119.003.patch
>
>
> When a federated cluster has one of its subclusters down, operations that run 
> in every subcluster ({{RouterRpcClient#invokeAll()}}) may take up all the RPC 
> connections.






[jira] [Comment Edited] (HDFS-13136) Avoid taking FSN lock while doing group member lookup for FSD permission check

2018-02-14 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364469#comment-16364469
 ] 

Xiaoyu Yao edited comment on HDFS-13136 at 2/14/18 5:29 PM:


Thanks [~szetszwo] for the review. Updated patch v2, which fixes the unit test 
failures in

{code}
hadoop.hdfs.server.namenode.TestAuditLogger and 
hadoop.hdfs.server.namenode.TestAuditLoggerWithCommands
{code}

Now that getPermissionChecker() is moved out of the FSN lock, the test mocks 
are updated to reach deeper to get the expected exception and the audit log 
entry. The delta from v1 to v2 is the two unit test changes above. The other 
two failures could not be reproduced. 
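
For illustration, the locking pattern in question looks roughly like this 
(hypothetical method names; the group lookup happens inside the 
FSPermissionChecker constructor):
{code:java}
// Construct the permission checker -- which triggers callerUgi.getGroups()
// and therefore the potentially slow group lookup -- BEFORE taking the
// FSN lock, so a cache refresh cannot stall every other operation.
FSPermissionChecker pc = getPermissionChecker();  // group lookup here
readLock();
try {
  checkPermission(pc, src);          // cheap: groups are already resolved
  return doProtectedOperation(src);  // e.g. getFileInfo(..) under the lock
} finally {
  readUnlock();
}
{code}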



was (Author: xyao):
Thanks [~szetszwo] for the review. Updated patch v2, which fixes the unit test 
failures in

hadoop.hdfs.server.namenode.TestAuditLogger and 
hadoop.hdfs.server.namenode.TestAuditLoggerWithCommands

Now that getPermissionChecker() is moved out of the FSN lock, the test mocks 
are updated to reach deeper to get the expected exception and the audit log 
entry. 

The other two failures could not be reproduced. 


> Avoid taking FSN lock while doing group member lookup for FSD permission check
> --
>
> Key: HDFS-13136
> URL: https://issues.apache.org/jira/browse/HDFS-13136
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
> Attachments: HDFS-13136.001.patch, HDFS-13136.002.patch
>
>
> The Namenode has an FSN lock and an FSD lock. Most namenode operations need 
> to take the FSN lock first and then the FSD lock. The permission check is 
> done via FSPermissionChecker at the FSD layer, assuming the FSN lock is 
> taken.
> The FSPermissionChecker constructor invokes callerUgi.getGroups(), which can 
> sometimes take seconds. There are external cache schemes such as SSSD and 
> internal cache schemes for group lookup. However, the delay can still occur 
> during a cache refresh, which causes severe FSN lock contention and an 
> unresponsive namenode.
> Checking the current code, we found that getBlockLocations(..) did it right, 
> but some methods such as getFileInfo(..) and getContentSummary(..) did it 
> wrong. This ticket is opened to ensure that the group lookup for the 
> permission checker happens outside the FSN lock.  
>  






[jira] [Commented] (HDFS-13136) Avoid taking FSN lock while doing group member lookup for FSD permission check

2018-02-14 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364469#comment-16364469
 ] 

Xiaoyu Yao commented on HDFS-13136:
---

Thanks [~szetszwo] for the review. Updated patch v2, which fixes the unit test 
failures in

hadoop.hdfs.server.namenode.TestAuditLogger and 
hadoop.hdfs.server.namenode.TestAuditLoggerWithCommands

Now that getPermissionChecker() is moved out of the FSN lock, the test mocks 
are updated to reach deeper to get the expected exception and the audit log 
entry. 

The other two failures could not be reproduced. 


> Avoid taking FSN lock while doing group member lookup for FSD permission check
> --
>
> Key: HDFS-13136
> URL: https://issues.apache.org/jira/browse/HDFS-13136
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
> Attachments: HDFS-13136.001.patch, HDFS-13136.002.patch
>
>
> The Namenode has an FSN lock and an FSD lock. Most namenode operations need 
> to take the FSN lock first and then the FSD lock. The permission check is 
> done via FSPermissionChecker at the FSD layer, assuming the FSN lock is 
> taken.
> The FSPermissionChecker constructor invokes callerUgi.getGroups(), which can 
> sometimes take seconds. There are external cache schemes such as SSSD and 
> internal cache schemes for group lookup. However, the delay can still occur 
> during a cache refresh, which causes severe FSN lock contention and an 
> unresponsive namenode.
> Checking the current code, we found that getBlockLocations(..) did it right, 
> but some methods such as getFileInfo(..) and getContentSummary(..) did it 
> wrong. This ticket is opened to ensure that the group lookup for the 
> permission checker happens outside the FSN lock.  
>  






[jira] [Updated] (HDFS-13136) Avoid taking FSN lock while doing group member lookup for FSD permission check

2018-02-14 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-13136:
--
Attachment: HDFS-13136.002.patch

> Avoid taking FSN lock while doing group member lookup for FSD permission check
> --
>
> Key: HDFS-13136
> URL: https://issues.apache.org/jira/browse/HDFS-13136
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
> Attachments: HDFS-13136.001.patch, HDFS-13136.002.patch
>
>
> The Namenode has an FSN lock and an FSD lock. Most namenode operations need 
> to take the FSN lock first and then the FSD lock. The permission check is 
> done via FSPermissionChecker at the FSD layer, assuming the FSN lock is 
> taken.
> The FSPermissionChecker constructor invokes callerUgi.getGroups(), which can 
> sometimes take seconds. There are external cache schemes such as SSSD and 
> internal cache schemes for group lookup. However, the delay can still occur 
> during a cache refresh, which causes severe FSN lock contention and an 
> unresponsive namenode.
> Checking the current code, we found that getBlockLocations(..) did it right, 
> but some methods such as getFileInfo(..) and getContentSummary(..) did it 
> wrong. This ticket is opened to ensure that the group lookup for the 
> permission checker happens outside the FSN lock.  
>  






[jira] [Comment Edited] (HDFS-12749) DN may not send block report to NN after NN restart

2018-02-14 Thread Erik Krogen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364356#comment-16364356
 ] 

Erik Krogen edited comment on HDFS-12749 at 2/14/18 4:38 PM:
-

Hey [~hexiaoqiao], I have to admit I'm not familiar with this portion of the 
codebase. But it seems that the underlying issue is still a 
{{SocketTimeoutException}}, which we are trying to catch. Is this just an issue 
of an exception being over-wrapped? If it is over-wrapped in this case, is 
there any case where it won't be, i.e. is that catch statement actually useful 
as-is? {{NetUtils.wrapException}} will properly maintain the class of a 
{{SocketTimeoutException}}; I dug around a little and it was not immediately 
obvious to me why it received an {{IOException}} wrapped around a timeout 
rather than just a timeout.

It does seem that we probably want to catch all {{IOException}}s here, but as 
I said, I'm not familiar with this area and don't know if there is a good 
reason not to. Maybe re-ping [~kihwal] - I definitely agree that with proper 
tuning this issue shouldn't happen, but it does seem that the original 
complaint is valid and that this could be more robust. 
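
For instance, a wrapping-tolerant catch could look something like this (an 
untested sketch, not a proposed patch):
{code:java}
} catch (IOException e) {
  // Walk the cause chain: the SocketTimeoutException may arrive wrapped
  // in one or more IOExceptions by the RPC layer.
  Throwable t = e;
  while (t != null && !(t instanceof SocketTimeoutException)) {
    t = t.getCause();
  }
  if (t != null) {   // a timeout somewhere in the chain: NN is busy, retry
    LOG.info("Problem connecting to server: " + nnAddr);
    sleepAndLogInterrupts(1000, "connecting to server");
  } else {
    throw e;         // genuinely unexpected, propagate
  }
}
{code}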


was (Author: xkrogen):
Hey [~hexiaoqiao], I have to admit I'm not familiar with this portion of the 
codebase. But it seems that the underlying issue is still a 
{{SocketTimeoutException}}, which we are trying to catch. Is this just an issue 
of an exception being over-wrapped? If it is over-wrapped in this case, is 
there any case where it won't be, i.e. is that catch statement actually useful 
as-is?

It does seem that we probably want to catch all {{IOException}}s here, but as 
I said, I'm not familiar with this area and don't know if there is a good 
reason not to. Maybe re-ping [~kihwal] - I definitely agree that with proper 
tuning this issue shouldn't happen, but it does seem that the original 
complaint is valid and that this could be more robust. 

> DN may not send block report to NN after NN restart
> ---
>
> Key: HDFS-12749
> URL: https://issues.apache.org/jira/browse/HDFS-12749
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: TanYuxin
>Priority: Major
> Attachments: HDFS-12749.001.patch
>
>
> Our cluster now has thousands of DNs and millions of files and blocks. When 
> the NN restarts, its load is very high.
> After the NN restarts, the DN calls the BPServiceActor#reRegister method to 
> register, but the register RPC gets an IOException because the NN is busy 
> dealing with Block Reports. The exception is caught at 
> BPServiceActor#processCommand.
> The caught IOException:
> {code:java}
> WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Error processing 
> datanode Command
> java.io.IOException: Failed on local exception: java.io.IOException: 
> java.net.SocketTimeoutException: 6 millis timeout while waiting for 
> channel to be ready for read. ch : java.nio.channels.SocketChannel[connected 
> local=/DataNode_IP:Port remote=NameNode_Host/IP:Port]; Host Details : local 
> host is: "DataNode_Host/Datanode_IP"; destination host is: 
> "NameNode_Host":Port;
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773)
> at org.apache.hadoop.ipc.Client.call(Client.java:1474)
> at org.apache.hadoop.ipc.Client.call(Client.java:1407)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
> at com.sun.proxy.$Proxy13.registerDatanode(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:126)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:793)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.reRegister(BPServiceActor.java:926)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:604)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:898)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:711)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:864)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> The uncaught IOException breaks BPServiceActor#register, and the Block 
> Report cannot be sent immediately. 
> {code}
>   /**
>* Register one bp with the corresponding NameNode
>* 
>* The bpDatanode needs to register with the namenode on startup in order
>* 1) to report which storage it is serving now and 
>* 2) to receive a registrationID
>*  
>* issued by the namenode to recognize 

[jira] [Commented] (HDFS-13113) Use Log.*(Object, Throwable) overload to log exceptions

2018-02-14 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364361#comment-16364361
 ] 

Steve Loughran commented on HDFS-13113:
---

This is in HDFS 3.1; leaving the JIRA open in case we want to add a patch 
that applies to branch-3.0; there's a conflict in the NFS module:
{code}
both modified:   
hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/DFSClientCache.java
both modified:   
hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java
both modified:   
hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/RpcProgramNfs3.java
{code}


> Use Log.*(Object, Throwable) overload to log exceptions
> ---
>
> Key: HDFS-13113
> URL: https://issues.apache.org/jira/browse/HDFS-13113
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, namenode, nfs
>Affects Versions: 2.4.0
>Reporter: Steve Loughran
>Assignee: Andras Bokor
>Priority: Major
> Attachments: HADOOP-10571.05.patch, HADOOP-10571.07.patch
>
>
> FYI, in HADOOP-10571, [~boky01] is going to clean up a lot of the log 
> statements, including some in the Datanode and elsewhere.
> I'm provisionally +1 on that, but want to run it on the standalone tests 
> (Yetus has already done them) and give the HDFS developers warning of a 
> change that is going to touch their codebase.
> If anyone doesn't want the logging improvements, now is your chance to say so.






[jira] [Commented] (HDFS-12749) DN may not send block report to NN after NN restart

2018-02-14 Thread Erik Krogen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364356#comment-16364356
 ] 

Erik Krogen commented on HDFS-12749:


Hey [~hexiaoqiao], I have to admit I'm not familiar with this portion of the 
codebase. But it seems that the underlying issue is still a 
{{SocketTimeoutException}}, which we are trying to catch. Is this just an issue 
of an exception being over-wrapped? If it is over-wrapped in this case, is 
there any case where it won't be, i.e. is that catch statement actually useful 
as-is?

It does seem that we probably want to catch all {{IOException}}s here, but as 
I said, I'm not familiar with this area and don't know if there is a good 
reason not to. Maybe re-ping [~kihwal] - I definitely agree that with proper 
tuning this issue shouldn't happen, but it does seem that the original 
complaint is valid and that this could be more robust. 

> DN may not send block report to NN after NN restart
> ---
>
> Key: HDFS-12749
> URL: https://issues.apache.org/jira/browse/HDFS-12749
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: TanYuxin
>Priority: Major
> Attachments: HDFS-12749.001.patch
>
>
> Our cluster now has thousands of DNs and millions of files and blocks. When 
> the NN restarts, its load is very high.
> After the NN restarts, the DN calls the BPServiceActor#reRegister method to 
> register, but the register RPC gets an IOException because the NN is busy 
> dealing with Block Reports. The exception is caught at 
> BPServiceActor#processCommand.
> The caught IOException:
> {code:java}
> WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Error processing 
> datanode Command
> java.io.IOException: Failed on local exception: java.io.IOException: 
> java.net.SocketTimeoutException: 6 millis timeout while waiting for 
> channel to be ready for read. ch : java.nio.channels.SocketChannel[connected 
> local=/DataNode_IP:Port remote=NameNode_Host/IP:Port]; Host Details : local 
> host is: "DataNode_Host/Datanode_IP"; destination host is: 
> "NameNode_Host":Port;
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773)
> at org.apache.hadoop.ipc.Client.call(Client.java:1474)
> at org.apache.hadoop.ipc.Client.call(Client.java:1407)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
> at com.sun.proxy.$Proxy13.registerDatanode(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:126)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:793)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.reRegister(BPServiceActor.java:926)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:604)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:898)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:711)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:864)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> The uncaught IOException breaks BPServiceActor#register, and the Block 
> Report cannot be sent immediately. 
> {code}
>   /**
>* Register one bp with the corresponding NameNode
>* 
>* The bpDatanode needs to register with the namenode on startup in order
>* 1) to report which storage it is serving now and 
>* 2) to receive a registrationID
>*  
>* issued by the namenode to recognize registered datanodes.
>* 
>* @param nsInfo current NamespaceInfo
>* @see FSNamesystem#registerDatanode(DatanodeRegistration)
>* @throws IOException
>*/
>   void register(NamespaceInfo nsInfo) throws IOException {
> // The handshake() phase loaded the block pool storage
> // off disk - so update the bpRegistration object from that info
> DatanodeRegistration newBpRegistration = bpos.createRegistration();
> LOG.info(this + " beginning handshake with NN");
> while (shouldRun()) {
>   try {
> // Use returned registration from namenode with updated fields
> newBpRegistration = bpNamenode.registerDatanode(newBpRegistration);
> newBpRegistration.setNamespaceInfo(nsInfo);
> bpRegistration = newBpRegistration;
> break;
>   } catch(EOFException e) {  // namenode might have just restarted
> LOG.info("Problem connecting to server: " + nnAddr + " :"
> + e.getLocalizedMessage());
> sleepAndLogInterrupts(1000, "connecting to server");
>   } 

[jira] [Commented] (HDFS-13142) Define and Implement a DiifList Interface to store and manage SnapshotDiffs

2018-02-14 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364327#comment-16364327
 ] 

genericqa commented on HDFS-13142:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 40s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
56s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 41s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 2 new + 249 unchanged - 0 fixed = 251 total (was 249) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 59s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}164m 39s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}218m  0s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeUUID |
|   | hadoop.hdfs.TestBlockStoragePolicy |
|   | hadoop.hdfs.server.namenode.ha.TestBootstrapStandbyWithQJM |
|   | hadoop.hdfs.TestDFSUpgradeFromImage |
|   | hadoop.hdfs.TestReplaceDatanodeOnFailure |
|   | hadoop.hdfs.TestMultiThreadedHflush |
|   | hadoop.hdfs.server.namenode.TestLargeDirectoryDelete |
|   | hadoop.hdfs.TestPread |
|   | hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency |
|   | hadoop.hdfs.TestRestartDFS |
|   | hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA |
|   | hadoop.hdfs.server.namenode.ha.TestNNHealthCheck |
|   | hadoop.hdfs.TestHFlush |
|   | hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication |
|   | hadoop.hdfs.server.namenode.ha.TestHAMetrics |
|   | hadoop.hdfs.server.namenode.ha.TestEditLogTailer |
|   | hadoop.hdfs.server.namenode.ha.TestEditLogsDuringFailover |
|   | hadoop.hdfs.server.namenode.TestGetContentSummaryWithPermission |
|   | hadoop.hdfs.server.namenode.ha.TestXAttrsWithHA |
|   | hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations |
|   | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
|   | hadoop.hdfs.TestFileAppend2 |
|   | 

[jira] [Commented] (HDFS-12749) DN may not send block report to NN after NN restart

2018-02-14 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364305#comment-16364305
 ] 

genericqa commented on HDFS-12749:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  7s{color} 
| {color:red} HDFS-12749 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-12749 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12895403/HDFS-12749.001.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/23067/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> DN may not send block report to NN after NN restart
> ---
>
> Key: HDFS-12749
> URL: https://issues.apache.org/jira/browse/HDFS-12749
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: TanYuxin
>Priority: Major
> Attachments: HDFS-12749.001.patch
>
>
> Our cluster now has thousands of DNs and millions of files and blocks. When 
> the NN restarts, its load is very high.
> After the NN restarts, the DN calls the BPServiceActor#reRegister method to 
> register, but the register RPC gets an IOException because the NN is busy 
> dealing with Block Reports. The exception is caught at 
> BPServiceActor#processCommand.
> The caught IOException:
> {code:java}
> WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Error processing 
> datanode Command
> java.io.IOException: Failed on local exception: java.io.IOException: 
> java.net.SocketTimeoutException: 6 millis timeout while waiting for 
> channel to be ready for read. ch : java.nio.channels.SocketChannel[connected 
> local=/DataNode_IP:Port remote=NameNode_Host/IP:Port]; Host Details : local 
> host is: "DataNode_Host/Datanode_IP"; destination host is: 
> "NameNode_Host":Port;
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773)
> at org.apache.hadoop.ipc.Client.call(Client.java:1474)
> at org.apache.hadoop.ipc.Client.call(Client.java:1407)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
> at com.sun.proxy.$Proxy13.registerDatanode(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:126)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:793)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.reRegister(BPServiceActor.java:926)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:604)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:898)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:711)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:864)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> The uncaught IOException breaks BPServiceActor#register, and the Block 
> Report cannot be sent immediately. 
> {code}
>   /**
>* Register one bp with the corresponding NameNode
>* 
>* The bpDatanode needs to register with the namenode on startup in order
>* 1) to report which storage it is serving now and 
>* 2) to receive a registrationID
>*  
>* issued by the namenode to recognize registered datanodes.
>* 
>* @param nsInfo current NamespaceInfo
>* @see FSNamesystem#registerDatanode(DatanodeRegistration)
>* @throws IOException
>*/
>   void register(NamespaceInfo nsInfo) throws IOException {
> // The handshake() phase loaded the block pool storage
> // off disk - so update the bpRegistration object from that info
> DatanodeRegistration newBpRegistration = bpos.createRegistration();
> LOG.info(this + " beginning handshake with NN");
> while (shouldRun()) {
>   try {
> // Use returned registration from namenode with updated fields
> newBpRegistration = bpNamenode.registerDatanode(newBpRegistration);
> newBpRegistration.setNamespaceInfo(nsInfo);
> bpRegistration = newBpRegistration;
> break;
>   } catch(EOFException e) {  // namenode might have just restarted
> LOG.info("Problem connecting to server: " + nnAddr + " :"
> + e.getLocalizedMessage());
>   

[jira] [Commented] (HDFS-12749) DN may not send block report to NN after NN restart

2018-02-14 Thread He Xiaoqiao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364280#comment-16364280
 ] 

He Xiaoqiao commented on HDFS-12749:


ping [~xkrogen]
I have also seen this problem in our big production cluster, and I think this 
is a blocking issue.
The issue was originally reported by my colleague [~tanyuxin], and after we 
applied this patch to our branch-2.7-based production cluster, the problems 
mentioned above disappeared.

I think the description of this ticket may be a little ambiguous, so I would 
like to offer more information:

a) {{BPServiceActor#register}} only catches {{EOFException}} and 
{{SocketTimeoutException}} (both subclasses of IOException) when registering 
with the NameNode.
b) When the request passes down to the {{RPC}} layer, {{Client#call}} (line 
1448) may throw many types of exceptions, such as {{InterruptedIOException}}, 
{{IOException}}, etc., for different failure reasons.
c) The load on the NameNode can be very high during the restart process, 
especially in a big cluster. When a datanode re-registers with the namenode at 
this time, {{RPC#Client}} may throw a plain IOException (not one of the two 
caught subclasses) to {{BPServiceActor#register}}, as [~tanyuxin] described 
above.
d) The IOException is caught by {{BPServiceActor#processCommand}}, which prints 
the warn log {{`WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Error 
processing datanode Command`}} but does not re-initialize the {{BlockPool}} 
context for re-registration and continues running, so the {{BlockReport}} is 
not scheduled immediately (the DataNode sends its {{BlockReport}} to the 
restarting NameNode based on the last {{BlockReport}} time).
e) Actually, there are other subsequent problems besides the {{BlockReport}} 
not being scheduled immediately, such as block token secret keys not being 
updated correctly, so a client cannot read {{Blocks}} from this DataNode even 
if it holds the correct BlockToken.

I have reviewed the patch and have two minor suggestions:
a) please rebase onto an active branch (such as branch-2.7) and post a new 
patch;
b) it would be better to add a new unit test for this fix.

If I am wrong, please correct me. [~tanyuxin]
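
For illustration, the kind of change being discussed in (a)-(c) would broaden 
the catch in {{BPServiceActor#register}} roughly like this (a sketch, not 
necessarily the attached patch):
{code:java}
  } catch (EOFException e) {  // namenode might have just restarted
    LOG.info("Problem connecting to server: " + nnAddr + " :"
        + e.getLocalizedMessage());
    sleepAndLogInterrupts(1000, "connecting to server");
  } catch (IOException e) {   // also covers SocketTimeoutException and the
                              // plain/wrapped IOExceptions described in (c)
    LOG.info("Problem connecting to server: " + nnAddr + " :"
        + e.getLocalizedMessage());
    sleepAndLogInterrupts(1000, "connecting to server");
  }
{code}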

> DN may not send block report to NN after NN restart
> ---
>
> Key: HDFS-12749
> URL: https://issues.apache.org/jira/browse/HDFS-12749
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: TanYuxin
>Priority: Major
> Attachments: HDFS-12749.001.patch
>
>
> Our cluster now has thousands of DNs and millions of files and blocks. When 
> the NN restarts, its load is very high.
> After the NN restarts, the DN calls the BPServiceActor#reRegister method to 
> register, but the register RPC gets an IOException because the NN is busy 
> dealing with Block Reports. The exception is caught at 
> BPServiceActor#processCommand.
> The caught IOException:
> {code:java}
> WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Error processing 
> datanode Command
> java.io.IOException: Failed on local exception: java.io.IOException: 
> java.net.SocketTimeoutException: 6 millis timeout while waiting for 
> channel to be ready for read. ch : java.nio.channels.SocketChannel[connected 
> local=/DataNode_IP:Port remote=NameNode_Host/IP:Port]; Host Details : local 
> host is: "DataNode_Host/Datanode_IP"; destination host is: 
> "NameNode_Host":Port;
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773)
> at org.apache.hadoop.ipc.Client.call(Client.java:1474)
> at org.apache.hadoop.ipc.Client.call(Client.java:1407)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
> at com.sun.proxy.$Proxy13.registerDatanode(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:126)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:793)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.reRegister(BPServiceActor.java:926)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:604)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:898)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:711)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:864)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> The uncaught IOException breaks BPServiceActor#register, and the Block 
> Report cannot be sent immediately. 
> {code}
>   /**
>* Register one bp with the corresponding NameNode
>* 
>* The bpDatanode needs to register with the namenode on startup in order
>* 1) to report which 

[jira] [Commented] (HDFS-13119) RBF: Manage unavailable clusters

2018-02-14 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364194#comment-16364194
 ] 

genericqa commented on HDFS-13119:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
44s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
1s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 35m 
 9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
30s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 19m 
50s{color} | {color:red} branch has errors when building and testing our client 
artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
46s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
29s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 9s{color} | {color:green} hadoop-hdfs-project/hadoop-hdfs: The patch generated 
0 new + 0 unchanged - 433 fixed = 0 total (was 433) {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m  
9s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
3s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
0m 10s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
11s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
11s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 10s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:blue}0{color} | {color:blue} asflicense {color} | {color:blue}  0m 
10s{color} | {color:blue} ASF License check generated no output? {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 68m 38s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-13119 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12910583/HDFS-13119.003.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  xml  |
| uname | Linux 8a9d3b12e662 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 60971b8 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| mvninstall | 
https://builds.apache.org/job/PreCommit-HDFS-Build/23065/artifact/out/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| mvnsite | 
https://builds.apache.org/job/PreCommit-HDFS-Build/23065/artifact/out/patch-mvnsite-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| findbugs | 

[jira] [Updated] (HDFS-13147) Support -c argument for DFS command head and tail

2018-02-14 Thread Jianfei Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianfei Jiang updated HDFS-13147:
-
Status: Patch Available  (was: In Progress)

Patch 003: fixes the related test case failure.

> Support -c argument for DFS command head and tail
> -
>
> Key: HDFS-13147
> URL: https://issues.apache.org/jira/browse/HDFS-13147
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jianfei Jiang
>Assignee: Jianfei Jiang
>Priority: Minor
> Attachments: HDFS-13147.001.patch, HDFS-13147.002.patch, 
> HDFS-13147.003.patch
>
>
> The offset of the {{head}} and {{tail}} commands is hard coded as 1024 bytes. 
> The goal is to improve this so that the offset can be specified by the user, 
> as in the Linux commands, making them more flexible.






[jira] [Updated] (HDFS-13147) Support -c argument for DFS command head and tail

2018-02-14 Thread Jianfei Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianfei Jiang updated HDFS-13147:
-
Attachment: HDFS-13147.003.patch

> Support -c argument for DFS command head and tail
> -
>
> Key: HDFS-13147
> URL: https://issues.apache.org/jira/browse/HDFS-13147
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jianfei Jiang
>Assignee: Jianfei Jiang
>Priority: Minor
> Attachments: HDFS-13147.001.patch, HDFS-13147.002.patch, 
> HDFS-13147.003.patch
>
>
> The offset of the {{head}} and {{tail}} commands is hard coded as 1024 bytes. 
> The goal is to improve this so that the offset can be specified by the user, 
> as in the Linux commands, making them more flexible.






[jira] [Updated] (HDFS-13147) Support -c argument for DFS command head and tail

2018-02-14 Thread Jianfei Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianfei Jiang updated HDFS-13147:
-
Status: In Progress  (was: Patch Available)

> Support -c argument for DFS command head and tail
> -
>
> Key: HDFS-13147
> URL: https://issues.apache.org/jira/browse/HDFS-13147
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jianfei Jiang
>Assignee: Jianfei Jiang
>Priority: Minor
> Attachments: HDFS-13147.001.patch, HDFS-13147.002.patch
>
>
> The offset of the {{head}} and {{tail}} commands is hard coded as 1024 bytes. 
> The goal is to improve this so that the offset can be specified by the user, 
> as in the Linux commands, making them more flexible.






[jira] [Commented] (HDFS-13119) RBF: Manage unavailable clusters

2018-02-14 Thread Yiqun Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364058#comment-16364058
 ] 

Yiqun Lin commented on HDFS-13119:
--

Thanks for the review, [~elgoiri].
{quote}Otherwise, we could just do:
{noformat}
 if (isClusterUnAvailable(nsId) && retryCount > 0) {
 throw new IOException("No namenode available under nameservice " + nsId, ioe);
 }
{noformat}
Then, the default logic takes care of the first retry.
{quote}
Actually, the default logic won't take care of the first retry. Here we use 
the retry policy {{FailoverOnNetworkExceptionRetry}}; it will first jump into 
the {{RetryDecision.FAILOVER_AND_RETRY}} logic and throw a 
{{StandbyException}}. In the failover retry, the retry count is passed as 0 
again.

> RBF: Manage unavailable clusters
> 
>
> Key: HDFS-13119
> URL: https://issues.apache.org/jira/browse/HDFS-13119
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Yiqun Lin
>Priority: Major
> Attachments: HDFS-13119.001.patch, HDFS-13119.002.patch, 
> HDFS-13119.003.patch
>
>
> When a federated cluster has one of its subclusters down, operations that run 
> in every subcluster ({{RouterRpcClient#invokeAll()}}) may take up all the RPC 
> connections.






[jira] [Comment Edited] (HDFS-13119) RBF: Manage unavailable clusters

2018-02-14 Thread Yiqun Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364058#comment-16364058
 ] 

Yiqun Lin edited comment on HDFS-13119 at 2/14/18 1:51 PM:
---

Thanks for the review, [~elgoiri].
{quote}Otherwise, we could just do:
{noformat}
 if (isClusterUnAvailable(nsId) && retryCount > 0) {
 throw new IOException("No namenode available under nameservice " + nsId, ioe);
 }
{noformat}
Then, the default logic takes care of the first retry.
{quote}
Actually, the default logic won't take care of the first retry. Here we use 
the retry policy {{FailoverOnNetworkExceptionRetry}}; it will first jump into 
the {{RetryDecision.FAILOVER_AND_RETRY}} logic and throw a 
{{StandbyException}}. In the failover retry, the retry count is passed as 0 
again.

 

Attaching a new patch to fix some warnings.


was (Author: linyiqun):
Thanks for the review, [~elgoiri].
{quote}Otherwise, we could just do:
{noformat}
 if (isClusterUnAvailable(nsId) && retryCount > 0) {
 throw new IOException("No namenode available under nameservice " + nsId, ioe);
 }
{noformat}
Then, the default logic takes care of the first retry.
{quote}
Actually, the default logic won't take care of the first retry. Here we use 
the retry policy {{FailoverOnNetworkExceptionRetry}}; it will first jump into 
the {{RetryDecision.FAILOVER_AND_RETRY}} logic and throw a 
{{StandbyException}}. In the failover retry, the retry count is passed as 0 
again.

> RBF: Manage unavailable clusters
> 
>
> Key: HDFS-13119
> URL: https://issues.apache.org/jira/browse/HDFS-13119
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Yiqun Lin
>Priority: Major
> Attachments: HDFS-13119.001.patch, HDFS-13119.002.patch, 
> HDFS-13119.003.patch
>
>
> When a federated cluster has one of its subclusters down, operations that run 
> in every subcluster ({{RouterRpcClient#invokeAll()}}) may take up all the RPC 
> connections.






[jira] [Updated] (HDFS-13119) RBF: Manage unavailable clusters

2018-02-14 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-13119:
-
Attachment: HDFS-13119.003.patch

> RBF: Manage unavailable clusters
> 
>
> Key: HDFS-13119
> URL: https://issues.apache.org/jira/browse/HDFS-13119
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Yiqun Lin
>Priority: Major
> Attachments: HDFS-13119.001.patch, HDFS-13119.002.patch, 
> HDFS-13119.003.patch
>
>
> When a federated cluster has one of the subclusters down, operations that run 
> in every subcluster ({{RouterRpcClient#invokeAll()}}) may take all the RPC 
> connections.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica

2018-02-14 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364040#comment-16364040
 ] 

genericqa commented on HDFS-11187:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  0m  
5s{color} | {color:red} Docker failed to build yetus/hadoop:tp-17701. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-11187 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12910582/HDFS-11187-branch-2.004.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/23064/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Optimize disk access for last partial chunk checksum of Finalized replica
> -
>
> Key: HDFS-11187
> URL: https://issues.apache.org/jira/browse/HDFS-11187
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Major
> Fix For: 3.1.0, 3.0.2
>
> Attachments: HDFS-11187-branch-2.001.patch, 
> HDFS-11187-branch-2.002.patch, HDFS-11187-branch-2.003.patch, 
> HDFS-11187-branch-2.004.patch, HDFS-11187.001.patch, HDFS-11187.002.patch, 
> HDFS-11187.003.patch, HDFS-11187.004.patch, HDFS-11187.005.patch
>
>
> The patch at HDFS-11160 ensures BlockSender reads the correct version of 
> the metafile when there are concurrent writers.
> However, the implementation is not optimal, because it must always read the 
> last partial chunk checksum from disk, while holding the FsDatasetImpl lock, 
> for every reader. It is possible to optimize this by keeping an up-to-date 
> version of the last partial checksum in memory and reducing disk access.
> I am separating the optimization into a new jira, because maintaining the 
> state of the in-memory checksum requires a lot more work.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica

2018-02-14 Thread Gabor Bota (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Bota updated HDFS-11187:
--
Status: Patch Available  (was: Open)

Added fix for FsDatasetImpl#append as requested.
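
As a reader aid, a minimal sketch of the in-memory caching idea this issue 
describes; the class, field, and method names here are hypothetical, not the 
actual FinalizedReplica/FsDatasetImpl members:
{noformat}
import java.util.Arrays;

// Sketch: keep the last partial chunk checksum with the replica so that
// readers do not have to re-read it from the meta file on disk while
// holding the dataset lock.
public class CachedPartialChunkChecksum {

  private byte[] lastPartialChunkChecksum; // null => no partial chunk

  // Updated when the replica is finalized or appended to, while the
  // checksum bytes are already at hand.
  public synchronized void setLastPartialChunkChecksum(byte[] checksum) {
    this.lastPartialChunkChecksum =
        checksum == null ? null : Arrays.copyOf(checksum, checksum.length);
  }

  // Readers consult the cached copy instead of touching the disk.
  public synchronized byte[] getLastPartialChunkChecksum() {
    return lastPartialChunkChecksum == null
        ? null
        : Arrays.copyOf(lastPartialChunkChecksum,
            lastPartialChunkChecksum.length);
  }
}
{noformat}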

> Optimize disk access for last partial chunk checksum of Finalized replica
> -
>
> Key: HDFS-11187
> URL: https://issues.apache.org/jira/browse/HDFS-11187
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Major
> Fix For: 3.1.0, 3.0.2
>
> Attachments: HDFS-11187-branch-2.001.patch, 
> HDFS-11187-branch-2.002.patch, HDFS-11187-branch-2.003.patch, 
> HDFS-11187-branch-2.004.patch, HDFS-11187.001.patch, HDFS-11187.002.patch, 
> HDFS-11187.003.patch, HDFS-11187.004.patch, HDFS-11187.005.patch
>
>
> The patch at HDFS-11160 ensures BlockSender reads the correct version of 
> the metafile when there are concurrent writers.
> However, the implementation is not optimal, because it must always read the 
> last partial chunk checksum from disk, while holding the FsDatasetImpl lock, 
> for every reader. It is possible to optimize this by keeping an up-to-date 
> version of the last partial checksum in memory and reducing disk access.
> I am separating the optimization into a new jira, because maintaining the 
> state of the in-memory checksum requires a lot more work.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica

2018-02-14 Thread Gabor Bota (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Bota updated HDFS-11187:
--
Status: Open  (was: Patch Available)

> Optimize disk access for last partial chunk checksum of Finalized replica
> -
>
> Key: HDFS-11187
> URL: https://issues.apache.org/jira/browse/HDFS-11187
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Major
> Fix For: 3.1.0, 3.0.2
>
> Attachments: HDFS-11187-branch-2.001.patch, 
> HDFS-11187-branch-2.002.patch, HDFS-11187-branch-2.003.patch, 
> HDFS-11187-branch-2.004.patch, HDFS-11187.001.patch, HDFS-11187.002.patch, 
> HDFS-11187.003.patch, HDFS-11187.004.patch, HDFS-11187.005.patch
>
>
> The patch at HDFS-11160 ensures BlockSender reads the correct version of 
> the metafile when there are concurrent writers.
> However, the implementation is not optimal, because it must always read the 
> last partial chunk checksum from disk, while holding the FsDatasetImpl lock, 
> for every reader. It is possible to optimize this by keeping an up-to-date 
> version of the last partial checksum in memory and reducing disk access.
> I am separating the optimization into a new jira, because maintaining the 
> state of the in-memory checksum requires a lot more work.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica

2018-02-14 Thread Gabor Bota (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Bota updated HDFS-11187:
--
Attachment: HDFS-11187-branch-2.004.patch

> Optimize disk access for last partial chunk checksum of Finalized replica
> -
>
> Key: HDFS-11187
> URL: https://issues.apache.org/jira/browse/HDFS-11187
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Major
> Fix For: 3.1.0, 3.0.2
>
> Attachments: HDFS-11187-branch-2.001.patch, 
> HDFS-11187-branch-2.002.patch, HDFS-11187-branch-2.003.patch, 
> HDFS-11187-branch-2.004.patch, HDFS-11187.001.patch, HDFS-11187.002.patch, 
> HDFS-11187.003.patch, HDFS-11187.004.patch, HDFS-11187.005.patch
>
>
> The patch at HDFS-11160 ensures BlockSender reads the correct version of 
> the metafile when there are concurrent writers.
> However, the implementation is not optimal, because it must always read the 
> last partial chunk checksum from disk, while holding the FsDatasetImpl lock, 
> for every reader. It is possible to optimize this by keeping an up-to-date 
> version of the last partial checksum in memory and reducing disk access.
> I am separating the optimization into a new jira, because maintaining the 
> state of the in-memory checksum requires a lot more work.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13142) Define and Implement a DiffList Interface to store and manage SnapshotDiffs

2018-02-14 Thread Shashikant Banerjee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-13142:
---
Attachment: HDFS-13142.002.patch

> Define and Implement a DiffList Interface to store and manage SnapshotDiffs
> ---
>
> Key: HDFS-13142
> URL: https://issues.apache.org/jira/browse/HDFS-13142
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-13142.001.patch, HDFS-13142.002.patch
>
>
> The InodeDiffList class contains a generic List to store snapshotDiffs. The 
> generic List interface is bulky, and to store and manage snapshotDiffs we 
> need only a few specific methods. 
> This Jira proposes to define a new interface, DiffList, which will be used 
> to store and manage snapshotDiffs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13142) Define and Implement a DiffList Interface to store and manage SnapshotDiffs

2018-02-14 Thread Shashikant Banerjee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363869#comment-16363869
 ] 

Shashikant Banerjee commented on HDFS-13142:


Thanks [~szetszwo] for the review. Patch v2 fixes the checkstyle warnings.
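
As an illustration of the idea, a minimal sketch of the kind of narrowed 
interface the description proposes; the exact method set below is an 
assumption for readability, not the API in the attached patch:
{noformat}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// A deliberately small list abstraction: just the operations needed to
// store and manage snapshot diffs, instead of the full java.util.List.
interface DiffList<T> extends Iterable<T> {
  T get(int index);
  boolean addLast(T element);
  T removeFirst();
  int size();
}

// Trivial ArrayList-backed implementation for illustration.
class DiffListByArrayList<T> implements DiffList<T> {
  private final List<T> list = new ArrayList<>();

  @Override public T get(int index) { return list.get(index); }
  @Override public boolean addLast(T element) { return list.add(element); }
  @Override public T removeFirst() { return list.remove(0); }
  @Override public int size() { return list.size(); }
  @Override public Iterator<T> iterator() { return list.iterator(); }
}
{noformat}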

> Define and Implement a DiffList Interface to store and manage SnapshotDiffs
> ---
>
> Key: HDFS-13142
> URL: https://issues.apache.org/jira/browse/HDFS-13142
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-13142.001.patch, HDFS-13142.002.patch
>
>
> The InodeDiffList class contains a generic List to store snapshotDiffs. The 
> generic List interface is bulky, and to store and manage snapshotDiffs we 
> need only a few specific methods. 
> This Jira proposes to define a new interface, DiffList, which will be used 
> to store and manage snapshotDiffs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13108) Ozone: OzoneFileSystem: Simplified url schema for Ozone File System

2018-02-14 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363850#comment-16363850
 ] 

Steve Loughran commented on HDFS-13108:
---

Where is the documentation of the URI?

h3. OzoneFileSystem

L95. Can the pattern be made static? If it is only used in initialize, it can 
be a local var.
L308. Use Preconditions.checkArgument; include the URL in the error message 
that is built up.
L433. What if the path doesn't have a parent?

h3. TestOzoneFileInterfaces

L43. The imports are out of order w.r.t. the Hadoop rules. Can this be fixed 
now, before any merge?
L44. If you do a static import of Assert, there is no need to use \{{Assert.}} 
in front of every assertion.

L98. Have init declare that it throws Exception; then there is no need to 
catch & rethrow the URI syntax exception lossily.
L127. You only need the \{{this.}} prefix in the ctor.
L141. Use IOUtils to close all of these, if for some reason you can't check 
each one for being null first.

L150. Do a cast, not an assert, so that something meaningful is thrown. I have 
a strict "Veto all patches where assertTrue/assertFalse don't include an 
error message" policy, and as I've been invited to comment, you've just 
encountered it. Sorry. As ClassCastException is meaningful, avoid the problem 
by not bothering with the assert.
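
To make the Preconditions suggestion concrete, a generic hedged sketch; the 
checks and the URI shape are invented for illustration and are not the 
OzoneFileSystem code (requires Guava on the classpath):
{noformat}
import java.net.URI;
import com.google.common.base.Preconditions;

// Validate with Preconditions.checkArgument and include the offending
// URL in the message, instead of a bare assert or an unexplained failure.
public class UriValidationSketch {

  static void validate(URI name) {
    Preconditions.checkArgument(name != null, "file system URI is null");
    Preconditions.checkArgument("o3".equals(name.getScheme()),
        "Invalid scheme in %s; expected o3://", name);
    Preconditions.checkArgument(name.getAuthority() != null,
        "No authority (bucket.volume) in %s", name);
  }

  public static void main(String[] args) {
    validate(URI.create("o3://bucket1.volume1/dir1/file"));
    System.out.println("URI accepted");
  }
}
{noformat}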

> Ozone: OzoneFileSystem: Simplified url schema for Ozone File System
> ---
>
> Key: HDFS-13108
> URL: https://issues.apache.org/jira/browse/HDFS-13108
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Attachments: HDFS-13108-HDFS-7240.001.patch, 
> HDFS-13108-HDFS-7240.002.patch, HDFS-13108-HDFS-7240.003.patch
>
>
> A. Current state
>  
> 1. The datanode host / bucket / volume should be defined in the defaultFS (eg. 
>  o3://datanode:9864/test/bucket1)
> 2. The root file system points to the bucket (eg. 'dfs -ls /' lists all the 
> keys from bucket1)
> It works very well, but there are some limitations.
> B. Problem one
> The current code doesn't support fully qualified locations. For example, 'dfs 
> -ls o3://datanode:9864/test/bucket1/dir1' does not work.
> C. Problem two
> I tried to fix the previous problem, but it's not trivial. The biggest 
> problem is that there is a Path.makeQualified call which can transform an 
> unqualified URL into a qualified URL. This is part of Path.java, so it's 
> common to all the Hadoop file systems.
> In the current implementation it qualifies a URL by keeping the scheme 
> (eg. o3://) and authority (eg. datanode:9864) from the defaultFS and using 
> the relative path as the end of the qualified URL. For example:
> makeQualified(defaultUri=o3://datanode:9864/test/bucket1, path=dir1/file) 
> will return o3://datanode:9864/dir1/file, which is obviously wrong (the 
> correct result would be o3://datanode:9864/TEST/BUCKET1/dir1/file). I tried 
> to work around this with a custom makeQualified in the Ozone code, and it 
> worked from the command line but not with Spark, which uses the Hadoop API 
> and the original makeQualified path.
> D. Solution
> We should support makeQualified calls, so we can use any path in the 
> defaultFS.
>  
> I propose to use a simplified schema such as o3://bucket.volume/ 
> This is similar to the s3a format, where the pattern is s3a://bucket.region/ 
> We don't need to set the hostname of the datanode (or the ksm, in the case 
> of service discovery), but it would be configurable with additional Hadoop 
> configuration values such as 
> fs.o3.bucket.bucketname.volumename.address=http://datanode:9864 
> (this is how s3a works today, as far as I know).
> We also need to define restrictions for the volume names (in our case they 
> should not include a dot any more).
> ps: some spark output
> 2018-02-03 18:43:04 WARN  Client:66 - Neither spark.yarn.jars nor 
> spark.yarn.archive is set, falling back to uploading libraries under 
> SPARK_HOME.
> 2018-02-03 18:43:05 INFO  Client:54 - Uploading resource 
> file:/tmp/spark-03119be0-9c3d-440c-8e9f-48c692412ab5/__spark_libs__244044896784490.zip
>  -> 
> o3://datanode:9864/user/hadoop/.sparkStaging/application_1517611085375_0001/__spark_libs__244044896784490.zip
> My default fs was o3://datanode:9864/test/bucket1, but Spark qualified the 
> name of the home directory.
>  
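
For what it's worth, a sketch of how the proposed wiring above could look from 
the client side; the key pattern and values simply restate the proposal and 
are not committed anywhere:
{noformat}
import org.apache.hadoop.conf.Configuration;

// Illustrative-only configuration: the default FS names the bucket/volume
// pair, while a separate key maps that pair to a concrete address.
public class OzoneFsConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "o3://bucket1.volume1/");
    conf.set("fs.o3.bucket.bucket1.volume1.address", "http://datanode:9864");
    System.out.println(conf.get("fs.o3.bucket.bucket1.volume1.address"));
  }
}
{noformat}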



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13136) Avoid taking FSN lock while doing group member lookup for FSD permission check

2018-02-14 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363842#comment-16363842
 ] 

Tsz Wo Nicholas Sze commented on HDFS-13136:


+1 the 001 patch looks good.  Thanks for fixing all the methods.

The failed tests seem unrelated.  Please take a look.

> Avoid taking FSN lock while doing group member lookup for FSD permission check
> --
>
> Key: HDFS-13136
> URL: https://issues.apache.org/jira/browse/HDFS-13136
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
> Attachments: HDFS-13136.001.patch
>
>
> Namenode has an FSN lock and an FSD lock. Most of the namenode operations 
> need to take the FSN lock first and then the FSD lock.  The permission check 
> is done via FSPermissionChecker at the FSD layer, assuming the FSN lock is 
> taken. 
> The FSPermissionChecker constructor invokes callerUgi.getGroups(), which can 
> sometimes take seconds. There are external cache schemes such as SSSD and 
> internal cache schemes for group lookup. However, the delay could still 
> occur during a cache refresh, which causes severe FSN lock contention and an 
> unresponsive namenode.
> Checking the current code, we found that getBlockLocations(..) did it right, 
> but some methods such as getFileInfo(..) and getContentSummary(..) did it 
> wrong. 
> This ticket is opened to ensure the group lookup for the permission checker 
> is outside the FSN lock.  
>  
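
For orientation, an abstract sketch of the ordering this ticket asks for; the 
lock and the group lookup below are stand-ins, not the FSNamesystem internals:
{noformat}
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Resolve the caller's groups BEFORE taking the namesystem lock, so a
// slow lookup cannot stall every other operation waiting on the lock.
public class LockOrderingSketch {

  private final ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock();

  List<String> lookupGroups(String user) {
    // Placeholder for the potentially slow callerUgi.getGroups() call
    // (e.g. blocking on an SSSD/LDAP cache refresh).
    return Arrays.asList("users");
  }

  void getFileInfoLikeOp(String user) {
    // 1) Group lookup happens outside the lock.
    List<String> groups = lookupGroups(user);
    // 2) Only then is the lock taken for the permission check + read.
    fsnLock.readLock().lock();
    try {
      // ... permission check against 'groups' and the metadata read ...
      System.out.println("checked " + user + " with groups " + groups);
    } finally {
      fsnLock.readLock().unlock();
    }
  }

  public static void main(String[] args) {
    new LockOrderingSketch().getFileInfoLikeOp("alice");
  }
}
{noformat}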



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13142) Define and Implement a DiffList Interface to store and manage SnapshotDiffs

2018-02-14 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363825#comment-16363825
 ] 

Tsz Wo Nicholas Sze commented on HDFS-13142:


Thanks [~shashikant], the patch looks good.

There are a few checkstyle warnings.  Could you fix them?

> Define and Implement a DiffList Interface to store and manage SnapshotDiffs
> ---
>
> Key: HDFS-13142
> URL: https://issues.apache.org/jira/browse/HDFS-13142
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-13142.001.patch
>
>
> The InodeDiffList class contains a generic List to store snapshotDiffs. The 
> generic List interface is bulky, and to store and manage snapshotDiffs we 
> need only a few specific methods. 
> This Jira proposes to define a new interface, DiffList, which will be used 
> to store and manage snapshotDiffs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13040) Kerberized inotify client fails despite kinit properly

2018-02-14 Thread Istvan Fajth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363771#comment-16363771
 ] 

Istvan Fajth commented on HDFS-13040:
-

Hi [~xiaochen],

I have looked into the proposal; a few questions/suggestions on the test code, 
if I may:
 - In the test code's initKerberizedCluster method you set the log level for 
KerberosAuthenticator and UGI to DEBUG. I am not sure this is necessary, and 
if you decide to remove it, you don't need to make the LOG public in those 
classes.
 - Also in the initKerberizedCluster method, the basedir is set based on the 
TestDFSInotifyEventInputStream class; this might remain there due to my 
mistake from when I put the test into a separate class to upload just this 
test to this JIRA.
 - After the Thread.sleep in the test code's testWithKerberizedCluster method 
there is a log.error that I believe also remained there from my code; it 
should not be at error level.
 - The class doc says:
 Class for Kerberized test cases for \{@link TestDFSInotifyEventInputStream}
Should this link to just DFSInotifyEventInputStream instead of the test class?

> Kerberized inotify client fails despite kinit properly
> --
>
> Key: HDFS-13040
> URL: https://issues.apache.org/jira/browse/HDFS-13040
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
> Environment: Kerberized, HA cluster, iNotify client, CDH5.10.2
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Attachments: HDFS-13040.001.patch, HDFS-13040.02.patch, 
> HDFS-13040.03.patch, HDFS-13040.half.test.patch, 
> TestDFSInotifyEventInputStreamKerberized.java, TransactionReader.java
>
>
> This issue is similar to HDFS-10799.
> HDFS-10799 turned out to be a client-side issue where the client is 
> responsible for actively renewing its Kerberos ticket.
> However, we found that in a slightly different setup, even if the client has 
> valid Kerberos credentials, inotify still fails.
> Suppose client uses principal h...@example.com, 
>  namenode 1 uses server principal hdfs/nn1.example@example.com
>  namenode 2 uses server principal hdfs/nn2.example@example.com
> *After the NameNodes have been running for longer than the Kerberos ticket 
> lifetime*, the client fails with the following error:
> {noformat}
> 18/01/19 11:23:02 WARN security.UserGroupInformation: 
> PriviledgedActionException as:h...@gce.cloudera.com (auth:KERBEROS) 
> cause:org.apache.hadoop.ipc.RemoteException(java.io.IOException): We 
> encountered an error reading 
> https://nn2.example.com:8481/getJournal?jid=ns1=8662=-60%3A353531113%3A0%3Acluster3,
>  
> https://nn1.example.com:8481/getJournal?jid=ns1=8662=-60%3A353531113%3A0%3Acluster3.
>   During automatic edit log failover, we noticed that all of the remaining 
> edit log streams are shorter than the current one!  The best remaining edit 
> log ends at transaction 8683, but we thought we could read up to transaction 
> 8684.  If you continue, metadata will be lost forever!
> at 
> org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:213)
> at 
> org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.readOp(NameNodeRpcServer.java:1701)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getEditsFromTxid(NameNodeRpcServer.java:1763)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getEditsFromTxid(AuthorizationProviderProxyClientProtocol.java:1011)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getEditsFromTxid(ClientNamenodeProtocolServerSideTranslatorPB.java:1490)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2216)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2212)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2210)
> {noformat}
> Typically, if the NameNode has an expired Kerberos ticket, the error handling 
> for the usual edit log tailing would let the NameNode relogin with its own 
> Kerberos principal. 
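
For orientation, the relogin pattern referred to above usually boils down to 
the hedged sketch below; where exactly such a call belongs for the 
inotify/edit-log path is the open question of this issue:
{noformat}
import java.io.IOException;
import org.apache.hadoop.security.UserGroupInformation;

// Before an operation that needs fresh Kerberos credentials, a service
// can re-login from its keytab; checkTGTAndReloginFromKeytab() is a
// no-op when the ticket is still fresh. (Illustrative placement only.)
public class ReloginSketch {
  static void ensureFreshCredentials() throws IOException {
    UserGroupInformation ugi = UserGroupInformation.getLoginUser();
    if (ugi.hasKerberosCredentials()) {
      ugi.checkTGTAndReloginFromKeytab();
    }
  }
}
{noformat}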

[jira] [Commented] (HDFS-13147) Support -c argument for DFS command head and tail

2018-02-14 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363754#comment-16363754
 ] 

genericqa commented on HDFS-13147:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
28s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
 6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 48s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
52s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
16s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 12m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
 8s{color} | {color:green} root: The patch generated 0 new + 175 unchanged - 2 
fixed = 175 total (was 177) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m  0s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
43s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  8m 52s{color} 
| {color:red} hadoop-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}123m 51s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
39s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}222m 43s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.http.TestHttpServerWithSpengo |
|   | hadoop.cli.TestCLI |
|   | hadoop.log.TestLogLevel |
|   | hadoop.security.token.delegation.web.TestWebDelegationToken |
|   | hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-13147 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12910512/HDFS-13147.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux e107ea7dde9d 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build 

[jira] [Updated] (HDFS-13113) Use Log.*(Object, Throwable) overload to log exceptions

2018-02-14 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HDFS-13113:
--
Status: Patch Available  (was: Open)

> Use Log.*(Object, Throwable) overload to log exceptions
> ---
>
> Key: HDFS-13113
> URL: https://issues.apache.org/jira/browse/HDFS-13113
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, namenode, nfs
>Affects Versions: 2.4.0
>Reporter: Steve Loughran
>Assignee: Andras Bokor
>Priority: Major
> Attachments: HADOOP-10571.05.patch, HADOOP-10571.07.patch
>
>
> FYI, in HADOOP-10571, [~boky01] is going to clean up a lot of the log 
> statements, including some in Datanode and elsewhere.
> I'm provisionally +1 on that, but want to run it on the standalone tests 
> (Yetus has already done them), and give the HDFS developers warning of a 
> change which is going to touch their codebase.
> If anyone doesn't want the logging improvements, now is your chance to say so.
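
For anyone skimming, the change is essentially the difference between the two 
calls below; this is a generic commons-logging example with an invented log 
message, not a specific patched line:
{noformat}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// Passing the Throwable as the second argument keeps the stack trace;
// concatenating it into the message string drops the trace.
public class LogOverloadExample {

  private static final Log LOG = LogFactory.getLog(LogOverloadExample.class);

  static void handle(Exception e) {
    LOG.error("Failed to process request: " + e); // stack trace lost
    LOG.error("Failed to process request", e);    // stack trace kept
  }

  public static void main(String[] args) {
    handle(new IllegalStateException("demo"));
  }
}
{noformat}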



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13113) Use Log.*(Object, Throwable) overload to log exceptions

2018-02-14 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HDFS-13113:
--
Attachment: HADOOP-10571.07.patch

> Use Log.*(Object, Throwable) overload to log exceptions
> ---
>
> Key: HDFS-13113
> URL: https://issues.apache.org/jira/browse/HDFS-13113
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, namenode, nfs
>Affects Versions: 2.4.0
>Reporter: Steve Loughran
>Assignee: Andras Bokor
>Priority: Major
> Attachments: HADOOP-10571.05.patch, HADOOP-10571.07.patch
>
>
> FYI, in HADOOP-10571, [~boky01] is going to clean up a lot of the log 
> statements, including some in Datanode and elsewhere.
> I'm provisionally +1 on that, but want to run it on the standalone tests 
> (Yetus has already done them), and give the HDFS developers warning of a 
> change which is going to touch their codebase.
> If anyone doesn't want the logging improvements, now is your chance to say so.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13113) Use Log.*(Object, Throwable) overload to log exceptions

2018-02-14 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HDFS-13113:
--
Status: Open  (was: Patch Available)

> Use Log.*(Object, Throwable) overload to log exceptions
> ---
>
> Key: HDFS-13113
> URL: https://issues.apache.org/jira/browse/HDFS-13113
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, namenode, nfs
>Affects Versions: 2.4.0
>Reporter: Steve Loughran
>Assignee: Andras Bokor
>Priority: Major
> Attachments: HADOOP-10571.05.patch
>
>
> FYI, in HADOOP-10571, [~boky01] is going to clean up a lot of the log 
> statements, including some in Datanode and elsewhere.
> I'm provisionally +1 on that, but want to run it on the standalone tests 
> (Yetus has already done them), and give the HDFS developers warning of a 
> change which is going to touch their codebase.
> If anyone doesn't want the logging improvements, now is your chance to say so.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13110) [SPS]: Reduce the number of APIs in NamenodeProtocol used by external satisfier

2018-02-14 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363735#comment-16363735
 ] 

Rakesh R commented on HDFS-13110:
-

Addressed all of Uma's comments above.
 [~daryn], [~umamaheswararao], [~surendrasingh], I would appreciate your 
reviews, thanks!

> [SPS]: Reduce the number of APIs in NamenodeProtocol used by external 
> satisfier
> ---
>
> Key: HDFS-13110
> URL: https://issues.apache.org/jira/browse/HDFS-13110
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Major
> Attachments: HDFS-13110-HDFS-10285-00.patch, 
> HDFS-13110-HDFS-10285-01.patch, HDFS-13110-HDFS-10285-02.patch, 
> HDFS-13110-HDFS-10285-03.patch, HDFS-13110-HDFS-10285-04.patch
>
>
> This task is to address the following [~daryn]'s comments. Please refer 
> HDFS-10285 to see more detailed discussion.
> *Comment-10)*
> {quote}
> NamenodeProtocolTranslatorPB
> Most of the API changes appear unnecessary.
> IntraSPSNameNodeContext#getFileInfo swallows all IOEs, based on the 
> assumption that any and all IOEs mean FNF, which probably isn't the intention 
> during RPC exceptions.
> {quote}
>  *Comment-13)*
> {quote}
> StoragePolicySatisfier
>  It appears to make back-to-back calls to hasLowRedundancyBlocks and 
> getFileInfo for every file. I haven't fully grokked the code, but if low 
> redundancy is not the common case, then it shouldn't be called unless/until 
> needed. It looks like files that are under-replicated are re-queued again?
> {quote}
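
Comment-10 is essentially about the distinction sketched below; this is a 
generic illustration, not the real IntraSPSNameNodeContext code:
{noformat}
import java.io.FileNotFoundException;
import java.io.IOException;

// Only FileNotFoundException should be read as "the file is gone"; any
// other IOException (e.g. an RPC failure) propagates to the caller
// instead of being mistaken for FNF.
public class GetFileInfoSketch {

  interface Namesystem {
    Object getFileInfo(String path) throws IOException;
  }

  static Object lookup(Namesystem ns, String path) throws IOException {
    try {
      return ns.getFileInfo(path);
    } catch (FileNotFoundException fnfe) {
      // A genuinely missing file maps to null ...
      return null;
    }
    // ... while every other IOException is NOT swallowed here.
  }
}
{noformat}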



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13040) Kerberized inotify client fails despite kinit properly

2018-02-14 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363687#comment-16363687
 ] 

genericqa commented on HDFS-13040:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
34s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
31s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 55s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
31s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
17s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m  
6s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 12m  6s{color} 
| {color:red} root generated 2 new + 1234 unchanged - 0 fixed = 1236 total (was 
1234) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
 1s{color} | {color:green} root: The patch generated 0 new + 150 unchanged - 2 
fixed = 150 total (was 152) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 35s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
1s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m  
4s{color} | {color:green} hadoop-auth in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  8m  0s{color} 
| {color:red} hadoop-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}135m  9s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}234m 38s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.log.TestLogLevel |
|   | hadoop.http.TestHttpServerWithSpengo |
|   | hadoop.security.token.delegation.web.TestWebDelegationToken |
|   | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
|   | hadoop.hdfs.web.TestWebHdfsTimeouts |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure |
|   | hadoop.hdfs.TestDFSInotifyEventInputStreamKerberized |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-13040 |
| JIRA Patch URL | 

[jira] [Commented] (HDFS-13052) WebHDFS: Add support for snapshot diff

2018-02-14 Thread Lokesh Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363648#comment-16363648
 ] 

Lokesh Jain commented on HDFS-13052:


[~xyao] The checkstyle issues appear in NamenodeWebHdfsMethods, and I think 
for readability purposes we should not remove them. The patch follows the 
syntax structure defined in the NamenodeWebHdfsMethods class. I have 
resubmitted the v7 patch to trigger Jenkins.

> WebHDFS: Add support for snapshot diff
> --
>
> Key: HDFS-13052
> URL: https://issues.apache.org/jira/browse/HDFS-13052
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: HDFS-13052.001.patch, HDFS-13052.002.patch, 
> HDFS-13052.003.patch, HDFS-13052.004.patch, HDFS-13052.005.patch, 
> HDFS-13052.006.patch, HDFS-13052.007.patch
>
>
> This Jira aims to implement the snapshot diff operation for the WebHDFS 
> filesystem.
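
For a rough picture, the request would presumably be shaped along these lines; 
the host, port, and path are hypothetical, and the op and parameter names 
follow the patch under review, so they may change before commit:
{noformat}
import java.net.URL;

// Building the WebHDFS snapshot-diff URL by hand, for illustration only.
public class SnapshotDiffUrlSketch {
  public static void main(String[] args) throws Exception {
    URL url = new URL("http://namenode:9870/webhdfs/v1/snapdir"
        + "?op=GETSNAPSHOTDIFF"
        + "&oldsnapshotname=s1&snapshotname=s2");
    System.out.println(url);
  }
}
{noformat}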



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


