[jira] [Updated] (HDFS-13008) Ozone: Add DN container open/close state to container report
[ https://issues.apache.org/jira/browse/HDFS-13008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaoyu Yao updated HDFS-13008:
--
Attachment: HDFS-13008-HDFS-7240.003.patch

> Ozone: Add DN container open/close state to container report
>
> Key: HDFS-13008
> URL: https://issues.apache.org/jira/browse/HDFS-13008
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Affects Versions: HDFS-7240
> Reporter: Xiaoyu Yao
> Assignee: Xiaoyu Yao
> Priority: Major
> Attachments: HDFS-13008-HDFS-7240.001.patch, HDFS-13008-HDFS-7240.002.patch, HDFS-13008-HDFS-7240.003.patch
>
> HDFS-12799 added support to allow SCM to send closeContainerCommand to DNs. This
> ticket is opened to add the DN container close state to the container report so
> that the SCM container state manager can update the state from closing to closed
> once the DN-side container is fully closed.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13008) Ozone: Add DN container open/close state to container report
[ https://issues.apache.org/jira/browse/HDFS-13008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365207#comment-16365207 ]

genericqa commented on HDFS-13008:
--
(x) -1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 33s | Docker mode activated. |
|| Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
|| HDFS-7240 Compile Tests ||
| 0 | mvndep | 0m 17s | Maven dependency ordering for branch |
| +1 | mvninstall | 17m 31s | HDFS-7240 passed |
| +1 | compile | 1m 54s | HDFS-7240 passed |
| +1 | checkstyle | 0m 44s | HDFS-7240 passed |
| +1 | mvnsite | 1m 58s | HDFS-7240 passed |
| +1 | shadedclient | 13m 36s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 3m 53s | HDFS-7240 passed |
| +1 | javadoc | 2m 7s | HDFS-7240 passed |
|| Patch Compile Tests ||
| 0 | mvndep | 0m 9s | Maven dependency ordering for patch |
| -1 | mvninstall | 0m 17s | hadoop-hdfs in the patch failed. |
| -1 | compile | 0m 41s | hadoop-hdfs-project in the patch failed. |
| -1 | cc | 0m 41s | hadoop-hdfs-project in the patch failed. |
| -1 | javac | 0m 41s | hadoop-hdfs-project in the patch failed. |
| -0 | checkstyle | 0m 24s | hadoop-hdfs-project: The patch generated 2 new + 0 unchanged - 1 fixed = 2 total (was 1) |
| -1 | mvnsite | 0m 19s | hadoop-hdfs in the patch failed. |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| -1 | shadedclient | 3m 12s | patch has errors when building and testing our client artifacts. |
| -1 | findbugs | 0m 17s | hadoop-hdfs in the patch failed. |
| -1 | javadoc | 0m 16s | hadoop-hdfs in the patch failed. |
|| Other Tests ||
| +1 | unit | 1m 41s | hadoop-hdfs-client in the patch passed. |
| -1 | unit | 0m 18s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 19s | The patch does not generate ASF License warnings. |
| | | 53m 21s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d11161b |
| JIRA Issue | HDFS-13008 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12910685/HDFS-13008-HDFS-7240.002.patch |
| Optional Tests | asflicense compile cc mvnsite javac unit javadoc mvninstall shadedclient findbugs checkstyle |
| uname | Linux c65b45fd6d40 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh | |
[jira] [Updated] (HDFS-13130) Log object instance get incorrectly in SlowDiskTracker
[ https://issues.apache.org/jira/browse/HDFS-13130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yiqun Lin updated HDFS-13130:
-
Target Version/s: 3.2.0 (was: 3.1.0)
Fix Version/s: (was: 3.1.0) 3.2.0

Reset the fix version. Trunk is currently releasing as 3.2.0.

> Log object instance get incorrectly in SlowDiskTracker
> --
>
> Key: HDFS-13130
> URL: https://issues.apache.org/jira/browse/HDFS-13130
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.0.0
> Reporter: Jianfei Jiang
> Assignee: Jianfei Jiang
> Priority: Minor
> Fix For: 3.2.0
>
> Attachments: HDFS-13130.patch
>
> In class org.apache.hadoop.hdfs.server.blockmanagement.*SlowDiskTracker*, the
> LOG is incorrectly targeted at *SlowPeerTracker*.class:
> {code:java}
> public class SlowDiskTracker {
>   public static final Logger LOG =
>       LoggerFactory.getLogger(SlowPeerTracker.class);
> {code}

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
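The bug above is a copy-paste of a logger declaration: the class literal passed to the factory names a different class. The real code uses SLF4J's LoggerFactory, but the same shape can be illustrated with java.util.logging; the class bodies below are empty stand-ins for illustration, not the Hadoop sources.

```java
import java.util.logging.Logger;

// Hypothetical stand-in for the unrelated class the buggy logger names.
class SlowPeerTracker {}

public class SlowDiskTracker {
    // Buggy shape from the description: the logger is created for
    // SlowPeerTracker, so log lines emitted by SlowDiskTracker are
    // attributed to the wrong class.
    static final Logger BAD_LOG =
        Logger.getLogger(SlowPeerTracker.class.getName());

    // Fixed shape: name the logger after the enclosing class itself.
    static final Logger LOG =
        Logger.getLogger(SlowDiskTracker.class.getName());
}
```

The logger name is the only thing that differs, which is why the bug is invisible at runtime until someone greps logs for the wrong class name.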
[jira] [Comment Edited] (HDFS-13142) Define and Implement a DiffList Interface to store and manage SnapshotDiffs
[ https://issues.apache.org/jira/browse/HDFS-13142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365195#comment-16365195 ]

Tsz Wo Nicholas Sze edited comment on HDFS-13142 at 2/15/18 7:27 AM:
-
There are still 2 checkstyle warnings and many unit test failures. Please take a look, [~shashikant]. Thanks.

was (Author: szetszwo): There are still 2 checkstyle warnings and many unit tests failure. Please take a look, [~shashikant]. Thanks.

> Define and Implement a DiffList Interface to store and manage SnapshotDiffs
> ---
>
> Key: HDFS-13142
> URL: https://issues.apache.org/jira/browse/HDFS-13142
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: snapshots
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Attachments: HDFS-13142.001.patch, HDFS-13142.002.patch
>
> The InodeDiffList class contains a generic List to store snapshotDiffs. The
> generic List interface is bulky; to store and manage snapshotDiffs, we need
> only a few specific methods.
> This Jira proposes to define a new interface, DiffList, which will be used to
> store and manage snapshotDiffs.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
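The idea in the HDFS-13142 description above, replacing a bulky generic List with a narrow purpose-built interface, might look roughly like the following sketch. The method names and the ArrayList-backed implementation are illustrative assumptions, not the contents of the attached patches.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Minimal sketch of a DiffList: only the handful of operations
// snapshot-diff management actually needs, instead of the full
// java.util.List contract. Method names are hypothetical.
interface DiffList<T> extends Iterable<T> {
    boolean addLast(T diff);
    T get(int index);
    T removeFirst();
    int size();
}

// ArrayList-backed implementation for illustration only.
public class ArrayDiffList<T> implements DiffList<T> {
    private final List<T> diffs = new ArrayList<>();

    public boolean addLast(T diff) { return diffs.add(diff); }
    public T get(int index) { return diffs.get(index); }
    public T removeFirst() { return diffs.remove(0); }
    public int size() { return diffs.size(); }
    public Iterator<T> iterator() { return diffs.iterator(); }
}
```

A narrow interface like this also leaves room to swap in a more memory-efficient backing structure later without touching callers, which is the usual payoff of shrinking an API surface.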
[jira] [Commented] (HDFS-13136) Avoid taking FSN lock while doing group member lookup for FSD permission check
[ https://issues.apache.org/jira/browse/HDFS-13136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365193#comment-16365193 ]

Tsz Wo Nicholas Sze commented on HDFS-13136:
-
Thanks for the update! +1 on the 002 patch.

> Avoid taking FSN lock while doing group member lookup for FSD permission check
> --
>
> Key: HDFS-13136
> URL: https://issues.apache.org/jira/browse/HDFS-13136
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Reporter: Xiaoyu Yao
> Assignee: Xiaoyu Yao
> Priority: Major
> Attachments: HDFS-13136.001.patch, HDFS-13136.002.patch
>
> Namenode has an FSN lock and an FSD lock. Most namenode operations need to
> take the FSN lock first and then the FSD lock. The permission check is done via
> FSPermissionChecker at the FSD layer, assuming the FSN lock is already held.
> The FSPermissionChecker constructor invokes callerUgi.getGroups(), which can
> sometimes take seconds. There are external cache schemes such as SSSD and
> internal cache schemes for group lookup. However, the delay can still occur
> during a cache refresh, which causes severe FSN lock contention and an
> unresponsive namenode.
> Checking the current code, we found that getBlockLocations(..) did it right,
> but some methods such as getFileInfo(..) and getContentSummary(..) did it
> wrong. This ticket is opened to ensure the group lookup for the permission
> checker happens outside the FSN lock.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
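The locking pattern the HDFS-13136 description argues for can be sketched in miniature. All names below (LockOrderSketch, lookupGroups, the getFileInfo* methods) are illustrative stand-ins, not the actual FSNamesystem code; the point is only the ordering of the slow lookup relative to the lock.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockOrderSketch {
    private final ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock();

    // Potentially slow: a group lookup may hit SSSD/LDAP on a cache miss.
    List<String> lookupGroups(String user) {
        return Arrays.asList("users", "hadoop");
    }

    // The problematic shape: the slow group lookup runs while the
    // (global) FSN lock is held, stalling every other namenode operation.
    void getFileInfoBuggy(String user) {
        fsnLock.readLock().lock();
        try {
            List<String> groups = lookupGroups(user); // slow call under lock
            // ... permission check using 'groups', then metadata read ...
        } finally {
            fsnLock.readLock().unlock();
        }
    }

    // The fixed shape: build the permission-check inputs (group lookup)
    // first, then take the lock only for the metadata read itself.
    void getFileInfoFixed(String user) {
        List<String> groups = lookupGroups(user);     // outside the lock
        fsnLock.readLock().lock();
        try {
            // ... permission check using 'groups', then metadata read ...
        } finally {
            fsnLock.readLock().unlock();
        }
    }
}
```

The fix does not make the lookup faster; it just stops a slow lookup from serializing all other lock holders behind it.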
[jira] [Commented] (HDFS-8693) refreshNamenodes does not support adding a new standby to a running DN
[ https://issues.apache.org/jira/browse/HDFS-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365188#comment-16365188 ]

Vinayakumar B commented on HDFS-8693:
-
bq. Unlikely to be an issue though unless it's constantly refreshed for new NNs.

Yes, you are right. This might not be an issue for the current use case of refreshNNs. The maximum number of offer threads alive (created via refreshNN()) is the same as the number of NNs a datanode can report to, which is limited.

> refreshNamenodes does not support adding a new standby to a running DN
> --
>
> Key: HDFS-8693
> URL: https://issues.apache.org/jira/browse/HDFS-8693
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, ha
> Affects Versions: 2.6.0
> Reporter: Jian Fang
> Assignee: Ajith S
> Priority: Critical
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4
>
> Attachments: HDFS-8693-03-Addendum-branch-2.patch, HDFS-8693-03-addendum.patch, HDFS-8693.02.patch, HDFS-8693.03.patch, HDFS-8693.1.patch
>
> I tried to run the following command on a Hadoop 2.6.0 cluster with HA
> support:
>
> $ hdfs dfsadmin -refreshNamenodes datanode-host:port
>
> to refresh the name nodes on data nodes after I replaced one name node with a
> new one, so that I don't need to restart the data nodes. However, I got the
> following error:
>
> refreshNamenodes: HA does not currently support adding a new standby to a
> running DN. Please do a rolling restart of DNs to reconfigure the list of NNs.
>
> I checked the 2.6.0 code and the error was thrown by the following code
> snippet, which led me to this JIRA.
> {code:java}
> void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException {
>   Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>   for (BPServiceActor actor : bpServices) {
>     oldAddrs.add(actor.getNNSocketAddress());
>   }
>   Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>   if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>     // Keep things simple for now -- we can implement this at a later date.
>     throw new IOException(
>         "HA does not currently support adding a new standby to a running DN. "
>         + "Please do a rolling restart of DNs to reconfigure the list of NNs.");
>   }
> }
> {code}
> Looks like the refreshNameNodes command is an incomplete feature.
> Unfortunately, a new name node on a replacement instance is critical for
> auto-provisioning a hadoop cluster with HDFS HA support. Without this support,
> the HA feature cannot really be used. I also observed that the new standby
> name node on the replacement instance could get stuck in safe mode because no
> data nodes check in with it. Even with a rolling restart, it may take quite
> some time to restart all data nodes if we have a big cluster, for example with
> 4000 data nodes, let alone that restarting DNs is far too intrusive and not a
> preferable operation in production. It also increases the chance of a double
> failure, because the standby name node is not really ready for a failover if
> the current active name node fails.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
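The guard in the quoted refreshNNList snippet rejects any change in NN membership: a symmetric difference is empty only when the old and new address sets are identical, so even a pure addition of a standby trips the IOException. A stand-alone illustration with plain java.util sets (Guava's Sets.symmetricDifference replaced by a hand-rolled equivalent, so this compiles without Guava):

```java
import java.util.HashSet;
import java.util.Set;

public class SymDiff {
    // Elements present in exactly one of the two sets -- what Guava's
    // Sets.symmetricDifference computes.
    static <T> Set<T> symmetricDifference(Set<T> a, Set<T> b) {
        Set<T> result = new HashSet<>(a);
        for (T item : b) {
            if (!result.remove(item)) { // in b but not in a
                result.add(item);
            }
        }
        return result;
    }

    // Mirrors the refreshNNList guard: any membership change at all
    // (addition, removal, or replacement) is rejected.
    static boolean wouldReject(Set<String> oldAddrs, Set<String> newAddrs) {
        return !symmetricDifference(oldAddrs, newAddrs).isEmpty();
    }
}
```

This makes the reporter's complaint concrete: the check cannot distinguish "a standby was added" from "the set was arbitrarily rewritten", so adding a new standby is refused along with everything else.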
[jira] [Updated] (HDFS-13008) Ozone: Add DN container open/close state to container report
[ https://issues.apache.org/jira/browse/HDFS-13008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaoyu Yao updated HDFS-13008:
--
Attachment: HDFS-13008-HDFS-7240.002.patch

> Ozone: Add DN container open/close state to container report
>
> Key: HDFS-13008
> URL: https://issues.apache.org/jira/browse/HDFS-13008
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Affects Versions: HDFS-7240
> Reporter: Xiaoyu Yao
> Assignee: Xiaoyu Yao
> Priority: Major
> Attachments: HDFS-13008-HDFS-7240.001.patch, HDFS-13008-HDFS-7240.002.patch
>
> HDFS-12799 added support to allow SCM to send closeContainerCommand to DNs. This
> ticket is opened to add the DN container close state to the container report so
> that the SCM container state manager can update the state from closing to closed
> once the DN-side container is fully closed.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9608) Disk IO imbalance in HDFS with heterogeneous storages
[ https://issues.apache.org/jira/browse/HDFS-9608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365171#comment-16365171 ]

Brahma Reddy Battula commented on HDFS-9608:
-
Yes, it's nice to have.

> Disk IO imbalance in HDFS with heterogeneous storages
> -
>
> Key: HDFS-9608
> URL: https://issues.apache.org/jira/browse/HDFS-9608
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.6.0
> Reporter: Wei Zhou
> Assignee: Wei Zhou
> Priority: Major
> Fix For: 2.9.0, 3.0.0-alpha1
>
> Attachments: HDFS-9608.01.patch, HDFS-9608.02.patch, HDFS-9608.03.patch, HDFS-9608.04.patch, HDFS-9608.05.patch, HDFS-9608.06.patch, HDFS-9608.07.patch
>
> Currently RoundRobinVolumeChoosingPolicy uses a shared index to choose volumes
> in HDFS with heterogeneous storages, which breaks the round-robin order for
> certain types of storage.
> Besides, it uses a shared lock for synchronization, which limits the
> concurrency of the volume choosing process. Volume choosing threads operating
> on different storage types should be able to run concurrently.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
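The change the HDFS-9608 description implies, one round-robin cursor per storage type instead of a single shared index and lock, can be sketched as below. The class and method names are illustrative assumptions, not the actual RoundRobinVolumeChoosingPolicy patch.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class PerTypeRoundRobin {
    // One independent cursor per storage type (e.g. "DISK", "SSD"), so
    // choosing an SSD volume never disturbs the DISK rotation, and the
    // two storage types can proceed concurrently without a shared lock.
    private final Map<String, AtomicInteger> cursors = new ConcurrentHashMap<>();

    public String choose(String storageType, List<String> volumes) {
        AtomicInteger cursor =
            cursors.computeIfAbsent(storageType, t -> new AtomicInteger());
        // floorMod keeps the index valid even after the counter wraps.
        int i = Math.floorMod(cursor.getAndIncrement(), volumes.size());
        return volumes.get(i);
    }
}
```

With the original shared index, an SSD pick advances the same counter the DISK path reads, so neither type sees a true round-robin order over its own volumes; per-type cursors restore that property.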
[jira] [Commented] (HDFS-8277) Safemode enter fails when Standby NameNode is down
[ https://issues.apache.org/jira/browse/HDFS-8277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365168#comment-16365168 ]

Surendra Singh Lilhore commented on HDFS-8277:
--
Hi [~arpitagarwal], [~vinayrpet] and [~brahmareddy],
We got the same issue in a production environment. Can we finalize the solution for this Jira?

> Safemode enter fails when Standby NameNode is down
> --
>
> Key: HDFS-8277
> URL: https://issues.apache.org/jira/browse/HDFS-8277
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: ha, namenode
> Affects Versions: 2.6.0
> Environment: HDP 2.2.0
> Reporter: Hari Sekhon
> Assignee: Surendra Singh Lilhore
> Priority: Major
> Attachments: HDFS-8277-safemode-edits.patch, HDFS-8277.patch, HDFS-8277_1.patch, HDFS-8277_2.patch, HDFS-8277_3.patch, HDFS-8277_4.patch
>
> HDFS fails to enter safemode when the Standby NameNode is down (e.g. due to
> AMBARI-10536).
> {code}
> hdfs dfsadmin -safemode enter
> safemode: Call From nn2/x.x.x.x to nn1:8020 failed on connection exception:
> java.net.ConnectException: Connection refused; For more details see:
> http://wiki.apache.org/hadoop/ConnectionRefused
> {code}
> This appears to be a bug in that it is not trying both NameNodes like the
> standard hdfs client code does, and instead stops after getting a connection
> refused from nn1, which is down. I verified that normal hadoop fs writes and
> reads via the cli did work at this time, using nn2. I happened to run this
> command as the hdfs user on nn2, which was the surviving Active NameNode.
> After I re-bootstrapped the Standby NN to fix it, the command worked as
> expected again.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13110) [SPS]: Reduce the number of APIs in NamenodeProtocol used by external satisfier
[ https://issues.apache.org/jira/browse/HDFS-13110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365166#comment-16365166 ]

Rakesh R commented on HDFS-13110:
-
Attached a new patch, where I renamed the following classes, as they now use {{Type Parameters}} to represent the Id and the path string:
- {{FileIdCollector.java => FileCollector.java}}
- {{ExternalSPSFileIDCollector.java => ExternalSPSFilePathCollector.java}}

> [SPS]: Reduce the number of APIs in NamenodeProtocol used by external
> satisfier
> ---
>
> Key: HDFS-13110
> URL: https://issues.apache.org/jira/browse/HDFS-13110
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Rakesh R
> Assignee: Rakesh R
> Priority: Major
> Attachments: HDFS-13110-HDFS-10285-00.patch, HDFS-13110-HDFS-10285-01.patch, HDFS-13110-HDFS-10285-02.patch, HDFS-13110-HDFS-10285-03.patch, HDFS-13110-HDFS-10285-04.patch, HDFS-13110-HDFS-10285-05.patch
>
> This task is to address the following comments from [~daryn]. Please refer to
> HDFS-10285 for the more detailed discussion.
> *Comment-10)*
> {quote}
> NamenodeProtocolTranslatorPB
> Most of the api changes appear unnecessary.
> IntraSPSNameNodeContext#getFileInfo swallows all IOEs, based on assumption
> that any and all IOEs means FNF which probably isn’t the intention during rpc
> exceptions.
> {quote}
> *Comment-13)*
> {quote}
> StoragePolicySatisfier
> It appears to make back-to-back calls to hasLowRedundancyBlocks and
> getFileInfo for every file. Haven’t fully groked the code, but if low
> redundancy is not the common case, then it shouldn’t be called unless/until
> needed. It looks like files that are under replicated are re-queued again?
> {quote}

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13110) [SPS]: Reduce the number of APIs in NamenodeProtocol used by external satisfier
[ https://issues.apache.org/jira/browse/HDFS-13110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-13110: Attachment: HDFS-13110-HDFS-10285-05.patch > [SPS]: Reduce the number of APIs in NamenodeProtocol used by external > satisfier > --- > > Key: HDFS-13110 > URL: https://issues.apache.org/jira/browse/HDFS-13110 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Rakesh R >Assignee: Rakesh R >Priority: Major > Attachments: HDFS-13110-HDFS-10285-00.patch, > HDFS-13110-HDFS-10285-01.patch, HDFS-13110-HDFS-10285-02.patch, > HDFS-13110-HDFS-10285-03.patch, HDFS-13110-HDFS-10285-04.patch, > HDFS-13110-HDFS-10285-05.patch > > > This task is to address the following [~daryn]'s comments. Please refer > HDFS-10285 to see more detailed discussion. > *Comment-10)* > {quote} > NamenodeProtocolTranslatorPB > Most of the api changes appear unnecessary. > IntraSPSNameNodeContext#getFileInfo swallows all IOEs, based on assumption > that any and all IOEs means FNF which probably isn’t the intention during rpc > exceptions. > {quote} > *Comment-13)* > {quote} > StoragePolicySatisfier > It appears to make back-to-back calls to hasLowRedundancyBlocks and > getFileInfo for every file. Haven’t fully groked the code, but if low > redundancy is not the common case, then it shouldn’t be called unless/until > needed. It looks like files that are under replicated are re-queued again? > {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13008) Ozone: Add DN container open/close state to container report
[ https://issues.apache.org/jira/browse/HDFS-13008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365155#comment-16365155 ]

genericqa commented on HDFS-13008:
--
(x) -1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 14s | Docker mode activated. |
|| Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
|| HDFS-7240 Compile Tests ||
| 0 | mvndep | 2m 37s | Maven dependency ordering for branch |
| +1 | mvninstall | 20m 11s | HDFS-7240 passed |
| +1 | compile | 1m 51s | HDFS-7240 passed |
| +1 | checkstyle | 0m 47s | HDFS-7240 passed |
| +1 | mvnsite | 1m 55s | HDFS-7240 passed |
| +1 | shadedclient | 13m 38s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 3m 54s | HDFS-7240 passed |
| +1 | javadoc | 2m 10s | HDFS-7240 passed |
|| Patch Compile Tests ||
| 0 | mvndep | 0m 9s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 56s | the patch passed |
| +1 | compile | 1m 48s | the patch passed |
| +1 | cc | 1m 48s | the patch passed |
| +1 | javac | 1m 48s | the patch passed |
| -0 | checkstyle | 0m 39s | hadoop-hdfs-project: The patch generated 6 new + 1 unchanged - 0 fixed = 7 total (was 1) |
| +1 | mvnsite | 1m 48s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 11m 37s | patch has no errors when building and testing our client artifacts. |
| -1 | findbugs | 2m 15s | hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) |
| +1 | javadoc | 2m 0s | the patch passed |
|| Other Tests ||
| +1 | unit | 1m 44s | hadoop-hdfs-client in the patch passed. |
| -1 | unit | 94m 14s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 27s | The patch does not generate ASF License warnings. |
| | | 166m 42s | |

|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-project/hadoop-hdfs |
| | org.apache.hadoop.ozone.container.common.helpers.ContainerData.getContainerID() is unsynchronized, org.apache.hadoop.ozone.container.common.helpers.ContainerData.setContainerID(Long) is synchronized At ContainerData.java:[line 247] |
| Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
| | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure190 |
| | hadoop.hdfs.web.TestWebHdfsTimeouts |
| | hadoop.ozone.web.client.TestKeysRatis | | |
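The FindBugs hit in the report above is the classic inconsistent-synchronization shape: the setter is synchronized but the getter is not, so a reader thread has no memory barrier and may observe a stale value. The class below is a hypothetical stand-in illustrating the pattern and its fix, not the Ozone ContainerData source.

```java
// Stand-in illustrating the inconsistent-synchronization pattern
// FindBugs flagged; not the actual Ozone ContainerData class.
public class ContainerDataSketch {
    private long containerID;

    // Flagged shape: writer synchronized, reader not. The synchronized
    // write alone does not guarantee other threads see the new value,
    // because the unsynchronized read has no corresponding barrier.
    public synchronized void setContainerID(long id) { containerID = id; }
    public long getContainerIDUnsafe() { return containerID; }

    // Consistent fix: synchronize both accessors (or make the field
    // volatile, if no compound read-modify-write operations are needed).
    public synchronized long getContainerID() { return containerID; }
}
```

Either remedy (pairing the synchronization or using volatile) satisfies FindBugs because both establish a happens-before edge between the write and subsequent reads.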
[jira] [Commented] (HDFS-12051) Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly those denoting file/directory names) to save memory
[ https://issues.apache.org/jira/browse/HDFS-12051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365131#comment-16365131 ]

genericqa commented on HDFS-12051:
--
(x) -1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 23s | Docker mode activated. |
|| Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| trunk Compile Tests ||
| +1 | mvninstall | 15m 31s | trunk passed |
| +1 | compile | 0m 52s | trunk passed |
| +1 | checkstyle | 0m 51s | trunk passed |
| +1 | mvnsite | 0m 58s | trunk passed |
| +1 | shadedclient | 11m 14s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 43s | trunk passed |
| +1 | javadoc | 0m 55s | trunk passed |
|| Patch Compile Tests ||
| +1 | mvninstall | 0m 56s | the patch passed |
| +1 | compile | 0m 48s | the patch passed |
| +1 | javac | 0m 48s | the patch passed |
| -0 | checkstyle | 0m 42s | hadoop-hdfs-project/hadoop-hdfs: The patch generated 4 new + 1234 unchanged - 19 fixed = 1238 total (was 1253) |
| +1 | mvnsite | 0m 51s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 1s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 10m 24s | patch has no errors when building and testing our client artifacts. |
| -1 | findbugs | 1m 52s | hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) |
| +1 | javadoc | 0m 55s | the patch passed |
|| Other Tests ||
| -1 | unit | 135m 9s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 19s | The patch does not generate ASF License warnings. |
| | | 184m 4s | |

|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-project/hadoop-hdfs |
| | Increment of volatile field org.apache.hadoop.hdfs.server.namenode.NameCache.size in org.apache.hadoop.hdfs.server.namenode.NameCache.put(byte[]) At NameCache.java:[line 125] |
| Failed junit tests | hadoop.hdfs.web.TestWebHdfsTimeouts |
| | hadoop.hdfs.server.namenode.TestReencryptionWithKMS |
| | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-12051 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12910663/HDFS-12051.12.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml |
| uname | Linux a62106be8c4b 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build
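The FindBugs warning in the report above flags an increment of a volatile field: volatile guarantees visibility, not atomicity, so `size++` is still a read-modify-write in which two racing threads can read the same old value and lose an increment. A stand-alone sketch of the flagged shape and the usual AtomicInteger fix (the class is a hypothetical illustration, not the NameCache source):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class VolatileIncrement {
    // Visible across threads, but 'size++' compiles to a non-atomic
    // read, add, write sequence: two racing threads can both read the
    // same old value and one increment is silently lost.
    private volatile int size;
    public void putUnsafe() { size++; }
    public int sizeUnsafe() { return size; }

    // Fix: AtomicInteger makes the increment itself atomic while
    // keeping the lock-free read path.
    private final AtomicInteger atomicSize = new AtomicInteger();
    public void put() { atomicSize.incrementAndGet(); }
    public int size() { return atomicSize.get(); }
}
```

For a counter that is only ever incremented and read, AtomicInteger is usually the lightest fix; synchronizing both accessors also works but adds lock traffic.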
[jira] [Commented] (HDFS-12749) DN may not send block report to NN after NN restart
[ https://issues.apache.org/jira/browse/HDFS-12749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365105#comment-16365105 ] He Xiaoqiao commented on HDFS-12749: [~kihwal],[~xkrogen] Thanks for your comments. {quote}the problem is that the NN correctly processed the registration, but the DN timed out before receiving the response. Since from NN point of view the registration was complete, it did not send another DNA_REGISTER command.{quote} Thanks for the additional comments, [~xkrogen]. Since the NN considers the DN to have registered successfully while the DN timed out before receiving the response, the DN will not get a DNA_REGISTER command from the NN at the next heartbeat interval. Ideally, the DN should catch the exception in {{BPServiceActor#register}} and retry until registration succeeds, just as it currently does for {{SocketTimeoutException}} and {{EOFException}}. One question is why the underlying RPC throws a plain {{IOException}} rather than a {{SocketTimeoutException}}. After tracing the call stack, I find that {{NetUtils#wrapException}}, invoked by {{Client#call}} at line 1474, may be the main reason:
{code:java}
if (call.error != null) {
  if (call.error instanceof RemoteException) {
    call.error.fillInStackTrace();
    throw call.error;
  } else { // local exception
    InetSocketAddress address = connection.getRemoteAddress();
    throw NetUtils.wrapException(address.getHostName(),
        address.getPort(),
        NetUtils.getHostname(),
        0,
        call.error);
  }
}
{code}
However, I am still unclear why {{call.error}} being set to an {{IOException}} and rethrown up the stack leads to {{BPServiceActor#register}} not catching it. After reviewing HDFS-8995, I do not think it fixes this issue. In any case, I agree with fixing it by catching {{IOException}} in {{BPServiceActor#register}} and retrying until registration succeeds. [~kihwal],[~xkrogen] do you mind having a look? 
> DN may not send block report to NN after NN restart > --- > > Key: HDFS-12749 > URL: https://issues.apache.org/jira/browse/HDFS-12749 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: TanYuxin >Priority: Major > Attachments: HDFS-12749.001.patch > > > Our cluster now has thousands of DNs and millions of files and blocks. When the NN > restarts, its load is very high. > After the NN restarts, the DN will call the BPServiceActor#reRegister method to register. > But the register RPC will get an IOException since the NN is busy dealing with Block > Reports. The exception is caught at BPServiceActor#processCommand. > The caught IOException is: > {code:java} > WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Error processing > datanode Command > java.io.IOException: Failed on local exception: java.io.IOException: > java.net.SocketTimeoutException: 6 millis timeout while waiting for > channel to be ready for read. ch : java.nio.channels.SocketChannel[connected > local=/DataNode_IP:Port remote=NameNode_Host/IP:Port]; Host Details : local > host is: "DataNode_Host/Datanode_IP"; destination host is: > "NameNode_Host":Port; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773) > at org.apache.hadoop.ipc.Client.call(Client.java:1474) > at org.apache.hadoop.ipc.Client.call(Client.java:1407) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) > at com.sun.proxy.$Proxy13.registerDatanode(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:126) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:793) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.reRegister(BPServiceActor.java:926) > at > org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:604) > at > 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:898) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:711) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:864) > at java.lang.Thread.run(Thread.java:745) > {code} > The uncaught IOException breaks BPServiceActor#register, and the Block > Report cannot be sent immediately. > {code} > /** >* Register one bp with the corresponding NameNode >* >* The bpDatanode needs to register with the namenode on startup in order >* 1) to report which storage it is serving now and >* 2) to receive a registrationID >* >* issued by the namenode to recognize registered datanodes. >* >* @param nsInfo current NamespaceInfo >*
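The fix agreed on in the thread above, catching {{IOException}} in {{BPServiceActor#register}} and retrying until registration succeeds, can be sketched roughly as follows. This is a hypothetical illustration only; the {{Namenode}} interface and {{registerWithRetry}} method here are made up for the example and are not the actual Hadoop code.

```java
import java.io.IOException;

// Sketch: retry registration on a plain IOException, the same way
// BPServiceActor#register already retries on SocketTimeoutException
// and EOFException. All names below are illustrative.
public class RegisterRetry {

    /** Stand-in for the registration RPC to the NameNode. */
    public interface Namenode {
        void registerDatanode() throws IOException;
    }

    /**
     * Retry the registration RPC until it succeeds or maxAttempts is reached.
     * Returns the number of attempts it took.
     */
    public static int registerWithRetry(Namenode nn, int maxAttempts)
            throws IOException {
        for (int attempt = 1; ; attempt++) {
            try {
                nn.registerDatanode();
                return attempt;            // registered successfully
            } catch (IOException e) {      // also covers SocketTimeoutException/EOFException
                if (attempt >= maxAttempts) {
                    throw e;               // give up, surface the last error
                }
                try {
                    Thread.sleep(10L);     // brief back-off before re-registering
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IOException("interrupted while retrying", ie);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        final int[] calls = {0};
        // Simulated NameNode: fails twice with a timeout-like IOException,
        // then succeeds, mimicking a busy NN right after restart.
        Namenode flaky = () -> {
            if (++calls[0] < 3) {
                throw new IOException("simulated registration timeout");
            }
        };
        System.out.println("registered after "
            + registerWithRetry(flaky, 5) + " attempts");
    }
}
```

The point of the sketch is that a bounded (or unbounded, as in the DN's actual loop) retry absorbs the transient timeout, so the block report can still be sent once the NN catches up.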
[jira] [Commented] (HDFS-12452) TestDataNodeVolumeFailureReporting fails in trunk Jenkins runs
[ https://issues.apache.org/jira/browse/HDFS-12452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365087#comment-16365087 ] Wangda Tan commented on HDFS-12452: --- [~xyao], if this patch can be done by this week, please commit to branch-3.1 as well. Thanks. > TestDataNodeVolumeFailureReporting fails in trunk Jenkins runs > -- > > Key: HDFS-12452 > URL: https://issues.apache.org/jira/browse/HDFS-12452 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Arpit Agarwal >Assignee: Xiaoyu Yao >Priority: Critical > Labels: flaky-test > Attachments: HDFS-12452.001.patch, HDFS-12452.002.patch > > > TestDataNodeVolumeFailureReporting#testSuccessiveVolumeFailures fails > frequently in Jenkins runs but it passes locally on my dev machine. > e.g. > https://builds.apache.org/job/PreCommit-HDFS-Build/21134/testReport/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailureReporting/testSuccessiveVolumeFailures/ > {code} > Error Message > test timed out after 12 milliseconds > Stacktrace > java.lang.Exception: test timed out after 12 milliseconds > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:761) > at > org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting.testSuccessiveVolumeFailures(TestDataNodeVolumeFailureReporting.java:189) > {code}
[jira] [Updated] (HDFS-13008) Ozone: Add DN container open/close state to container report
[ https://issues.apache.org/jira/browse/HDFS-13008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-13008: -- Attachment: HDFS-13008-HDFS-7240.001.patch
[jira] [Updated] (HDFS-13008) Ozone: Add DN container open/close state to container report
[ https://issues.apache.org/jira/browse/HDFS-13008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-13008: -- Attachment: (was: HDFS-13008.001.patch)
[jira] [Commented] (HDFS-13008) Ozone: Add DN container open/close state to container report
[ https://issues.apache.org/jira/browse/HDFS-13008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365050#comment-16365050 ] genericqa commented on HDFS-13008: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s{color} | {color:red} HDFS-13008 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-13008 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12910670/HDFS-13008.001.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/23075/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated.
[jira] [Updated] (HDFS-13008) Ozone: Add DN container open/close state to container report
[ https://issues.apache.org/jira/browse/HDFS-13008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-13008: -- Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-13008) Ozone: Add DN container open/close state to container report
[ https://issues.apache.org/jira/browse/HDFS-13008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-13008: -- Attachment: HDFS-13008.001.patch
[jira] [Commented] (HDFS-13081) Datanode#checkSecureConfig should check HTTPS and SASL encryption
[ https://issues.apache.org/jira/browse/HDFS-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365041#comment-16365041 ] genericqa commented on HDFS-13081: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 22s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 33s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 6s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 11s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 19s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 12m 1s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 2m 12s{color} | {color:orange} root: The patch generated 5 new + 163 unchanged - 2 fixed = 168 total (was 165) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 14s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 28s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 57s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 21s{color} | {color:green} hadoop-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 92m 16s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 44s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}196m 22s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.blockmanagement.TestBlockStatsMXBean | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | HDFS-13081 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12910640/HDFS-13081.001.patch | | Optional Tests | asflicense mvnsite compile javac javadoc mvninstall unit shadedclient findbugs checkstyle | | uname | Linux e725a7925929 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 8f66aff | | maven | version: Apache
[jira] [Commented] (HDFS-12051) Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly those denoting file/directory names) to save memory
[ https://issues.apache.org/jira/browse/HDFS-12051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365020#comment-16365020 ] Misha Dmitriev commented on HDFS-12051: --- [~atm] I've just submitted a patch where I've addressed your comments. I've added functionality to completely disable NameCache by specifying DFS_NAMENODE_NAME_CACHE_SIZE_RATIO_KEY = 0.0. I've added a test for this, plus a stress test where the cache is exercised by multiple threads and the number of unique names exceeds the cache's capacity (this may happen in production). As we discussed, so far I cannot find a good way to pass a "non-singleton" NameCache instance around to all the code that needs it. On the other hand, I explained that I don't see problems if this singleton is used by multiple NameNode instances running within the same JVM. > Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly > those denoting file/directory names) to save memory > - > > Key: HDFS-12051 > URL: https://issues.apache.org/jira/browse/HDFS-12051 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Misha Dmitriev >Assignee: Misha Dmitriev >Priority: Major > Attachments: HDFS-12051-NameCache-Rewrite.pdf, HDFS-12051.01.patch, > HDFS-12051.02.patch, HDFS-12051.03.patch, HDFS-12051.04.patch, > HDFS-12051.05.patch, HDFS-12051.06.patch, HDFS-12051.07.patch, > HDFS-12051.08.patch, HDFS-12051.09.patch, HDFS-12051.10.patch, > HDFS-12051.11.patch, HDFS-12051.12.patch > > > When a snapshot diff operation is performed in a NameNode that manages several > million HDFS files/directories, the NN needs a lot of memory. Analyzing one heap > dump with jxray (www.jxray.com), we observed that duplicate byte[] arrays > result in 6.5% memory overhead, and most of these arrays are referenced by > {{org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy.name}} > and {{org.apache.hadoop.hdfs.server.namenode.INodeFile.name}}: > {code:java} > 19. 
DUPLICATE PRIMITIVE ARRAYS > Types of duplicate objects: > Ovhd Num objs Num unique objs Class name > 3,220,272K (6.5%) 104749528 25760871 byte[] > > 1,841,485K (3.7%), 53194037 dup arrays (13158094 unique) > 3510556 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 2228255 > of byte[8](48, 48, 48, 48, 48, 48, 95, 48), 357439 of byte[17](112, 97, 114, > 116, 45, 109, 45, 48, 48, 48, ...), 237395 of byte[8](48, 48, 48, 48, 48, 49, > 95, 48), 227853 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), > 179193 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 169487 > of byte[8](48, 48, 48, 48, 48, 50, 95, 48), 145055 of byte[17](112, 97, 114, > 116, 45, 109, 45, 48, 48, 48, ...), 128134 of byte[8](48, 48, 48, 48, 48, 51, > 95, 48), 108265 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...) > ... and 45902395 more arrays, of which 13158084 are unique > <-- > org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy.name > <-- org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiff.snapshotINode > <-- {j.u.ArrayList} <-- > org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiffList.diffs <-- > org.apache.hadoop.hdfs.server.namenode.snapshot.FileWithSnapshotFeature.diffs > <-- org.apache.hadoop.hdfs.server.namenode.INode$Feature[] <-- > org.apache.hadoop.hdfs.server.namenode.INodeFile.features <-- > org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.bc <-- ... (1 > elements) ... 
<-- > org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap$1.entries <-- > org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.blocks <-- > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.blocksMap <-- > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.this$0 > <-- j.l.Thread[] <-- j.l.ThreadGroup.threads <-- j.l.Thread.group <-- Java > Static: org.apache.hadoop.fs.FileSystem$Statistics.STATS_DATA_CLEANER > 409,830K (0.8%), 13482787 dup arrays (13260241 unique) > 430 of byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 353 of > byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 352 of > byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 350 of > byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 342 of > byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 341 of > byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 341 of > byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 340 of > byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 337 of > byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 334 of > byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...) > ... and
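The interning idea behind the NameCache rewrite discussed above can be illustrated with a minimal sketch: equal byte[] name arrays collapse to one canonical copy, so the duplicates become garbage-collectable. This is a sketch of the general technique only, not the implementation in the HDFS-12051 patch (which, per the comments, is also capacity-limited via DFS_NAMENODE_NAME_CACHE_SIZE_RATIO_KEY); the class and method names here are invented for the example.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Minimal byte[] interner: maps content to one canonical array.
// Not the actual NameCache; an illustration of the interning technique.
public class ByteArrayInterner {
    // ByteBuffer.wrap(...) keys compare and hash by content, so two distinct
    // arrays with equal bytes map to the same entry. (The wrapped array must
    // not be mutated afterwards, which holds for immutable inode names.)
    private final Map<ByteBuffer, byte[]> cache = new HashMap<>();

    /** Return the canonical array with the same content as {@code name}. */
    public synchronized byte[] intern(byte[] name) {
        byte[] canonical = cache.get(ByteBuffer.wrap(name));
        if (canonical == null) {
            cache.put(ByteBuffer.wrap(name), name); // first sighting becomes canonical
            canonical = name;
        }
        return canonical;
    }

    public static void main(String[] args) {
        ByteArrayInterner interner = new ByteArrayInterner();
        byte[] a = "part-m-000".getBytes();
        byte[] b = "part-m-000".getBytes(); // distinct array, identical content
        // Both calls yield the same instance; only one copy stays reachable.
        System.out.println(interner.intern(a) == interner.intern(b)); // prints true
    }
}
```

With millions of inode names like the jxray dump shows (many copies of the same "part-m-..." byte sequences), routing every name through such an intern call is what removes the multi-gigabyte duplicate-array overhead reported above.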
[jira] [Updated] (HDFS-12051) Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly those denoting file/directory names) to save memory
[ https://issues.apache.org/jira/browse/HDFS-12051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Misha Dmitriev updated HDFS-12051: -- Release Note: Addressed the @atm's comments Status: Patch Available (was: In Progress)
[jira] [Updated] (HDFS-12051) Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly those denoting file/directory names) to save memory
[ https://issues.apache.org/jira/browse/HDFS-12051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Misha Dmitriev updated HDFS-12051: -- Attachment: HDFS-12051.12.patch
[jira] [Updated] (HDFS-12051) Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly those denoting file/directory names) to save memory
[ https://issues.apache.org/jira/browse/HDFS-12051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Misha Dmitriev updated HDFS-12051: -- Status: In Progress (was: Patch Available) > Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly > those denoting file/directory names) to save memory > - > > Key: HDFS-12051 > URL: https://issues.apache.org/jira/browse/HDFS-12051 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Misha Dmitriev >Assignee: Misha Dmitriev >Priority: Major > Attachments: HDFS-12051-NameCache-Rewrite.pdf, HDFS-12051.01.patch, > HDFS-12051.02.patch, HDFS-12051.03.patch, HDFS-12051.04.patch, > HDFS-12051.05.patch, HDFS-12051.06.patch, HDFS-12051.07.patch, > HDFS-12051.08.patch, HDFS-12051.09.patch, HDFS-12051.10.patch, > HDFS-12051.11.patch, HDFS-12051.12.patch > > > When snapshot diff operation is performed in a NameNode that manages several > million HDFS files/directories, NN needs a lot of memory. Analyzing one heap > dump with jxray (www.jxray.com), we observed that duplicate byte[] arrays > result in 6.5% memory overhead, and most of these arrays are referenced by > {{org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy.name}} > and {{org.apache.hadoop.hdfs.server.namenode.INodeFile.name}}: > {code:java} > 19. 
DUPLICATE PRIMITIVE ARRAYS > Types of duplicate objects: > Ovhd Num objs Num unique objs Class name > 3,220,272K (6.5%) 104749528 25760871 byte[] > > 1,841,485K (3.7%), 53194037 dup arrays (13158094 unique) > 3510556 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 2228255 > of byte[8](48, 48, 48, 48, 48, 48, 95, 48), 357439 of byte[17](112, 97, 114, > 116, 45, 109, 45, 48, 48, 48, ...), 237395 of byte[8](48, 48, 48, 48, 48, 49, > 95, 48), 227853 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), > 179193 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 169487 > of byte[8](48, 48, 48, 48, 48, 50, 95, 48), 145055 of byte[17](112, 97, 114, > 116, 45, 109, 45, 48, 48, 48, ...), 128134 of byte[8](48, 48, 48, 48, 48, 51, > 95, 48), 108265 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...) > ... and 45902395 more arrays, of which 13158084 are unique > <-- > org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy.name > <-- org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiff.snapshotINode > <-- {j.u.ArrayList} <-- > org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiffList.diffs <-- > org.apache.hadoop.hdfs.server.namenode.snapshot.FileWithSnapshotFeature.diffs > <-- org.apache.hadoop.hdfs.server.namenode.INode$Feature[] <-- > org.apache.hadoop.hdfs.server.namenode.INodeFile.features <-- > org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.bc <-- ... (1 > elements) ... 
<-- > org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap$1.entries <-- > org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.blocks <-- > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.blocksMap <-- > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.this$0 > <-- j.l.Thread[] <-- j.l.ThreadGroup.threads <-- j.l.Thread.group <-- Java > Static: org.apache.hadoop.fs.FileSystem$Statistics.STATS_DATA_CLEANER > 409,830K (0.8%), 13482787 dup arrays (13260241 unique) > 430 of byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 353 of > byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 352 of > byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 350 of > byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 342 of > byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 341 of > byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 341 of > byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 340 of > byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 337 of > byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...), 334 of > byte[32](116, 97, 115, 107, 95, 49, 52, 57, 55, 48, ...) > ... and 13479257 more arrays, of which 13260231 are unique > <-- org.apache.hadoop.hdfs.server.namenode.INodeFile.name <-- > org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.bc <-- > org.apache.hadoop.util.LightWeightGSet$LinkedElement[] <-- > org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap$1.entries <-- > org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.blocks <-- > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.blocksMap <-- > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.this$0 > <-- j.l.Thread[] <-- > org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap$1.entries <--
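The interning approach HDFS-12051 proposes — canonicalizing duplicate byte[] name arrays so equal contents share one instance — can be sketched with a small stdlib-only class. This is a hypothetical illustration, not the actual NameCache patch: the class and method names are invented, and it assumes interned arrays are never mutated afterwards.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Minimal sketch of byte[] interning (hypothetical, not the HDFS-12051
 * patch itself). ByteBuffer.wrap supplies content-based equals/hashCode,
 * so arrays with identical contents map to one canonical instance.
 * Assumes arrays are treated as immutable once interned.
 */
final class ByteArrayInterner {
  private final ConcurrentHashMap<ByteBuffer, byte[]> cache =
      new ConcurrentHashMap<>();

  /** Returns the canonical array equal in content to {@code name}. */
  byte[] intern(byte[] name) {
    // First writer wins; later equal-content arrays get the stored instance.
    byte[] prev = cache.putIfAbsent(ByteBuffer.wrap(name), name);
    return prev != null ? prev : name;
  }

  int uniqueCount() {
    return cache.size();
  }
}
```

With the jxray numbers above (104,749,528 byte[] instances, only 25,760,871 unique), every duplicate collapsed this way frees an array object and its contents, which is roughly where the reported 6.5% overhead lives.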
[jira] [Commented] (HDFS-13134) Ozone: Format open containers on datanode restart
[ https://issues.apache.org/jira/browse/HDFS-13134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365007#comment-16365007 ] Anu Engineer commented on HDFS-13134: - [~ljain] meanwhile you might want to look at the checkstyle, findbugs and unit test failures. Thanks > Ozone: Format open containers on datanode restart > - > > Key: HDFS-13134 > URL: https://issues.apache.org/jira/browse/HDFS-13134 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Attachments: HDFS-13134-HDFS-7240.001.patch > > > Once a datanode is restarted its open containers should be formatted. Only > the open containers whose pipeline has a replication factor of three will > need to be formatted. The format command is sent by SCM to the datanode after > the corresponding containers have been successfully replicated. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-13134) Ozone: Format open containers on datanode restart
[ https://issues.apache.org/jira/browse/HDFS-13134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364777#comment-16364777 ] Anu Engineer edited comment on HDFS-13134 at 2/15/18 12:34 AM: --- [~ljain] The change looks good, I am going to take a week to think through this and review with the rest of the team before I commit this. The reason is that this command provides the ability to destroy data (and rightfully so). [~msingh], [~nandakumar131], [~elek] and [~xyao] Please take a look at the patch when you have a chance. I want to commit this only after I get perspectives from others too. Thanks for your time and consideration. was (Author: anu): [~ljain] The change looks good, I am going to take a week to think through this and review with the rest of the team before I commit this. The reason is that command provides the ability to destroy data ( and rightfully so) [~msingh], [~nandakumar131],[~elek] and [~xyao] Please take a look at the patch when you have a chance. I want to commit this only I get perspectives from others too. Thanks for your time and consideration. > Ozone: Format open containers on datanode restart > - > > Key: HDFS-13134 > URL: https://issues.apache.org/jira/browse/HDFS-13134 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Attachments: HDFS-13134-HDFS-7240.001.patch > > > Once a datanode is restarted its open containers should be formatted. Only > the open containers whose pipeline has a replication factor of three will > need to be formatted. The format command is sent by SCM to the datanode after > the corresponding containers have been successfully replicated. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13134) Ozone: Format open containers on datanode restart
[ https://issues.apache.org/jira/browse/HDFS-13134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365005#comment-16365005 ] genericqa commented on HDFS-13134: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 36s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | || || || || {color:brown} HDFS-7240 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 41s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 4s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 40s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 13s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 43s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 4s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s{color} | {color:green} HDFS-7240 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 34s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 10 new + 2 unchanged - 0 fixed = 12 total (was 2) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 49s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 28s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}147m 8s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 27s{color} | {color:red} The patch generated 1 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}203m 32s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs-project/hadoop-hdfs | | | Possible null pointer dereference in org.apache.hadoop.ozone.container.common.impl.ContainerManagerImpl.init(Configuration, List, DatanodeID) due to return value of called method Dereferenced at ContainerManagerImpl.java:org.apache.hadoop.ozone.container.common.impl.ContainerManagerImpl.init(Configuration, List, DatanodeID) due to return value of called method Dereferenced at ContainerManagerImpl.java:[line 183] | | Failed junit tests | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA | | | hadoop.ozone.container.common.TestEndPoint | | | hadoop.hdfs.web.TestWebHdfsTimeouts | | | hadoop.ozone.web.client.TestKeys | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure | | | hadoop.cblock.TestBufferManager | | | hadoop.ozone.web.client.TestKeysRatis | | | hadoop.cblock.TestCBlockReadWrite | | | hadoop.ozone.container.common.TestDatanodeStateMachine | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce
[jira] [Commented] (HDFS-11699) Ozone:SCM: Add support for close containers in SCM
[ https://issues.apache.org/jira/browse/HDFS-11699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364943#comment-16364943 ] genericqa commented on HDFS-11699: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | || || || || {color:brown} HDFS-7240 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 38s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 22s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 44s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 24s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 46s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 47s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 19s{color} | {color:green} HDFS-7240 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 12s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 9s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 94m 15s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}160m 48s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.datanode.TestDirectoryScanner | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d11161b | | JIRA Issue | HDFS-11699 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12910624/HDFS-11699-HDFS-7240.002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux c3e0c85336a6 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | HDFS-7240 / f3d07ef | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/23071/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/23071/testReport/ | | Max. process+thread count | 3982 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/23071/console | | Powered by | Apache Yetus
[jira] [Commented] (HDFS-12452) TestDataNodeVolumeFailureReporting fails in trunk Jenkins runs
[ https://issues.apache.org/jira/browse/HDFS-12452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364922#comment-16364922 ] genericqa commented on HDFS-12452: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 39s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 10s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 6s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 35s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 18 unchanged - 0 fixed = 19 total (was 18) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 41s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}142m 17s{color} | {color:red} hadoop-hdfs in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}201m 36s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.TestReadStripedFileWithMissingBlocks | | | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | HDFS-12452 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12910620/HDFS-12452.002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux eeb750a6171a 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 1f20f43 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/23070/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/23070/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/23070/testReport/ | | Max.
[jira] [Commented] (HDFS-13149) Ozone: Rename Corona to Freon
[ https://issues.apache.org/jira/browse/HDFS-13149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364911#comment-16364911 ] Anu Engineer commented on HDFS-13149: - [~msingh], [~elek] This patch may break internal stress and perf runs of ozone. Hence flagging it for your consideration. Please review when you get a chance. [~nandakumar131] Please take a look at this patch when you get a chance. > Ozone: Rename Corona to Freon > - > > Key: HDFS-13149 > URL: https://issues.apache.org/jira/browse/HDFS-13149 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Anu Engineer >Assignee: Anu Engineer >Priority: Trivial > Attachments: HDFS-13149-HDFS-7240.001.patch > > > While reviewing Ozone, [~jghoman] and, in a comment in HDFS-12992, > [~chris.douglas] > both pointed out that Corona is a name used by a YARN project from Facebook. > This Jira proposes to rename Corona (a chemical process that produces Ozone) > to Freon (CFCs), something that stresses Ozone. Thanks to [~arpitagarwal] for > coming up with both names. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13149) Ozone: Rename Corona to Freon
[ https://issues.apache.org/jira/browse/HDFS-13149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anu Engineer updated HDFS-13149: Attachment: HDFS-13149-HDFS-7240.001.patch > Ozone: Rename Corona to Freon > - > > Key: HDFS-13149 > URL: https://issues.apache.org/jira/browse/HDFS-13149 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Anu Engineer >Assignee: Anu Engineer >Priority: Trivial > Attachments: HDFS-13149-HDFS-7240.001.patch > > > While reviewing Ozone, [~jghoman] and, in a comment in HDFS-12992, > [~chris.douglas] > both pointed out that Corona is a name used by a YARN project from Facebook. > This Jira proposes to rename Corona (a chemical process that produces Ozone) > to Freon (CFCs), something that stresses Ozone. Thanks to [~arpitagarwal] for > coming up with both names. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13040) Kerberized inotify client fails despite kinit properly
[ https://issues.apache.org/jira/browse/HDFS-13040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364893#comment-16364893 ] Daryn Sharp commented on HDFS-13040: So many points to cover. You're on the right path. *SPNEGO* "JDK performed authentication on our behalf". The JDK will always transparently do spnego by internally handling the 401 and reissuing the request with a TGS. The client never sees a 401. It may see 403 if authentication failed. If a client does see 401, spnego wasn't possible because of no tgt, no tgs available, etc. The KerberosAuthenticator is going to try spnego again, and virtually guaranteed to fail for the same reason. Pretty much what you are seeing. *Sample Code: TransactionReader* I cringe when I see "secure" code like this because it makes people think security is too hard. All of the ugi conf/relogin/doas is unnecessary. Seriously, just rip it out. Pretend like you are writing what you think is insecure code. Enable security in core-site, kinit before you run it... That's it. Or if you prefer, you can leave in just this one line: UserGroupInformation.loginUserFromKeytab(princ, keytab); *Issues* {quote}At NameNode startup, the NameNode acquires a tgt, and it is saved in its ticket cache {quote} I suspected this might be the case. Please don't start your NN from a ticket cache, let alone share that ticket cache with user tools. Set the confs for keytab and principal. Forget all the expired ticket cache stuff. It's a distraction and self-inflicted pain. Let's remove some confusion and describe your NN as running as "hdfs/nn1@REALM", and you are using "superuser@REALM" to run inotify. I know you are using hdfs but it'll be clear soon. As a client, you have credentials for "superuser@REALM". After authentication to the NN's RPC server it creates a "superuser" ugi, but it has no credentials, just an identity. The inotify method calls an edit log method that makes an external rest call which needs credentials. 
But "superuser" has no creds, boom, big ugly gssapi stack trace. With the proposed patch, an explicit doAs as the login user flips you back to "hdfs/nn1@REALM" which does have credentials. It's what you want, but not correctly implemented. The root issue is that a client must have an immutable identity determined when created. The URLLog ctor should save off the current user and use that for the doAs. Since the login user creates the edit log, it means any use of it, regardless of whether it comes from rpc, will retain that identity. > Kerberized inotify client fails despite kinit properly > -- > > Key: HDFS-13040 > URL: https://issues.apache.org/jira/browse/HDFS-13040 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.0 > Environment: Kerberized, HA cluster, iNotify client, CDH5.10.2 >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Major > Attachments: HDFS-13040.001.patch, HDFS-13040.02.patch, > HDFS-13040.03.patch, HDFS-13040.half.test.patch, > TestDFSInotifyEventInputStreamKerberized.java, TransactionReader.java > > > This issue is similar to HDFS-10799. > HDFS-10799 turned out to be a client side issue where the client is responsible > for renewing the kerberos ticket actively. > However we found in a slightly different setup that even if the client has valid Kerberos > credentials, inotify still fails. 
> Suppose client uses principal h...@example.com, > namenode 1 uses server principal hdfs/nn1.example@example.com > namenode 2 uses server principal hdfs/nn2.example@example.com > *After Namenodes starts for longer than kerberos ticket lifetime*, the client > fails with the following error: > {noformat} > 18/01/19 11:23:02 WARN security.UserGroupInformation: > PriviledgedActionException as:h...@gce.cloudera.com (auth:KERBEROS) > cause:org.apache.hadoop.ipc.RemoteException(java.io.IOException): We > encountered an error reading > https://nn2.example.com:8481/getJournal?jid=ns1=8662=-60%3A353531113%3A0%3Acluster3, > > https://nn1.example.com:8481/getJournal?jid=ns1=8662=-60%3A353531113%3A0%3Acluster3. > During automatic edit log failover, we noticed that all of the remaining > edit log streams are shorter than the current one! The best remaining edit > log ends at transaction 8683, but we thought we could read up to transaction > 8684. If you continue, metadata will be lost forever! > at > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:213) > at > org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.readOp(NameNodeRpcServer.java:1701) > at
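The fix Daryn describes — capture the current user in the URLLog constructor and doAs that saved identity on every later fetch — can be illustrated with a stdlib-only stand-in. Everything here is hypothetical: the class name is invented, a ThreadLocal plays the role of UserGroupInformation.getCurrentUser(), and doAsCreator stands in for ugi.doAs.

```java
import java.util.function.Supplier;

/** Stand-in for the proposed URLLog fix: the client's identity is
 *  immutable, fixed at construction, and restored for every use. */
final class EditLogFetcher {
  /** Stand-in for UserGroupInformation.getCurrentUser(). */
  static final ThreadLocal<String> CURRENT_USER =
      ThreadLocal.withInitial(() -> "hdfs/nn1@REALM"); // NN login user

  private final String creator; // saved off in the ctor, never changes

  EditLogFetcher() {
    this.creator = CURRENT_USER.get();
  }

  /** Stand-in for creator.doAs(action): runs as the saved identity even
   *  when invoked from an RPC handler whose current user is "superuser". */
  <T> T doAsCreator(Supplier<T> action) {
    String caller = CURRENT_USER.get();
    CURRENT_USER.set(creator);
    try {
      return action.get();
    } finally {
      CURRENT_USER.set(caller); // restore the RPC caller's identity
    }
  }
}
```

Constructed by the login user, the fetcher keeps credentialed identity for its REST calls; an inotify RPC arriving as "superuser" no longer reaches the JournalNode with a credential-less ugi.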
[jira] [Updated] (HDFS-13149) Ozone: Rename Corona to Freon
[ https://issues.apache.org/jira/browse/HDFS-13149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anu Engineer updated HDFS-13149: Description: While reviewing Ozone, [~jghoman] and, in a comment in HDFS-12992, [~chris.douglas] both pointed out that Corona is a name used by a YARN project from Facebook. This Jira proposes to rename Corona (a chemical process that produces Ozone) to Freon (CFCs), something that stresses Ozone. Thanks to [~arpitagarwal] for coming up with both names. was: While reviewing Ozone [~jghoman] and in the a comment in HDFS-12992 [~chris.douglas] both pointed out the Corona is a name used by a YARN project from Facebook. This Jira proposes to rename Corona(a chemical process that produces Ozone) to Freon (CFCs) something that stresses Ozone. Thank to [~arpitagarwal] for coming up with both names. > Ozone: Rename Corona to Freon > - > > Key: HDFS-13149 > URL: https://issues.apache.org/jira/browse/HDFS-13149 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Reporter: Anu Engineer >Assignee: Anu Engineer >Priority: Trivial > > While reviewing Ozone, [~jghoman] and, in a comment in HDFS-12992, > [~chris.douglas] > both pointed out that Corona is a name used by a YARN project from Facebook. > This Jira proposes to rename Corona (a chemical process that produces Ozone) > to Freon (CFCs), something that stresses Ozone. Thanks to [~arpitagarwal] for > coming up with both names. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-13149) Ozone: Rename Corona to Freon
Anu Engineer created HDFS-13149: --- Summary: Ozone: Rename Corona to Freon Key: HDFS-13149 URL: https://issues.apache.org/jira/browse/HDFS-13149 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Reporter: Anu Engineer Assignee: Anu Engineer While reviewing Ozone, [~jghoman] and, in a comment in HDFS-12992, [~chris.douglas] both pointed out that Corona is a name used by a YARN project from Facebook. This Jira proposes to rename Corona (a chemical process that produces Ozone) to Freon (CFCs), something that stresses Ozone. Thanks to [~arpitagarwal] for coming up with both names. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13081) Datanode#checkSecureConfig should check HTTPS and SASL encryption
[ https://issues.apache.org/jira/browse/HDFS-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364862#comment-16364862 ] Ajay Kumar commented on HDFS-13081: --- [~daryn] thanks for the valuable input. Updated the patch to allow DN to start in case SASL is enabled and HTTP port is privileged. cc: [~jnp],[~xyao] > Datanode#checkSecureConfig should check HTTPS and SASL encryption > - > > Key: HDFS-13081 > URL: https://issues.apache.org/jira/browse/HDFS-13081 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, security >Affects Versions: 3.0.0 >Reporter: Xiaoyu Yao >Assignee: Ajay Kumar >Priority: Major > Attachments: HDFS-13081.000.patch, HDFS-13081.001.patch > > > Datanode#checkSecureConfig currently checks the following to determine if > secure datanode is enabled. > # The server has bound to privileged ports for RPC and HTTP via > SecureDataNodeStarter. > # The configuration enables SASL on DataTransferProtocol and HTTPS (no plain > HTTP) for the HTTP server. The SASL handshake guarantees authentication of > the RPC server before a client transmits a secret, such as a block access > token. Similarly, SSL guarantees authentication of the > HTTP server before a client transmits a secret, such as a delegation token. > For the 2nd case, HTTPS_ONLY means all the traffic between REST client/server > will be encrypted. However, checking only whether a SASL property resolver > is configured does not mean the server requires an encrypted RPC. > This ticket is open to further check and ensure the datanode SASL property > resolver has a QoP that includes auth-conf(PRIVACY). Note that the SASL QoP > (Quality of Protection) negotiation may drop RPC protection level from > auth-conf(PRIVACY) to auth-int(integrity) or auth(authentication) only, which > should be fine by design. > > cc: [~cnauroth] , [~daryn], [~jnpandey] for additional feedback. 
> -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
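The static check proposed in the ticket can be sketched as follows. This is a hedged illustration, not Hadoop code: the property format follows the SASL QOP convention (a comma-separated preference list such as "auth-conf,auth-int,auth"), and the class and method names are hypothetical. As the ticket notes, a passing check only means PRIVACY is offered; negotiation may still settle on a lower level.

```java
import java.util.Arrays;

// Illustrative helper: treat RPC as potentially encrypted only when the
// configured SASL QoP preference list includes auth-conf (PRIVACY).
class QopCheck {
  static boolean offersPrivacy(String qopList) {
    if (qopList == null || qopList.isEmpty()) {
      return false;
    }
    return Arrays.stream(qopList.split(","))
        .map(String::trim)
        .anyMatch("auth-conf"::equalsIgnoreCase);
  }
}
```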
[jira] [Updated] (HDFS-13081) Datanode#checkSecureConfig should check HTTPS and SASL encryption
[ https://issues.apache.org/jira/browse/HDFS-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajay Kumar updated HDFS-13081: -- Attachment: HDFS-13081.001.patch > Datanode#checkSecureConfig should check HTTPS and SASL encryption > - > > Key: HDFS-13081 > URL: https://issues.apache.org/jira/browse/HDFS-13081 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, security >Affects Versions: 3.0.0 >Reporter: Xiaoyu Yao >Assignee: Ajay Kumar >Priority: Major > Attachments: HDFS-13081.000.patch, HDFS-13081.001.patch > > > Datanode#checkSecureConfig currently checks the following to determine whether > secure datanode is enabled. > # The server has bound to privileged ports for RPC and HTTP via > SecureDataNodeStarter. > # The configuration enables SASL on DataTransferProtocol and HTTPS (no plain > HTTP) for the HTTP server. The SASL handshake guarantees authentication of > the RPC server before a client transmits a secret, such as a block access > token. Similarly, SSL guarantees authentication of the > HTTP server before a client transmits a secret, such as a delegation token. > For the 2nd case, HTTPS_ONLY means all the traffic between REST client/server > will be encrypted. However, checking only whether a SASL property resolver is > configured does not guarantee that the server requires encrypted RPC. > This ticket is opened to further check and ensure that the datanode SASL property > resolver has a QoP that includes auth-conf (PRIVACY). Note that the SASL QoP > (Quality of Protection) negotiation may drop the RPC protection level from > auth-conf (PRIVACY) to auth-int (integrity) or auth (authentication) only, which > should be fine by design. > > cc: [~cnauroth] , [~daryn], [~jnpandey] for additional feedback. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12452) TestDataNodeVolumeFailureReporting fails in trunk Jenkins runs
[ https://issues.apache.org/jira/browse/HDFS-12452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364804#comment-16364804 ] genericqa commented on HDFS-12452: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 18s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 42s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 18 unchanged - 0 fixed = 19 total (was 18) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 15s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}100m 54s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}161m 13s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks | | | hadoop.hdfs.qjournal.server.TestJournalNodeSync | | | hadoop.hdfs.TestReadStripedFileWithDecodingDeletedData | | | hadoop.hdfs.tools.TestDFSAdminWithHA | | | hadoop.hdfs.TestDFSUpgradeFromImage | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure020 | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | HDFS-12452 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12910616/HDFS-12452.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 7b5cf1dfe02b 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 1f20f43 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/23069/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt | | unit |
[jira] [Commented] (HDFS-13134) Ozone: Format open containers on datanode restart
[ https://issues.apache.org/jira/browse/HDFS-13134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364777#comment-16364777 ] Anu Engineer commented on HDFS-13134: - [~ljain] The change looks good. I am going to take a week to think through this and review with the rest of the team before I commit it. The reason is that this command provides the ability to destroy data (and rightfully so). [~msingh], [~nandakumar131],[~elek] and [~xyao] Please take a look at the patch when you have a chance. I want to commit this only after I get perspectives from others too. Thanks for your time and consideration. > Ozone: Format open containers on datanode restart > - > > Key: HDFS-13134 > URL: https://issues.apache.org/jira/browse/HDFS-13134 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Attachments: HDFS-13134-HDFS-7240.001.patch > > > Once a datanode is restarted, its open containers should be formatted. Only > the open containers whose pipeline has a replication factor of three will > need to be formatted. The format command is sent by SCM to the datanode after > the corresponding containers have been successfully replicated. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
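The rule in the description above can be sketched as a small predicate. All names here are hypothetical stand-ins, not the actual Ozone datanode code: a container is a formatting candidate after restart only if it was open and its pipeline is three-way replicated, so the data can first be re-replicated from the other two replicas.

```java
// Illustrative policy for the behavior described in HDFS-13134.
class RestartFormatPolicy {
  enum State { OPEN, CLOSING, CLOSED }

  // Format on restart only for OPEN containers on a replication-factor-3
  // pipeline; SCM would send the actual format command after replication.
  static boolean shouldFormatOnRestart(State state, int replicationFactor) {
    return state == State.OPEN && replicationFactor == 3;
  }
}
```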
[jira] [Commented] (HDFS-12983) Block Storage: provide docker-compose file for cblock clusters
[ https://issues.apache.org/jira/browse/HDFS-12983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364773#comment-16364773 ] Anu Engineer commented on HDFS-12983: - [~elek] I am +1 on this patch. However, we have 50070; just wondering whether the port numbers are correct, since the Hadoop 3.0 default ports might be different. Can you please take a quick look at the port numbers? If you are sure they are functional, please let me know and I will commit this patch. > Block Storage: provide docker-compose file for cblock clusters > -- > > Key: HDFS-12983 > URL: https://issues.apache.org/jira/browse/HDFS-12983 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: ozone >Reporter: Elek, Marton >Assignee: Elek, Marton >Priority: Major > Attachments: HDFS-12983-HDFS-7240.001.patch, > HDFS-12983-HDFS-7240.002.patch, HDFS-12983-HDFS-7240.003.patch > > > Since HDFS-12469 we have a docker-compose file at dev-support/compose/ozone > which makes it easy to start local ozone clusters with multiple datanodes. > In this patch I propose a similar config file for the cblock/iscsi servers > (jscsi + cblock + scm + namenode + datanode) to make it easier to check the > latest state. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13040) Kerberized inotify client fails despite kinit properly
[ https://issues.apache.org/jira/browse/HDFS-13040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364764#comment-16364764 ] Daryn Sharp commented on HDFS-13040: Let me catch up. > Kerberized inotify client fails despite kinit properly > -- > > Key: HDFS-13040 > URL: https://issues.apache.org/jira/browse/HDFS-13040 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.0 > Environment: Kerberized, HA cluster, iNotify client, CDH5.10.2 >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Major > Attachments: HDFS-13040.001.patch, HDFS-13040.02.patch, > HDFS-13040.03.patch, HDFS-13040.half.test.patch, > TestDFSInotifyEventInputStreamKerberized.java, TransactionReader.java > > > This issue is similar to HDFS-10799. > HDFS-10799 turned out to be a client-side issue where the client is responsible > for actively renewing its Kerberos ticket. > However, we found that in a slightly different setup, even if the client has valid Kerberos > credentials, inotify still fails. > Suppose client uses principal h...@example.com, > namenode 1 uses server principal hdfs/nn1.example@example.com > namenode 2 uses server principal hdfs/nn2.example@example.com > *After the NameNodes have been running for longer than the Kerberos ticket lifetime*, the client > fails with the following error: > {noformat} > 18/01/19 11:23:02 WARN security.UserGroupInformation: > PriviledgedActionException as:h...@gce.cloudera.com (auth:KERBEROS) > cause:org.apache.hadoop.ipc.RemoteException(java.io.IOException): We > encountered an error reading > https://nn2.example.com:8481/getJournal?jid=ns1=8662=-60%3A353531113%3A0%3Acluster3, > > https://nn1.example.com:8481/getJournal?jid=ns1=8662=-60%3A353531113%3A0%3Acluster3. > During automatic edit log failover, we noticed that all of the remaining > edit log streams are shorter than the current one! The best remaining edit > log ends at transaction 8683, but we thought we could read up to transaction > 8684. 
If you continue, metadata will be lost forever! > at > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:213) > at > org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.readOp(NameNodeRpcServer.java:1701) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getEditsFromTxid(NameNodeRpcServer.java:1763) > at > org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getEditsFromTxid(AuthorizationProviderProxyClientProtocol.java:1011) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getEditsFromTxid(ClientNamenodeProtocolServerSideTranslatorPB.java:1490) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2216) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2212) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2210) > {noformat} > Typically if NameNode has an expired Kerberos ticket, the error handling for > the typical edit log tailing would let NameNode to relogin with its own > Kerberos principal. However, when inotify uses the same code path to retrieve > edits, since the current user is the inotify client's principal, unless > client uses the same principal as the NameNode, NameNode can't do it on > behalf of the client. 
> Therefore, a more appropriate approach is to use a proxy user so that the NameNode > can retrieve edits on behalf of the client. > I will attach a patch to fix it. This patch has been verified to work on a > CDH5.10.2 cluster; however, it seems impossible to craft a unit test for this > fix because of the way Hadoop UGI handles Kerberos credentials (I can't have a > single process that logs in as two Kerberos principals simultaneously and let > them establish connections). > A possible workaround is for the inotify client to use the active NameNode's > server principal. However, that's not going to work when there's a namenode > failover, because then the client's principal will not be consistent with the >
[jira] [Commented] (HDFS-11699) Ozone:SCM: Add support for close containers in SCM
[ https://issues.apache.org/jira/browse/HDFS-11699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364733#comment-16364733 ] Anu Engineer commented on HDFS-11699: - [~elek] Thanks for the code review comments. The patch v2 addresses all comments and fixes the checkStyle issues too. Details below: bq. In ContainerMapping.java/processContainerReport: I don't understand the comments, but my impression is that the two methods should be swapped: You are absolutely right; thanks for catching that. bq. According to my understanding this code will send a close command even if the container is in CLOSED state. IMHO it should be sent only if the container is in OPEN or CLOSING state. Fixed. bq. It's not clear to me how the CLOSED state will be achieved, but maybe it's a task for a different jira. Correct, the client will post that message. I think we have a JIRA in progress for that already. bq. javadoc of ContainerMapping.shouldClose is misleading. It returns false if the container is closed Fixed. > Ozone:SCM: Add support for close containers in SCM > -- > > Key: HDFS-11699 > URL: https://issues.apache.org/jira/browse/HDFS-11699 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Anu Engineer >Assignee: Anu Engineer >Priority: Major > Attachments: HDFS-11699-HDFS-7240.001.patch, > HDFS-11699-HDFS-7240.002.patch > > > Add support for closed containers in SCM. When a container is closed, SCM > needs to make a set of decisions like which pool and which machines are > expected to have this container. SCM also needs to issue a copyContainer > command to the target datanodes so that these nodes can replicate data from > the original locations. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
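The review point about when to send a close command can be sketched as below. The enum and method names are illustrative stand-ins for the actual ContainerMapping/SCM code: a close command is warranted only while the container is OPEN or CLOSING, never once it is already CLOSED.

```java
// Illustrative decision helper for the state check discussed in the review.
class CloseDecision {
  enum LifeCycle { OPEN, CLOSING, CLOSED }

  // Send a close command only for containers that are not yet fully closed.
  static boolean shouldSendClose(LifeCycle state) {
    return state == LifeCycle.OPEN || state == LifeCycle.CLOSING;
  }
}
```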
[jira] [Updated] (HDFS-11699) Ozone:SCM: Add support for close containers in SCM
[ https://issues.apache.org/jira/browse/HDFS-11699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anu Engineer updated HDFS-11699: Attachment: HDFS-11699-HDFS-7240.002.patch > Ozone:SCM: Add support for close containers in SCM > -- > > Key: HDFS-11699 > URL: https://issues.apache.org/jira/browse/HDFS-11699 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Anu Engineer >Assignee: Anu Engineer >Priority: Major > Attachments: HDFS-11699-HDFS-7240.001.patch, > HDFS-11699-HDFS-7240.002.patch > > > Add support for closed containers in SCM. When a container is closed, SCM > needs to make a set of decisions like which pool and which machines are > expected to have this container. SCM also needs to issue a copyContainer > command to the target datanodes so that these nodes can replicate data from > the original locations. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12749) DN may not send block report to NN after NN restart
[ https://issues.apache.org/jira/browse/HDFS-12749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364731#comment-16364731 ] Kihwal Lee commented on HDFS-12749: --- I see. The old registration looks identical to the new one, so NN still accepts it. About catching IOException: if the registration fails with a RemoteException that is not RetriableException, the actor may need to stop instead of retrying. Also, if we choose to blank something out before trying to re-register to re-trigger registration, we should avoid hitting something like HDFS-8995. > DN may not send block report to NN after NN restart > --- > > Key: HDFS-12749 > URL: https://issues.apache.org/jira/browse/HDFS-12749 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: TanYuxin >Priority: Major > Attachments: HDFS-12749.001.patch > > > Now our cluster has thousands of DNs and millions of files and blocks. When the NN > restarts, its load is very high. > After the NN restarts, the DN will call the BPServiceActor#reRegister method to register. > But the register RPC will get an IOException since the NN is busy dealing with block > reports. The exception is caught at BPServiceActor#processCommand. > Below is the caught IOException: > {code:java} > WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Error processing > datanode Command > java.io.IOException: Failed on local exception: java.io.IOException: > java.net.SocketTimeoutException: 6 millis timeout while waiting for > channel to be ready for read. 
ch : java.nio.channels.SocketChannel[connected > local=/DataNode_IP:Port remote=NameNode_Host/IP:Port]; Host Details : local > host is: "DataNode_Host/Datanode_IP"; destination host is: > "NameNode_Host":Port; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773) > at org.apache.hadoop.ipc.Client.call(Client.java:1474) > at org.apache.hadoop.ipc.Client.call(Client.java:1407) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) > at com.sun.proxy.$Proxy13.registerDatanode(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:126) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:793) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.reRegister(BPServiceActor.java:926) > at > org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:604) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:898) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:711) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:864) > at java.lang.Thread.run(Thread.java:745) > {code} > The un-catched IOException breaks BPServiceActor#register, and the Block > Report can not be sent immediately. > {code} > /** >* Register one bp with the corresponding NameNode >* >* The bpDatanode needs to register with the namenode on startup in order >* 1) to report which storage it is serving now and >* 2) to receive a registrationID >* >* issued by the namenode to recognize registered datanodes. 
>* >* @param nsInfo current NamespaceInfo >* @see FSNamesystem#registerDatanode(DatanodeRegistration) >* @throws IOException >*/ > void register(NamespaceInfo nsInfo) throws IOException { > // The handshake() phase loaded the block pool storage > // off disk - so update the bpRegistration object from that info > DatanodeRegistration newBpRegistration = bpos.createRegistration(); > LOG.info(this + " beginning handshake with NN"); > while (shouldRun()) { > try { > // Use returned registration from namenode with updated fields > newBpRegistration = bpNamenode.registerDatanode(newBpRegistration); > newBpRegistration.setNamespaceInfo(nsInfo); > bpRegistration = newBpRegistration; > break; > } catch(EOFException e) { // namenode might have just restarted > LOG.info("Problem connecting to server: " + nnAddr + " :" > + e.getLocalizedMessage()); > sleepAndLogInterrupts(1000, "connecting to server"); > } catch(SocketTimeoutException e) { // namenode is busy > LOG.info("Problem connecting to server: " + nnAddr); > sleepAndLogInterrupts(1000, "connecting to server"); > } > } > > LOG.info("Block pool " + this + " successfully registered with NN"); > bpos.registrationSucceeded(this, bpRegistration); > // random short delay
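Kihwal's suggestion above (retry transient failures, stop on a server-side error the server did not mark retriable) can be sketched as a small classification helper. The two nested exception classes are local stand-ins for org.apache.hadoop.ipc.RemoteException and RetriableException, and the helper itself is hypothetical, not the actual BPServiceActor logic.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

// Illustrative retry-vs-stop decision for a failed DN re-registration.
class ReRegisterPolicy {
  static class RemoteException extends IOException {}        // stand-in
  static class RetriableException extends RemoteException {} // stand-in

  static boolean shouldRetry(IOException e) {
    if (e instanceof RemoteException) {
      // A server-side error is retried only when the server explicitly
      // marked it retriable; otherwise the actor should stop.
      return e instanceof RetriableException;
    }
    // Local transport errors (e.g. SocketTimeoutException while the NN is
    // busy processing block reports) are treated as transient.
    return true;
  }
}
```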
[jira] [Commented] (HDFS-13136) Avoid taking FSN lock while doing group member lookup for FSD permission check
[ https://issues.apache.org/jira/browse/HDFS-13136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364718#comment-16364718 ] genericqa commented on HDFS-13136: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 2s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 1s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 39s{color} | {color:green} hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 285 unchanged - 1 fixed = 285 total (was 286) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 56s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}124m 14s{color} | {color:red} hadoop-hdfs in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}177m 16s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency | | | hadoop.hdfs.TestHDFSFileSystemContract | | | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA | | | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | HDFS-13136 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12910611/HDFS-13136.002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 76750bd05515 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / f20dc0d | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/23068/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/23068/testReport/ | | Max. process+thread
[jira] [Commented] (HDFS-13114) CryptoAdmin#ReencryptZoneCommand should resolve Namespace info from path
[ https://issues.apache.org/jira/browse/HDFS-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364719#comment-16364719 ] Hanisha Koneru commented on HDFS-13114: --- Thanks for the review, [~jojochuang]. I have created HDFS-13148 to add unit tests for EZ with Federation. Once we have that setup, it would be easy to add a test for this change. I will get back to this once HDFS-13148 is resolved. > CryptoAdmin#ReencryptZoneCommand should resolve Namespace info from path > > > Key: HDFS-13114 > URL: https://issues.apache.org/jira/browse/HDFS-13114 > Project: Hadoop HDFS > Issue Type: Bug > Components: encryption, hdfs >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru >Priority: Major > Attachments: HDFS-13114.001.patch > > > The {{crypto -reencryptZone -path }} command takes in a path > argument. But when creating {{HdfsAdmin}} object, it takes the defaultFs > instead of resolving from the path. This causes the following exception if > the authority component in path does not match the authority of default Fs. > {code:java} > $ hdfs crypto -reencryptZone -start -path hdfs://mycluster-node-1:8020/zone1 > IllegalArgumentException: Wrong FS: hdfs://mycluster-node-1:8020/zone1, > expected: hdfs://ns1{code} > {code:java} > $ hdfs crypto -reencryptZone -start -path hdfs://ns2/zone2 > IllegalArgumentException: Wrong FS: hdfs://ns2/zone2, expected: > hdfs://ns1{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
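The fix direction described in the issue (resolve the namespace from the path argument rather than fs.defaultFS) can be modeled with plain URIs. In real Hadoop code this would go through the path's own FileSystem (e.g. Path#getFileSystem); the class below is only a hypothetical stand-in showing the authority-selection rule.

```java
import java.net.URI;

// Illustrative authority resolution: prefer the scheme/authority carried
// by the path itself (hdfs://ns2/zone2), fall back to the default FS.
class FsResolver {
  static URI resolveAuthority(String path, String defaultFs) {
    URI p = URI.create(path);
    if (p.getScheme() != null && p.getAuthority() != null) {
      return URI.create(p.getScheme() + "://" + p.getAuthority());
    }
    return URI.create(defaultFs);
  }
}
```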
[jira] [Created] (HDFS-13148) Unit test for KMS and EZ with Federation
Hanisha Koneru created HDFS-13148: - Summary: Unit test for KMS and EZ with Federation Key: HDFS-13148 URL: https://issues.apache.org/jira/browse/HDFS-13148 Project: Hadoop HDFS Issue Type: Improvement Reporter: Hanisha Koneru Assignee: Hanisha Koneru It would be good to have some unit tests for testing KMS and EZ on a federated cluster. We can start with basic EZ operations. For example, create EZs on two namespaces with different keys using one KMS. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12452) TestDataNodeVolumeFailureReporting fails in trunk Jenkins runs
[ https://issues.apache.org/jira/browse/HDFS-12452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364595#comment-16364595 ] Gabor Bota commented on HDFS-12452: --- Thanks for the patch Xiaoyu! I just found when going through the patch that you have a typo in INTERNAL_DFS_BLOCK_SCANNER_SHUTDOWN_WAIT_INTERVAL="internal.dfs.block.scanner.shutdown.wait.intervaL" I think the string should be internal.dfs.block.scanner.shutdown.wait.interval with lowercase l. > TestDataNodeVolumeFailureReporting fails in trunk Jenkins runs > -- > > Key: HDFS-12452 > URL: https://issues.apache.org/jira/browse/HDFS-12452 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Arpit Agarwal >Assignee: Xiaoyu Yao >Priority: Critical > Labels: flaky-test > Attachments: HDFS-12452.001.patch > > > TestDataNodeVolumeFailureReporting#testSuccessiveVolumeFailures fails > frequently in Jenkins runs but it passes locally on my dev machine. > e.g. > https://builds.apache.org/job/PreCommit-HDFS-Build/21134/testReport/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailureReporting/testSuccessiveVolumeFailures/ > {code} > Error Message > test timed out after 12 milliseconds > Stacktrace > java.lang.Exception: test timed out after 12 milliseconds > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:761) > at > org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting.testSuccessiveVolumeFailures(TestDataNodeVolumeFailureReporting.java:189) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12452) TestDataNodeVolumeFailureReporting fails in trunk Jenkins runs
[ https://issues.apache.org/jira/browse/HDFS-12452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364677#comment-16364677 ] Xiaoyu Yao commented on HDFS-12452: --- Good catch. Uploaded a v2 patch that fixes the typo. > TestDataNodeVolumeFailureReporting fails in trunk Jenkins runs > -- > > Key: HDFS-12452 > URL: https://issues.apache.org/jira/browse/HDFS-12452 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Arpit Agarwal >Assignee: Xiaoyu Yao >Priority: Critical > Labels: flaky-test > Attachments: HDFS-12452.001.patch, HDFS-12452.002.patch > > > TestDataNodeVolumeFailureReporting#testSuccessiveVolumeFailures fails > frequently in Jenkins runs but it passes locally on my dev machine. > e.g. > https://builds.apache.org/job/PreCommit-HDFS-Build/21134/testReport/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailureReporting/testSuccessiveVolumeFailures/ > {code} > Error Message > test timed out after 12 milliseconds > Stacktrace > java.lang.Exception: test timed out after 12 milliseconds > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:761) > at > org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting.testSuccessiveVolumeFailures(TestDataNodeVolumeFailureReporting.java:189) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12452) TestDataNodeVolumeFailureReporting fails in trunk Jenkins runs
[ https://issues.apache.org/jira/browse/HDFS-12452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-12452: -- Attachment: HDFS-12452.002.patch > TestDataNodeVolumeFailureReporting fails in trunk Jenkins runs > -- > > Key: HDFS-12452 > URL: https://issues.apache.org/jira/browse/HDFS-12452 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Arpit Agarwal >Assignee: Xiaoyu Yao >Priority: Critical > Labels: flaky-test > Attachments: HDFS-12452.001.patch, HDFS-12452.002.patch > > > TestDataNodeVolumeFailureReporting#testSuccessiveVolumeFailures fails > frequently in Jenkins runs but it passes locally on my dev machine. > e.g. > https://builds.apache.org/job/PreCommit-HDFS-Build/21134/testReport/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailureReporting/testSuccessiveVolumeFailures/ > {code} > Error Message > test timed out after 12 milliseconds > Stacktrace > java.lang.Exception: test timed out after 12 milliseconds > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:761) > at > org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting.testSuccessiveVolumeFailures(TestDataNodeVolumeFailureReporting.java:189) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12345) Scale testing HDFS NameNode with real metadata and workloads
[ https://issues.apache.org/jira/browse/HDFS-12345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364636#comment-16364636 ] Erik Krogen commented on HDFS-12345: Just want to ping watchers that the description has been updated now that Dynamometer is fully open source. We are interested in hearing feedback about the possibility of including Dynamometer into Hadoop itself, i.e. as part of tools. > Scale testing HDFS NameNode with real metadata and workloads > > > Key: HDFS-12345 > URL: https://issues.apache.org/jira/browse/HDFS-12345 > Project: Hadoop HDFS > Issue Type: New Feature > Components: namenode, test >Reporter: Zhe Zhang >Assignee: Erik Krogen >Priority: Major > > Dynamometer has now been open sourced on our [GitHub > page|https://github.com/linkedin/dynamometer]. Read more at our [recent blog > post|https://engineering.linkedin.com/blog/2018/02/dynamometer--scale-testing-hdfs-on-minimal-hardware-with-maximum]. > To encourage getting the tool into the open for others to use as quickly as > possible, we went through our standard open sourcing process of releasing on > GitHub. However we are interested in the possibility of donating this to > Apache as part of Hadoop itself and would appreciate feedback on whether or > not this is something that would be supported by the community. > Also of note, previous [discussions on the dev mail > lists|http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-dev/201707.mbox/%3c98fceffa-faff-4cf1-a14d-4faab6567...@gmail.com%3e] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12345) Scale testing HDFS NameNode with real metadata and workloads
[ https://issues.apache.org/jira/browse/HDFS-12345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364655#comment-16364655 ] Íñigo Goiri commented on HDFS-12345: I'm personally interested in pushing HDFS as a MapReduce job into Hadoop itself. I've gone through that part of the code extensively and I'd like a couple of things tweaked here and there, but I like the setup. We already have tools like GridMix, so I think it makes sense to add the workload replayer too. I internally have a couple of MapReduce jobs that do something pretty similar, so I could also converge on this. To summarize, my opinion is that we should work out how to split the work, but I think most of Dynamometer (if not all) should go into Hadoop. > Scale testing HDFS NameNode with real metadata and workloads > > > Key: HDFS-12345 > URL: https://issues.apache.org/jira/browse/HDFS-12345 > Project: Hadoop HDFS > Issue Type: New Feature > Components: namenode, test >Reporter: Zhe Zhang >Assignee: Erik Krogen >Priority: Major > > Dynamometer has now been open sourced on our [GitHub > page|https://github.com/linkedin/dynamometer]. Read more at our [recent blog > post|https://engineering.linkedin.com/blog/2018/02/dynamometer--scale-testing-hdfs-on-minimal-hardware-with-maximum]. > To encourage getting the tool into the open for others to use as quickly as > possible, we went through our standard open sourcing process of releasing on > GitHub. However we are interested in the possibility of donating this to > Apache as part of Hadoop itself and would appreciate feedback on whether or > not this is something that would be supported by the community. 
> Also of note, previous [discussions on the dev mail > lists|http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-dev/201707.mbox/%3c98fceffa-faff-4cf1-a14d-4faab6567...@gmail.com%3e] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-12345) Scale testing HDFS NameNode with real metadata and workloads
[ https://issues.apache.org/jira/browse/HDFS-12345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364655#comment-16364655 ] Íñigo Goiri edited comment on HDFS-12345 at 2/14/18 7:32 PM: - I'm personally interested in pushing HDFS as a YARN job into Hadoop itself. I've gone through that part of the code extensively and I'd like a couple of things tweaked here and there, but I like the setup. We already have tools like GridMix, so I think it makes sense to add the workload replayer too. I internally have a couple of MapReduce jobs that do something pretty similar, so I could also converge on this. To summarize, my opinion is that we should work out how to split the work, but I think most of Dynamometer (if not all) should go into Hadoop. was (Author: elgoiri): I'm personally interested in pushing HDFS as a MapReduce job into Hadoop itself. I've gone through that part of the code extensively and I'd like a couple of things tweaked here and there, but I like the setup. We already have tools like GridMix, so I think it makes sense to add the workload replayer too. I internally have a couple of MapReduce jobs that do something pretty similar, so I could also converge on this. To summarize, my opinion is that we should work out how to split the work, but I think most of Dynamometer (if not all) should go into Hadoop. > Scale testing HDFS NameNode with real metadata and workloads > > > Key: HDFS-12345 > URL: https://issues.apache.org/jira/browse/HDFS-12345 > Project: Hadoop HDFS > Issue Type: New Feature > Components: namenode, test >Reporter: Zhe Zhang >Assignee: Erik Krogen >Priority: Major > > Dynamometer has now been open sourced on our [GitHub > page|https://github.com/linkedin/dynamometer]. Read more at our [recent blog > post|https://engineering.linkedin.com/blog/2018/02/dynamometer--scale-testing-hdfs-on-minimal-hardware-with-maximum]. 
> To encourage getting the tool into the open for others to use as quickly as > possible, we went through our standard open sourcing process of releasing on > GitHub. However we are interested in the possibility of donating this to > Apache as part of Hadoop itself and would appreciate feedback on whether or > not this is something that would be supported by the community. > Also of note, previous [discussions on the dev mail > lists|http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-dev/201707.mbox/%3c98fceffa-faff-4cf1-a14d-4faab6567...@gmail.com%3e] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12345) Scale testing HDFS NameNode with real metadata and workloads
[ https://issues.apache.org/jira/browse/HDFS-12345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364651#comment-16364651 ] Anu Engineer commented on HDFS-12345: - [~xkrogen] I have looked at the source and read the excellent blog that you wrote. Thank you. I am very impressed by this tool and I am +1 for making this tool part of Hadoop. The most important reason is that we will be able to make sure that this tool stays current with the code base. It might lead to adding some simple "sanity check" type of unit tests, though, so that we can detect if we have accidentally broken this tool. > Scale testing HDFS NameNode with real metadata and workloads > > > Key: HDFS-12345 > URL: https://issues.apache.org/jira/browse/HDFS-12345 > Project: Hadoop HDFS > Issue Type: New Feature > Components: namenode, test >Reporter: Zhe Zhang >Assignee: Erik Krogen >Priority: Major > > Dynamometer has now been open sourced on our [GitHub > page|https://github.com/linkedin/dynamometer]. Read more at our [recent blog > post|https://engineering.linkedin.com/blog/2018/02/dynamometer--scale-testing-hdfs-on-minimal-hardware-with-maximum]. > To encourage getting the tool into the open for others to use as quickly as > possible, we went through our standard open sourcing process of releasing on > GitHub. However we are interested in the possibility of donating this to > Apache as part of Hadoop itself and would appreciate feedback on whether or > not this is something that would be supported by the community. > Also of note, previous [discussions on the dev mail > lists|http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-dev/201707.mbox/%3c98fceffa-faff-4cf1-a14d-4faab6567...@gmail.com%3e] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica
[ https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Bota updated HDFS-11187: -- Status: Open (was: Patch Available) > Optimize disk access for last partial chunk checksum of Finalized replica > - > > Key: HDFS-11187 > URL: https://issues.apache.org/jira/browse/HDFS-11187 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Wei-Chiu Chuang >Assignee: Gabor Bota >Priority: Major > Fix For: 3.1.0, 3.0.2 > > Attachments: HDFS-11187-branch-2.001.patch, > HDFS-11187-branch-2.002.patch, HDFS-11187-branch-2.003.patch, > HDFS-11187-branch-2.004.patch, HDFS-11187.001.patch, HDFS-11187.002.patch, > HDFS-11187.003.patch, HDFS-11187.004.patch, HDFS-11187.005.patch > > > The patch at HDFS-11160 ensures BlockSender reads the correct version of > metafile when there are concurrent writers. > However, the implementation is not optimal, because it must always read the > last partial chunk checksum from disk while holding FsDatasetImpl lock for > every reader. It is possible to optimize this by keeping an up-to-date > version of last partial checksum in-memory and reduce disk access. > I am separating the optimization into a new jira, because maintaining the > state of in-memory checksum requires a lot more work. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-8693) refreshNamenodes does not support adding a new standby to a running DN
[ https://issues.apache.org/jira/browse/HDFS-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364637#comment-16364637 ] Daryn Sharp commented on HDFS-8693: --- Caveat is I believe this will cause a small memory leak of the rpc call's subject. When you invoke doAs, it essentially pushes a new context onto the access control stack. After this patch, the stack is likely: loginUserSubject, rpcCallSubject, anotherLoginUserSubject. If refreshing the NNs creates a new offer thread it inherits the access control context so the rpc subject will leak until the thread exits. Unlikely to be an issue though unless it's constantly refreshed for new NNs. > refreshNamenodes does not support adding a new standby to a running DN > -- > > Key: HDFS-8693 > URL: https://issues.apache.org/jira/browse/HDFS-8693 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, ha >Affects Versions: 2.6.0 >Reporter: Jian Fang >Assignee: Ajith S >Priority: Critical > Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4 > > Attachments: HDFS-8693-03-Addendum-branch-2.patch, > HDFS-8693-03-addendum.patch, HDFS-8693.02.patch, HDFS-8693.03.patch, > HDFS-8693.1.patch > > > I tried to run the following command on a Hadoop 2.6.0 cluster with HA > support > $ hdfs dfsadmin -refreshNamenodes datanode-host:port > to refresh name nodes on data nodes after I replaced one name node with a new > one so that I don't need to restart the data nodes. However, I got the > following error: > refreshNamenodes: HA does not currently support adding a new standby to a > running DN. Please do a rolling restart of DNs to reconfigure the list of NNs. > I checked the 2.6.0 code and the error was thrown by the following code > snippet, which led me to this JIRA. 
> void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException { > Set<InetSocketAddress> oldAddrs = Sets.newHashSet(); > for (BPServiceActor actor : bpServices) > { oldAddrs.add(actor.getNNSocketAddress()); } > Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs); > if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) > { // Keep things simple for now -- we can implement this at a later date. > throw new IOException( "HA does not currently support adding a new standby to > a running DN. " + "Please do a rolling restart of DNs to reconfigure the list > of NNs."); } > } > Looks like the refreshNamenodes command is an incomplete feature. > Unfortunately, the new name node on a replacement is critical for auto > provisioning a Hadoop cluster with HDFS HA support. Without this support, the > HA feature could not really be used. I also observed that the new standby > name node on the replacement instance could get stuck in safe mode because no > data nodes check in with it. Even with a rolling restart, it may take quite > some time to restart all data nodes if we have a big cluster, for example, > with 4000 data nodes, let alone restarting DN is way too intrusive and it is > not a preferable operation in production. It also increases the chance for a > double failure because the standby name node is not really ready for a > failover in the case that the current active name node fails. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
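The guard in the quoted refreshNNList boils down to a symmetric set difference between the old and new NN address lists: any difference at all, including a newly added standby, rejects the refresh. A standalone sketch of that check using only java.util (the actual code uses Guava's Sets.symmetricDifference; the addresses here are illustrative, not from the issue):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Sketch of the refreshNNList guard: if the old and new NN address sets differ
// at all (symmetric difference non-empty), the refresh is rejected, which is
// exactly why adding a new standby to a running DN fails.
public class NNListGuard {
    public static <T> Set<T> symmetricDifference(Set<T> a, Set<T> b) {
        Set<T> both = new HashSet<>(a);
        both.retainAll(b);              // intersection
        Set<T> result = new HashSet<>(a);
        result.addAll(b);               // union
        result.removeAll(both);         // union minus intersection
        return result;
    }

    public static boolean refreshAllowed(Set<String> oldAddrs, Set<String> newAddrs) {
        return symmetricDifference(oldAddrs, newAddrs).isEmpty();
    }

    public static void main(String[] args) {
        Set<String> oldAddrs = new HashSet<>(Arrays.asList("nn1:8020", "nn2:8020"));
        Set<String> sameAddrs = new HashSet<>(oldAddrs);
        Set<String> withNewStandby = new HashSet<>(oldAddrs);
        withNewStandby.add("nn3:8020");  // the replacement standby
        System.out.println(refreshAllowed(oldAddrs, sameAddrs));       // true
        System.out.println(refreshAllowed(oldAddrs, withNewStandby));  // false: refresh rejected
    }
}
```

Supporting the feature would mean replacing this all-or-nothing check with logic that starts BPServiceActor threads for the added addresses and stops those for the removed ones.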
[jira] [Commented] (HDFS-12985) NameNode crashes during restart after an OpenForWrite file present in the Snapshot got deleted
[ https://issues.apache.org/jira/browse/HDFS-12985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364626#comment-16364626 ] Arpit Agarwal commented on HDFS-12985: -- Hi [~manojg], I was looking at the test case to understand the problem better. The test passes for me even without the fix in INodeFile (I ran the new test against git hash 2ee0d64aceed876f57f09eb9efe1872b6de98d2e). Do you see the test fail without the fix? > NameNode crashes during restart after an OpenForWrite file present in the > Snapshot got deleted > -- > > Key: HDFS-12985 > URL: https://issues.apache.org/jira/browse/HDFS-12985 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.8.0 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy >Priority: Major > Fix For: 3.1.0, 2.10.0, 3.0.1 > > Attachments: HDFS-12985.01.patch > > > NameNode crashes repeatedly with NPE at the startup when trying to find the > total number of under construction blocks. This crash happens after an open > file, which was also part of a snapshot gets deleted along with the snapshot. > {noformat} > Failed to start namenode. 
> java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.namenode.LeaseManager.getNumUnderConstructionBlocks(LeaseManager.java:146) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getCompleteBlocksTotal(FSNamesystem.java:6537) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startCommonServices(FSNamesystem.java:1232) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.startCommonServices(NameNode.java:706) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:692) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:844) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:823) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1547) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1615) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
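The crash pattern suggests the shape of the fix: while totaling under-construction blocks per lease, a lease whose path no longer resolves to an inode (the open file was deleted together with its snapshot) must be skipped rather than dereferenced. A hedged sketch of that guard, with a plain Map as a hypothetical stand-in for the lease-to-inode lookup; this is not the actual HDFS-12985 patch:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the defensive fix shape for the LeaseManager NPE: a lease whose
// path resolves to null (file deleted along with its snapshot) is skipped
// instead of dereferenced. The Integer values are a hypothetical stand-in for
// "number of under-construction blocks behind this lease".
public class UCBlockCount {
    public static long countUnderConstruction(Map<String, Integer> leaseToUcBlocks) {
        long total = 0;
        for (Map.Entry<String, Integer> e : leaseToUcBlocks.entrySet()) {
            Integer uc = e.getValue();
            if (uc == null) {
                continue;  // lease points at a deleted file; don't NPE
            }
            total += uc;
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> leases = new HashMap<>();
        leases.put("/open/a", 2);
        leases.put("/snap/deleted", null);  // open file deleted with its snapshot
        leases.put("/open/b", 1);
        System.out.println(countUnderConstruction(leases));  // 3
    }
}
```

Arpit's question stands regardless: a test that passes without the fix would not actually exercise the null path above.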
[jira] [Updated] (HDFS-12452) TestDataNodeVolumeFailureReporting fails in trunk Jenkins runs
[ https://issues.apache.org/jira/browse/HDFS-12452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-12452: -- Status: Patch Available (was: Open) > TestDataNodeVolumeFailureReporting fails in trunk Jenkins runs > -- > > Key: HDFS-12452 > URL: https://issues.apache.org/jira/browse/HDFS-12452 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Arpit Agarwal >Assignee: Xiaoyu Yao >Priority: Critical > Labels: flaky-test > Attachments: HDFS-12452.001.patch > > > TestDataNodeVolumeFailureReporting#testSuccessiveVolumeFailures fails > frequently in Jenkins runs but it passes locally on my dev machine. > e.g. > https://builds.apache.org/job/PreCommit-HDFS-Build/21134/testReport/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailureReporting/testSuccessiveVolumeFailures/ > {code} > Error Message > test timed out after 12 milliseconds > Stacktrace > java.lang.Exception: test timed out after 12 milliseconds > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:761) > at > org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting.testSuccessiveVolumeFailures(TestDataNodeVolumeFailureReporting.java:189) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-12452) TestDataNodeVolumeFailureReporting fails in trunk Jenkins runs
[ https://issues.apache.org/jira/browse/HDFS-12452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364595#comment-16364595 ] Gabor Bota edited comment on HDFS-12452 at 2/14/18 6:48 PM: Thanks for the patch Xiaoyu! I just found when going through the patch that you have a typo when defining INTERNAL_DFS_BLOCK_SCANNER_SHUTDOWN_WAIT_INTERVAL = "internal.dfs.block.scanner.shutdown.wait.intervaL" I think the string should be internal.dfs.block.scanner.shutdown.wait.interval with lowercase l. was (Author: gabor.bota): Thanks for the patch Xiaoyu! I just found when going through the patch that you have a typo in INTERNAL_DFS_BLOCK_SCANNER_SHUTDOWN_WAIT_INTERVAL="internal.dfs.block.scanner.shutdown.wait.intervaL" I think the string should be internal.dfs.block.scanner.shutdown.wait.interval with lowercase l. > TestDataNodeVolumeFailureReporting fails in trunk Jenkins runs > -- > > Key: HDFS-12452 > URL: https://issues.apache.org/jira/browse/HDFS-12452 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Arpit Agarwal >Assignee: Xiaoyu Yao >Priority: Critical > Labels: flaky-test > Attachments: HDFS-12452.001.patch > > > TestDataNodeVolumeFailureReporting#testSuccessiveVolumeFailures fails > frequently in Jenkins runs but it passes locally on my dev machine. > e.g. 
> https://builds.apache.org/job/PreCommit-HDFS-Build/21134/testReport/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailureReporting/testSuccessiveVolumeFailures/ > {code} > Error Message > test timed out after 12 milliseconds > Stacktrace > java.lang.Exception: test timed out after 12 milliseconds > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:761) > at > org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting.testSuccessiveVolumeFailures(TestDataNodeVolumeFailureReporting.java:189) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12452) TestDataNodeVolumeFailureReporting fails in trunk Jenkins runs
[ https://issues.apache.org/jira/browse/HDFS-12452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-12452: -- Attachment: HDFS-12452.001.patch > TestDataNodeVolumeFailureReporting fails in trunk Jenkins runs > -- > > Key: HDFS-12452 > URL: https://issues.apache.org/jira/browse/HDFS-12452 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Arpit Agarwal >Assignee: Xiaoyu Yao >Priority: Critical > Labels: flaky-test > Attachments: HDFS-12452.001.patch > > > TestDataNodeVolumeFailureReporting#testSuccessiveVolumeFailures fails > frequently in Jenkins runs but it passes locally on my dev machine. > e.g. > https://builds.apache.org/job/PreCommit-HDFS-Build/21134/testReport/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailureReporting/testSuccessiveVolumeFailures/ > {code} > Error Message > test timed out after 12 milliseconds > Stacktrace > java.lang.Exception: test timed out after 12 milliseconds > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:761) > at > org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting.testSuccessiveVolumeFailures(TestDataNodeVolumeFailureReporting.java:189) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13147) Support -c argument for DFS command head and tail
[ https://issues.apache.org/jira/browse/HDFS-13147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364530#comment-16364530 ] genericqa commented on HDFS-13147: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 31s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 19s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 10s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 15s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 42s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 12m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 48s{color} | {color:green} root: The patch generated 0 new + 175 unchanged - 2 fixed = 175 total (was 177) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 8m 35s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 6s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 7m 50s{color} | {color:red} hadoop-common in the patch failed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}129m 0s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 28s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}219m 37s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.log.TestLogLevel | | | hadoop.http.TestHttpServerWithSpengo | | | hadoop.security.token.delegation.web.TestWebDelegationToken | | | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA | | | hadoop.hdfs.web.TestWebHdfsTimeouts | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure | | | hadoop.hdfs.TestSafeModeWithStripedFileWithRandomECPolicy | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | HDFS-13147 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12910586/HDFS-13147.003.patch | | Optional Tests | asflicense compile javac
[jira] [Assigned] (HDFS-12452) TestDataNodeVolumeFailureReporting fails in trunk Jenkins runs
[ https://issues.apache.org/jira/browse/HDFS-12452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao reassigned HDFS-12452: - Assignee: Xiaoyu Yao > TestDataNodeVolumeFailureReporting fails in trunk Jenkins runs > -- > > Key: HDFS-12452 > URL: https://issues.apache.org/jira/browse/HDFS-12452 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Arpit Agarwal >Assignee: Xiaoyu Yao >Priority: Critical > Labels: flaky-test > > TestDataNodeVolumeFailureReporting#testSuccessiveVolumeFailures fails > frequently in Jenkins runs but it passes locally on my dev machine. > e.g. > https://builds.apache.org/job/PreCommit-HDFS-Build/21134/testReport/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailureReporting/testSuccessiveVolumeFailures/ > {code} > Error Message > test timed out after 12 milliseconds > Stacktrace > java.lang.Exception: test timed out after 12 milliseconds > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:761) > at > org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting.testSuccessiveVolumeFailures(TestDataNodeVolumeFailureReporting.java:189) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-12749) DN may not send block report to NN after NN restart
[ https://issues.apache.org/jira/browse/HDFS-12749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364520#comment-16364520 ] Erik Krogen edited comment on HDFS-12749 at 2/14/18 6:00 PM: - [~kihwal], IIUC, the problem is that the NN correctly processed the registration, but the DN timed out before receiving the response. Since from NN point of view the registration was complete, it did not send another DNA_REGISTER command. was (Author: xkrogen): [~kihwal], IIUC, the problem is that the NN correctly processed the registration, but the DN got an IOException during processing. Since from NN point of view the registration was complete, it did not send another DNA_REGISTER command. > DN may not send block report to NN after NN restart > --- > > Key: HDFS-12749 > URL: https://issues.apache.org/jira/browse/HDFS-12749 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: TanYuxin >Priority: Major > Attachments: HDFS-12749.001.patch > > > Now our cluster have thousands of DN, millions of files and blocks. When NN > restart, NN's load is very high. > After NN restart,DN will call BPServiceActor#reRegister method to register. > But register RPC will get a IOException since NN is busy dealing with Block > Report. The exception is caught at BPServiceActor#processCommand. > Next is the caught IOException: > {code:java} > WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Error processing > datanode Command > java.io.IOException: Failed on local exception: java.io.IOException: > java.net.SocketTimeoutException: 6 millis timeout while waiting for > channel to be ready for read. 
ch : java.nio.channels.SocketChannel[connected > local=/DataNode_IP:Port remote=NameNode_Host/IP:Port]; Host Details : local > host is: "DataNode_Host/Datanode_IP"; destination host is: > "NameNode_Host":Port; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773) > at org.apache.hadoop.ipc.Client.call(Client.java:1474) > at org.apache.hadoop.ipc.Client.call(Client.java:1407) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) > at com.sun.proxy.$Proxy13.registerDatanode(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:126) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:793) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.reRegister(BPServiceActor.java:926) > at > org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:604) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:898) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:711) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:864) > at java.lang.Thread.run(Thread.java:745) > {code} > The un-catched IOException breaks BPServiceActor#register, and the Block > Report can not be sent immediately. > {code} > /** >* Register one bp with the corresponding NameNode >* >* The bpDatanode needs to register with the namenode on startup in order >* 1) to report which storage it is serving now and >* 2) to receive a registrationID >* >* issued by the namenode to recognize registered datanodes. 
>* >* @param nsInfo current NamespaceInfo >* @see FSNamesystem#registerDatanode(DatanodeRegistration) >* @throws IOException >*/ > void register(NamespaceInfo nsInfo) throws IOException { > // The handshake() phase loaded the block pool storage > // off disk - so update the bpRegistration object from that info > DatanodeRegistration newBpRegistration = bpos.createRegistration(); > LOG.info(this + " beginning handshake with NN"); > while (shouldRun()) { > try { > // Use returned registration from namenode with updated fields > newBpRegistration = bpNamenode.registerDatanode(newBpRegistration); > newBpRegistration.setNamespaceInfo(nsInfo); > bpRegistration = newBpRegistration; > break; > } catch(EOFException e) { // namenode might have just restarted > LOG.info("Problem connecting to server: " + nnAddr + " :" > + e.getLocalizedMessage()); > sleepAndLogInterrupts(1000, "connecting to server"); > } catch(SocketTimeoutException e) { // namenode is busy > LOG.info("Problem connecting to server: " + nnAddr); > sleepAndLogInterrupts(1000, "connecting to server"); > } > } >
[jira] [Commented] (HDFS-12749) DN may not send block report to NN after NN restart
[ https://issues.apache.org/jira/browse/HDFS-12749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364520#comment-16364520 ] Erik Krogen commented on HDFS-12749: [~kihwal], IIUC, the problem is that the NN correctly processed the registration, but the DN got an IOException during processing. Since from NN point of view the registration was complete, it did not send another DNA_REGISTER command. > DN may not send block report to NN after NN restart > --- > > Key: HDFS-12749 > URL: https://issues.apache.org/jira/browse/HDFS-12749 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: TanYuxin >Priority: Major > Attachments: HDFS-12749.001.patch > > > Now our cluster have thousands of DN, millions of files and blocks. When NN > restart, NN's load is very high. > After NN restart,DN will call BPServiceActor#reRegister method to register. > But register RPC will get a IOException since NN is busy dealing with Block > Report. The exception is caught at BPServiceActor#processCommand. > Next is the caught IOException: > {code:java} > WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Error processing > datanode Command > java.io.IOException: Failed on local exception: java.io.IOException: > java.net.SocketTimeoutException: 6 millis timeout while waiting for > channel to be ready for read. 
ch : java.nio.channels.SocketChannel[connected > local=/DataNode_IP:Port remote=NameNode_Host/IP:Port]; Host Details : local > host is: "DataNode_Host/Datanode_IP"; destination host is: > "NameNode_Host":Port; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773) > at org.apache.hadoop.ipc.Client.call(Client.java:1474) > at org.apache.hadoop.ipc.Client.call(Client.java:1407) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) > at com.sun.proxy.$Proxy13.registerDatanode(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:126) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:793) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.reRegister(BPServiceActor.java:926) > at > org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:604) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:898) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:711) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:864) > at java.lang.Thread.run(Thread.java:745) > {code} > The un-catched IOException breaks BPServiceActor#register, and the Block > Report can not be sent immediately. > {code} > /** >* Register one bp with the corresponding NameNode >* >* The bpDatanode needs to register with the namenode on startup in order >* 1) to report which storage it is serving now and >* 2) to receive a registrationID >* >* issued by the namenode to recognize registered datanodes. 
>* >* @param nsInfo current NamespaceInfo >* @see FSNamesystem#registerDatanode(DatanodeRegistration) >* @throws IOException >*/ > void register(NamespaceInfo nsInfo) throws IOException { > // The handshake() phase loaded the block pool storage > // off disk - so update the bpRegistration object from that info > DatanodeRegistration newBpRegistration = bpos.createRegistration(); > LOG.info(this + " beginning handshake with NN"); > while (shouldRun()) { > try { > // Use returned registration from namenode with updated fields > newBpRegistration = bpNamenode.registerDatanode(newBpRegistration); > newBpRegistration.setNamespaceInfo(nsInfo); > bpRegistration = newBpRegistration; > break; > } catch(EOFException e) { // namenode might have just restarted > LOG.info("Problem connecting to server: " + nnAddr + " :" > + e.getLocalizedMessage()); > sleepAndLogInterrupts(1000, "connecting to server"); > } catch(SocketTimeoutException e) { // namenode is busy > LOG.info("Problem connecting to server: " + nnAddr); > sleepAndLogInterrupts(1000, "connecting to server"); > } > } > > LOG.info("Block pool " + this + " successfully registered with NN"); > bpos.registrationSucceeded(this, bpRegistration); > // random short delay - helps scatter the BR from all DNs > scheduler.scheduleBlockReport(dnConf.initialBlockReportDelay); > } > {code} > But NameNode has processed
[jira] [Commented] (HDFS-12749) DN may not send block report to NN after NN restart
[ https://issues.apache.org/jira/browse/HDFS-12749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364508#comment-16364508 ] Kihwal Lee commented on HDFS-12749: --- What was the command that failed when you saw "Error processing datanode Command"? If the processing of DNA_REGISTER blew up with an IOException, the DN would have gotten the command again in the next hearbeat and retried. The block token secret is updated (DNA_ACCESSKEYUPDATE) in the first heartbeat after registration. There can be other commands along with it, but failure of processing a command does not abort the whole processing, so that shouldn't matter. We need to understand where exactly it went wrong. More detailed exception/stack traces, thread name (so that we can tell which actor thread it came from), timing, etc. will be helpful. > DN may not send block report to NN after NN restart > --- > > Key: HDFS-12749 > URL: https://issues.apache.org/jira/browse/HDFS-12749 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: TanYuxin >Priority: Major > Attachments: HDFS-12749.001.patch > > > Now our cluster have thousands of DN, millions of files and blocks. When NN > restart, NN's load is very high. > After NN restart,DN will call BPServiceActor#reRegister method to register. > But register RPC will get a IOException since NN is busy dealing with Block > Report. The exception is caught at BPServiceActor#processCommand. > Next is the caught IOException: > {code:java} > WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Error processing > datanode Command > java.io.IOException: Failed on local exception: java.io.IOException: > java.net.SocketTimeoutException: 6 millis timeout while waiting for > channel to be ready for read. 
ch : java.nio.channels.SocketChannel[connected > local=/DataNode_IP:Port remote=NameNode_Host/IP:Port]; Host Details : local > host is: "DataNode_Host/Datanode_IP"; destination host is: > "NameNode_Host":Port; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773) > at org.apache.hadoop.ipc.Client.call(Client.java:1474) > at org.apache.hadoop.ipc.Client.call(Client.java:1407) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) > at com.sun.proxy.$Proxy13.registerDatanode(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:126) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:793) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.reRegister(BPServiceActor.java:926) > at > org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:604) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:898) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:711) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:864) > at java.lang.Thread.run(Thread.java:745) > {code} > The un-catched IOException breaks BPServiceActor#register, and the Block > Report can not be sent immediately. > {code} > /** >* Register one bp with the corresponding NameNode >* >* The bpDatanode needs to register with the namenode on startup in order >* 1) to report which storage it is serving now and >* 2) to receive a registrationID >* >* issued by the namenode to recognize registered datanodes. 
>* >* @param nsInfo current NamespaceInfo >* @see FSNamesystem#registerDatanode(DatanodeRegistration) >* @throws IOException >*/ > void register(NamespaceInfo nsInfo) throws IOException { > // The handshake() phase loaded the block pool storage > // off disk - so update the bpRegistration object from that info > DatanodeRegistration newBpRegistration = bpos.createRegistration(); > LOG.info(this + " beginning handshake with NN"); > while (shouldRun()) { > try { > // Use returned registration from namenode with updated fields > newBpRegistration = bpNamenode.registerDatanode(newBpRegistration); > newBpRegistration.setNamespaceInfo(nsInfo); > bpRegistration = newBpRegistration; > break; > } catch(EOFException e) { // namenode might have just restarted > LOG.info("Problem connecting to server: " + nnAddr + " :" > + e.getLocalizedMessage()); > sleepAndLogInterrupts(1000, "connecting to server"); > } catch(SocketTimeoutException e) { // namenode is busy > LOG.info("Problem connecting
[jira] [Commented] (HDFS-13119) RBF: Manage unavailable clusters
[ https://issues.apache.org/jira/browse/HDFS-13119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364497#comment-16364497 ] Íñigo Goiri commented on HDFS-13119: [~linyiqun], yep the retry logic kind of breaks the flow. Anyway, I think we should try to refactor that part of the code and avoid repeating this: {code:java} if (this.rpcMonitor != null) { this.rpcMonitor.proxyOpRetries(); } return invoke(nsId, ++retryCount, method, obj, params); {code} It's minor but I think that we should try to make an effort to keep this function as easy to read as possible. What about extending {{shouldRetry()}} and checking for unavailable there? We already use the FAIL case there but maybe we can just throw the exception there. > RBF: Manage unavailable clusters > > > Key: HDFS-13119 > URL: https://issues.apache.org/jira/browse/HDFS-13119 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Íñigo Goiri >Assignee: Yiqun Lin >Priority: Major > Attachments: HDFS-13119.001.patch, HDFS-13119.002.patch, > HDFS-13119.003.patch > > > When a federated cluster has one of its subclusters down, operations that run > in every subcluster ({{RouterRpcClient#invokeAll()}}) may take all the RPC > connections.
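The refactor suggested in the comment above can be sketched as a single decision method that owns both the retry budget and the "subcluster unavailable" escalation, so call sites stop repeating the retry/monitor snippet. This is a hypothetical illustration, not the real {{RouterRpcClient}} API: {{RetryDecision}}, {{MAX_RETRIES}}, and the message-based unavailability check are all stand-ins.

```java
// Hypothetical sketch of folding the "unavailable subcluster" handling
// into one shouldRetry()-style method. Names are illustrative, not the
// actual RouterRpcClient API.
import java.io.IOException;

public class RetrySketch {
  enum RetryDecision { RETRY, FAIL, FAIL_UNAVAILABLE }

  static final int MAX_RETRIES = 3;

  // Single decision point: retry, give up on this call, or surface the
  // whole subcluster as unavailable (the case currently handled inline).
  static RetryDecision shouldRetry(IOException ioe, int retryCount) {
    boolean unavailable = ioe.getMessage() != null
        && ioe.getMessage().contains("unavailable");  // stand-in check
    if (unavailable && retryCount >= MAX_RETRIES) {
      return RetryDecision.FAIL_UNAVAILABLE;
    }
    return retryCount < MAX_RETRIES ? RetryDecision.RETRY : RetryDecision.FAIL;
  }

  public static void main(String[] args) {
    // Early attempts retry; once the budget is spent the decision method
    // itself escalates, so callers never duplicate this logic.
    if (shouldRetry(new IOException("subcluster unavailable"), 0)
        != RetryDecision.RETRY) throw new AssertionError();
    if (shouldRetry(new IOException("subcluster unavailable"), MAX_RETRIES)
        != RetryDecision.FAIL_UNAVAILABLE) throw new AssertionError();
    if (shouldRetry(new IOException("generic failure"), MAX_RETRIES)
        != RetryDecision.FAIL) throw new AssertionError();
    System.out.println("ok");
  }
}
```

With this shape, the repeated {{proxyOpRetries()}} bookkeeping can live behind the RETRY branch in one place.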
[jira] [Comment Edited] (HDFS-13136) Avoid taking FSN lock while doing group member lookup for FSD permission check
[ https://issues.apache.org/jira/browse/HDFS-13136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364469#comment-16364469 ] Xiaoyu Yao edited comment on HDFS-13136 at 2/14/18 5:29 PM: Thanks [~szetszwo] for the review. Update patch v2 that fixed the unit test failures in {code} hadoop.hdfs.server.namenode.TestAuditLogger and hadoop.hdfs.server.namenode.TestAuditLoggerWithCommands {code} Now that the getPermissionChecker() is moved out of the FSN lock, the test mocks are updated to reach deeper to get the expected exception and the audit log entry. The delta from v1 to v2 is the two unit test changes above. The other two failures cannot repro. was (Author: xyao): Thanks [~szetszwo] for the review. Update patch v2 that fixed the unit test failures in hadoop.hdfs.server.namenode.TestAuditLogger and hadoop.hdfs.server.namenode.TestAuditLoggerWithCommands Now that the getPermission checker is moved out of the FSN lock, the test mocks are updated to reach deeper to get the expected exception and the audit log entry. The other two failures cannot repro. > Avoid taking FSN lock while doing group member lookup for FSD permission check > -- > > Key: HDFS-13136 > URL: https://issues.apache.org/jira/browse/HDFS-13136 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Major > Attachments: HDFS-13136.001.patch, HDFS-13136.002.patch > > > Namenode has FSN lock and FSD lock. Most of the namenode operations need to > take FSN lock first and then FSD lock. The permission check is done via > FSPermissionChecker at FSD layer assuming FSN lock is taken. > The FSPermissionChecker constructor invokes callerUgi.getGroups() that can > take seconds sometimes. There are external cache scheme such SSSD and > internal cache scheme for group lookup. However, the delay could still occur > during cache refresh, which causes severe FSN lock contentions and > unresponsive namenode issues. 
> Checking the current code, we found that getBlockLocations(..) did it right > but some methods such as getFileInfo(..), getContentSummary(..) did it wrong. > This ticket is open to ensure the group lookup for permission checker is > outside the FSN lock.
[jira] [Commented] (HDFS-13136) Avoid taking FSN lock while doing group member lookup for FSD permission check
[ https://issues.apache.org/jira/browse/HDFS-13136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364469#comment-16364469 ] Xiaoyu Yao commented on HDFS-13136: --- Thanks [~szetszwo] for the review. Updated patch v2, which fixes the unit test failures in hadoop.hdfs.server.namenode.TestAuditLogger and hadoop.hdfs.server.namenode.TestAuditLoggerWithCommands Now that the getPermissionChecker() is moved out of the FSN lock, the test mocks are updated to reach deeper to get the expected exception and the audit log entry. The other two failures could not be reproduced. > Avoid taking FSN lock while doing group member lookup for FSD permission check > -- > > Key: HDFS-13136 > URL: https://issues.apache.org/jira/browse/HDFS-13136 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Major > Attachments: HDFS-13136.001.patch, HDFS-13136.002.patch > > > Namenode has FSN lock and FSD lock. Most of the namenode operations need to > take FSN lock first and then FSD lock. The permission check is done via > FSPermissionChecker at FSD layer assuming FSN lock is taken. > The FSPermissionChecker constructor invokes callerUgi.getGroups() that can > take seconds sometimes. There are external cache scheme such SSSD and > internal cache scheme for group lookup. However, the delay could still occur > during cache refresh, which causes severe FSN lock contentions and > unresponsive namenode issues. > Checking the current code, we found that getBlockLocations(..) did it right > but some methods such as getFileInfo(..), getContentSummary(..) did it wrong. > This ticket is open to ensure the group lookup for permission checker is > outside the FSN lock.
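The locking pattern HDFS-13136 aims for can be sketched in miniature: resolve the caller's groups (the potentially slow step) before taking the namesystem lock, so a slow group lookup cannot stall every other operation. The class and method names below are simplified stand-ins for FSNamesystem/FSPermissionChecker, not the real API.

```java
// Minimal sketch, assuming simplified stand-ins for FSNamesystem and
// FSPermissionChecker: the expensive group lookup runs in the checker's
// constructor, so the checker must be built OUTSIDE the FSN lock.
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class PermissionCheckSketch {
  static final ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock();

  // Stand-in for FSPermissionChecker: group lookup happens at
  // construction time, which is exactly why construction must not
  // happen while holding the lock.
  static class PermissionChecker {
    final List<String> groups;
    PermissionChecker(String user) {
      this.groups = lookupGroups(user);  // may take seconds on a cache miss
    }
  }

  static List<String> lookupGroups(String user) {
    return List.of(user + "-group");     // placeholder for callerUgi.getGroups()
  }

  static List<String> getFileInfo(String user) {
    PermissionChecker pc = new PermissionChecker(user); // slow part, lock NOT held
    fsnLock.readLock().lock();
    try {
      return pc.groups;                  // cheap permission check under the lock
    } finally {
      fsnLock.readLock().unlock();
    }
  }

  public static void main(String[] args) {
    if (!getFileInfo("alice").contains("alice-group")) throw new AssertionError();
    if (fsnLock.getReadLockCount() != 0) throw new AssertionError(); // lock released
    System.out.println("ok");
  }
}
```

The bug described in the ticket is the inverse of this ordering: some methods built the checker after acquiring the lock, so a group-lookup stall held the FSN lock for the duration.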
[jira] [Updated] (HDFS-13136) Avoid taking FSN lock while doing group member lookup for FSD permission check
[ https://issues.apache.org/jira/browse/HDFS-13136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-13136: -- Attachment: HDFS-13136.002.patch > Avoid taking FSN lock while doing group member lookup for FSD permission check > -- > > Key: HDFS-13136 > URL: https://issues.apache.org/jira/browse/HDFS-13136 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Major > Attachments: HDFS-13136.001.patch, HDFS-13136.002.patch > > > Namenode has FSN lock and FSD lock. Most of the namenode operations need to > take FSN lock first and then FSD lock. The permission check is done via > FSPermissionChecker at FSD layer assuming FSN lock is taken. > The FSPermissionChecker constructor invokes callerUgi.getGroups() that can > take seconds sometimes. There are external cache scheme such SSSD and > internal cache scheme for group lookup. However, the delay could still occur > during cache refresh, which causes severe FSN lock contentions and > unresponsive namenode issues. > Checking the current code, we found that getBlockLocations(..) did it right > but some methods such as getFileInfo(..), getContentSummary(..) did it wrong. > This ticket is open to ensure the group lookup for permission checker is > outside the FSN lock.
[jira] [Comment Edited] (HDFS-12749) DN may not send block report to NN after NN restart
[ https://issues.apache.org/jira/browse/HDFS-12749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364356#comment-16364356 ] Erik Krogen edited comment on HDFS-12749 at 2/14/18 4:38 PM: - Hey [~hexiaoqiao], I have to admit I'm not familiar with this portion of the codebase. But it seems that the underlying issue is still a {{SocketTimeoutException}}, which we are trying to catch. Is this just an issue of an exception being over-wrapped? If it is over-wrapped in this case, is there any case where it won't be, i.e. is that catch statement actually useful as-is? {{NetUtils.wrapException}} will properly maintain the class of a {{SocketTimeoutException}}; I dug around a little and it was not immediately obvious to me why it received an {{IOException}} wrapped around a timeout rather than just a timeout. It does seem that probably we want to catch all {{IOException}} here, but like I said I'm not familiar with this area and don't know if there is a good reason not to. Maybe re-ping [~kihwal] - I definitely agree that with proper tuning this issue shouldn't happen, but it does seem that the original complaint is valid and that this could be more robust. was (Author: xkrogen): Hey [~hexiaoqiao], I have to admit I'm not familiar with this portion of the codebase. But it seems that the underlying issue is still a {{SocketTimeoutException}}, which we are trying to catch. Is this just an issue of an exception being over-wrapped? If it is over-wrapped in this case, is there any case where it won't be, i.e. is that catch statement actually useful as-is? It does seem that probably we want to catch all {{IOException}} here, but like I said I'm not familiar with this area and don't know if there is a good reason not to. Maybe re-ping [~kihwal] - I definitely agree that with proper tuning this issue shouldn't happen, but it does seem that the original complaint is valid and that this could be more robust. 
> DN may not send block report to NN after NN restart > --- > > Key: HDFS-12749 > URL: https://issues.apache.org/jira/browse/HDFS-12749 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: TanYuxin >Priority: Major > Attachments: HDFS-12749.001.patch > > > Now our cluster have thousands of DN, millions of files and blocks. When NN > restart, NN's load is very high. > After NN restart,DN will call BPServiceActor#reRegister method to register. > But register RPC will get a IOException since NN is busy dealing with Block > Report. The exception is caught at BPServiceActor#processCommand. > Next is the caught IOException: > {code:java} > WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Error processing > datanode Command > java.io.IOException: Failed on local exception: java.io.IOException: > java.net.SocketTimeoutException: 6 millis timeout while waiting for > channel to be ready for read. ch : java.nio.channels.SocketChannel[connected > local=/DataNode_IP:Port remote=NameNode_Host/IP:Port]; Host Details : local > host is: "DataNode_Host/Datanode_IP"; destination host is: > "NameNode_Host":Port; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773) > at org.apache.hadoop.ipc.Client.call(Client.java:1474) > at org.apache.hadoop.ipc.Client.call(Client.java:1407) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) > at com.sun.proxy.$Proxy13.registerDatanode(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:126) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:793) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.reRegister(BPServiceActor.java:926) > at > org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:604) > at > 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:898) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:711) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:864) > at java.lang.Thread.run(Thread.java:745) > {code} > The un-catched IOException breaks BPServiceActor#register, and the Block > Report can not be sent immediately. > {code} > /** >* Register one bp with the corresponding NameNode >* >* The bpDatanode needs to register with the namenode on startup in order >* 1) to report which storage it is serving now and >* 2) to receive a registrationID >* >* issued by the namenode to recognize
[jira] [Commented] (HDFS-13113) Use Log.*(Object, Throwable) overload to log exceptions
[ https://issues.apache.org/jira/browse/HDFS-13113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364361#comment-16364361 ] Steve Loughran commented on HDFS-13113: --- this is in HDFS 3.1; leaving the JIRA open in case we want to add a patch which applies to branch-3.0; there are conflicts in the NFS module {code} both modified: hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/DFSClientCache.java both modified: hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java both modified: hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/RpcProgramNfs3.java {code} > Use Log.*(Object, Throwable) overload to log exceptions > --- > > Key: HDFS-13113 > URL: https://issues.apache.org/jira/browse/HDFS-13113 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, namenode, nfs >Affects Versions: 2.4.0 >Reporter: Steve Loughran >Assignee: Andras Bokor >Priority: Major > Attachments: HADOOP-10571.05.patch, HADOOP-10571.07.patch > > > FYI, In HADOOP-10571, [~boky01] is going to clean up a lot of the log > statements, including some in Datanode and elsewhere. > I'm provisionally +1 on that, but want to run it on the standalone tests > (Yetus has already done them), and give the HDFS developers warning of a > change which is going to touch their codebase. > If anyone doesn't want the logging improvements, now is your chance to say so
[jira] [Commented] (HDFS-12749) DN may not send block report to NN after NN restart
[ https://issues.apache.org/jira/browse/HDFS-12749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364356#comment-16364356 ] Erik Krogen commented on HDFS-12749: Hey [~hexiaoqiao], I have to admit I'm not familiar with this portion of the codebase. But it seems that the underlying issue is still a {{SocketTimeoutException}}, which we are trying to catch. Is this just an issue of an exception being over-wrapped? If it is over-wrapped in this case, is there any case where it won't be, i.e. is that catch statement actually useful as-is? It does seem that probably we want to catch all {{IOException}} here, but like I said I'm not familiar with this area and don't know if there is a good reason not to. Maybe re-ping [~kihwal] - I definitely agree that with proper tuning this issue shouldn't happen, but it does seem that the original complaint is valid and that this could be more robust. > DN may not send block report to NN after NN restart > --- > > Key: HDFS-12749 > URL: https://issues.apache.org/jira/browse/HDFS-12749 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: TanYuxin >Priority: Major > Attachments: HDFS-12749.001.patch > > > Now our cluster have thousands of DN, millions of files and blocks. When NN > restart, NN's load is very high. > After NN restart,DN will call BPServiceActor#reRegister method to register. > But register RPC will get a IOException since NN is busy dealing with Block > Report. The exception is caught at BPServiceActor#processCommand. > Next is the caught IOException: > {code:java} > WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Error processing > datanode Command > java.io.IOException: Failed on local exception: java.io.IOException: > java.net.SocketTimeoutException: 6 millis timeout while waiting for > channel to be ready for read. 
ch : java.nio.channels.SocketChannel[connected > local=/DataNode_IP:Port remote=NameNode_Host/IP:Port]; Host Details : local > host is: "DataNode_Host/Datanode_IP"; destination host is: > "NameNode_Host":Port; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773) > at org.apache.hadoop.ipc.Client.call(Client.java:1474) > at org.apache.hadoop.ipc.Client.call(Client.java:1407) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) > at com.sun.proxy.$Proxy13.registerDatanode(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:126) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:793) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.reRegister(BPServiceActor.java:926) > at > org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:604) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:898) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:711) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:864) > at java.lang.Thread.run(Thread.java:745) > {code} > The un-catched IOException breaks BPServiceActor#register, and the Block > Report can not be sent immediately. > {code} > /** >* Register one bp with the corresponding NameNode >* >* The bpDatanode needs to register with the namenode on startup in order >* 1) to report which storage it is serving now and >* 2) to receive a registrationID >* >* issued by the namenode to recognize registered datanodes. 
>* >* @param nsInfo current NamespaceInfo >* @see FSNamesystem#registerDatanode(DatanodeRegistration) >* @throws IOException >*/ > void register(NamespaceInfo nsInfo) throws IOException { > // The handshake() phase loaded the block pool storage > // off disk - so update the bpRegistration object from that info > DatanodeRegistration newBpRegistration = bpos.createRegistration(); > LOG.info(this + " beginning handshake with NN"); > while (shouldRun()) { > try { > // Use returned registration from namenode with updated fields > newBpRegistration = bpNamenode.registerDatanode(newBpRegistration); > newBpRegistration.setNamespaceInfo(nsInfo); > bpRegistration = newBpRegistration; > break; > } catch(EOFException e) { // namenode might have just restarted > LOG.info("Problem connecting to server: " + nnAddr + " :" > + e.getLocalizedMessage()); > sleepAndLogInterrupts(1000, "connecting to server"); > }
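The broader catch discussed in this thread can be sketched as follows. Because the RPC layer can surface a {{SocketTimeoutException}} wrapped inside a plain {{IOException}} ("Failed on local exception: ..."), catching only {{EOFException}}/{{SocketTimeoutException}} lets the wrapped form escape and abort registration; catching {{IOException}} keeps the retry loop alive. This is an illustrative loop under that assumption, not the real {{BPServiceActor}} code.

```java
// Illustrative retry loop, NOT the real BPServiceActor.register():
// the fake registerDatanode() fails twice with a timeout wrapped in a
// plain IOException (mimicking a NameNode busy with block reports
// right after restart), then succeeds. The broad IOException catch is
// what keeps the wrapped timeout from aborting registration.
import java.io.IOException;
import java.net.SocketTimeoutException;

public class RegisterRetrySketch {
  static int attempts = 0;

  // Stand-in for the registration RPC.
  static String registerDatanode() throws IOException {
    if (++attempts < 3) {
      throw new IOException("Failed on local exception",
          new SocketTimeoutException("timeout waiting for channel"));
    }
    return "registered";
  }

  static String register() throws InterruptedException {
    while (true) {
      try {
        return registerDatanode();
      } catch (IOException e) {  // broad catch: wrapped timeouts retry too
        Thread.sleep(1);         // stand-in for sleepAndLogInterrupts(1000, ...)
      }
    }
  }

  public static void main(String[] args) throws InterruptedException {
    if (!register().equals("registered")) throw new AssertionError();
    if (attempts != 3) throw new AssertionError(); // two failures were retried
    System.out.println("ok");
  }
}
```

Whether a blanket {{IOException}} catch is safe here (vs. masking fatal errors) is exactly the open question in the thread; the sketch only shows the control-flow difference.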
[jira] [Commented] (HDFS-13142) Define and Implement a DiifList Interface to store and manage SnapshotDiffs
[ https://issues.apache.org/jira/browse/HDFS-13142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364327#comment-16364327 ] genericqa commented on HDFS-13142: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 40s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 3s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 41s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 249 unchanged - 0 fixed = 251 total (was 249) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 59s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}164m 39s{color} | {color:red} hadoop-hdfs in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}218m 0s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeUUID | | | hadoop.hdfs.TestBlockStoragePolicy | | | hadoop.hdfs.server.namenode.ha.TestBootstrapStandbyWithQJM | | | hadoop.hdfs.TestDFSUpgradeFromImage | | | hadoop.hdfs.TestReplaceDatanodeOnFailure | | | hadoop.hdfs.TestMultiThreadedHflush | | | hadoop.hdfs.server.namenode.TestLargeDirectoryDelete | | | hadoop.hdfs.TestPread | | | hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency | | | hadoop.hdfs.TestRestartDFS | | | hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA | | | hadoop.hdfs.server.namenode.ha.TestNNHealthCheck | | | hadoop.hdfs.TestHFlush | | | hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication | | | hadoop.hdfs.server.namenode.ha.TestHAMetrics | | | hadoop.hdfs.server.namenode.ha.TestEditLogTailer | | | hadoop.hdfs.server.namenode.ha.TestEditLogsDuringFailover | | | hadoop.hdfs.server.namenode.TestGetContentSummaryWithPermission | | | hadoop.hdfs.server.namenode.ha.TestXAttrsWithHA | | | hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations | | | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA | | | hadoop.hdfs.TestFileAppend2 | | |
[jira] [Commented] (HDFS-12749) DN may not send block report to NN after NN restart
[ https://issues.apache.org/jira/browse/HDFS-12749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364305#comment-16364305 ] genericqa commented on HDFS-12749: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 7s{color} | {color:red} HDFS-12749 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-12749 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12895403/HDFS-12749.001.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/23067/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > DN may not send block report to NN after NN restart > --- > > Key: HDFS-12749 > URL: https://issues.apache.org/jira/browse/HDFS-12749 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: TanYuxin >Priority: Major > Attachments: HDFS-12749.001.patch > > > Our cluster now has thousands of DNs and millions of files and blocks. When the NN > restarts, the NN's load is very high. > After the NN restart, the DN calls the BPServiceActor#reRegister method to register. > But the register RPC gets an IOException since the NN is busy dealing with Block > Reports. The exception is caught at BPServiceActor#processCommand. > Below is the caught IOException: > {code:java} > WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Error processing > datanode Command > java.io.IOException: Failed on local exception: java.io.IOException: > java.net.SocketTimeoutException: 6 millis timeout while waiting for > channel to be ready for read. 
ch : java.nio.channels.SocketChannel[connected > local=/DataNode_IP:Port remote=NameNode_Host/IP:Port]; Host Details : local > host is: "DataNode_Host/Datanode_IP"; destination host is: > "NameNode_Host":Port; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773) > at org.apache.hadoop.ipc.Client.call(Client.java:1474) > at org.apache.hadoop.ipc.Client.call(Client.java:1407) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) > at com.sun.proxy.$Proxy13.registerDatanode(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:126) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:793) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.reRegister(BPServiceActor.java:926) > at > org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:604) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:898) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:711) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:864) > at java.lang.Thread.run(Thread.java:745) > {code} > The un-catched IOException breaks BPServiceActor#register, and the Block > Report can not be sent immediately. > {code} > /** >* Register one bp with the corresponding NameNode >* >* The bpDatanode needs to register with the namenode on startup in order >* 1) to report which storage it is serving now and >* 2) to receive a registrationID >* >* issued by the namenode to recognize registered datanodes. 
>* >* @param nsInfo current NamespaceInfo >* @see FSNamesystem#registerDatanode(DatanodeRegistration) >* @throws IOException >*/ > void register(NamespaceInfo nsInfo) throws IOException { > // The handshake() phase loaded the block pool storage > // off disk - so update the bpRegistration object from that info > DatanodeRegistration newBpRegistration = bpos.createRegistration(); > LOG.info(this + " beginning handshake with NN"); > while (shouldRun()) { > try { > // Use returned registration from namenode with updated fields > newBpRegistration = bpNamenode.registerDatanode(newBpRegistration); > newBpRegistration.setNamespaceInfo(nsInfo); > bpRegistration = newBpRegistration; > break; > } catch(EOFException e) { // namenode might have just restarted > LOG.info("Problem connecting to server: " + nnAddr + " :" > + e.getLocalizedMessage()); >
[jira] [Commented] (HDFS-12749) DN may not send block report to NN after NN restart
[ https://issues.apache.org/jira/browse/HDFS-12749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364280#comment-16364280 ] He Xiaoqiao commented on HDFS-12749: ping [~xkrogen] I have also seen this problem in our big production cluster, and I think this is a blocker issue. The issue was originally reported by my colleague [~tanyuxin]; we applied this patch to our branch-2.7-based production cluster and the problem described above disappeared. The description of this ticket may be slightly ambiguous, so I would like to offer more information: a) {{BPServiceActor#register}} catches only {{EOFException}} and {{SocketTimeoutException}} (both subclasses of IOException) when registering with the NameNode. b) When the request passes down into the {{RPC}} layer, {{Client#call}} (line 1448) may throw many types of exceptions, such as {{InterruptedIOException}}, {{IOException}}, etc., for different failure reasons. c) The NameNode's load can be very high during the restart process, especially in a big cluster. When a DataNode re-registers with the NameNode at this time, {{RPC#Client}} may throw an IOException that is neither an EOFException nor a SocketTimeoutException to {{BPServiceActor#register}}, as [~tanyuxin] describes above. d) That IOException is caught by {{BPServiceActor#processCommand}}, which prints the warning {{`WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Error processing datanode Command`}} but does not re-initialize the {{BlockPool}} context on re-registration and continues running, so the {{BlockReport}} is not scheduled immediately (the DataNode sends its {{BlockReport}} to the restarting NameNode based on the last {{BlockReport}} time). e) There are other subsequent problems besides the {{BlockReport}} not being scheduled immediately, such as block token secret keys not being updated correctly, so clients cannot read {{Blocks}} from this DataNode even when holding a correct BlockToken. 
I have reviewed the patch and have one minor suggestion: a) please rebase onto an active branch (such as branch-2.7) and offer a new patch; b) it would be better to add a new unit test for this fix. If I am wrong, please correct me. [~tanyuxin] > DN may not send block report to NN after NN restart > --- > > Key: HDFS-12749 > URL: https://issues.apache.org/jira/browse/HDFS-12749 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: TanYuxin >Priority: Major > Attachments: HDFS-12749.001.patch > > > Our cluster now has thousands of DNs and millions of files and blocks. When the NN > restarts, the NN's load is very high. > After the NN restart, the DN calls the BPServiceActor#reRegister method to register. > But the register RPC gets an IOException since the NN is busy dealing with Block > Reports. The exception is caught at BPServiceActor#processCommand. > Below is the caught IOException: > {code:java} > WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Error processing > datanode Command > java.io.IOException: Failed on local exception: java.io.IOException: > java.net.SocketTimeoutException: 6 millis timeout while waiting for > channel to be ready for read. 
ch : java.nio.channels.SocketChannel[connected > local=/DataNode_IP:Port remote=NameNode_Host/IP:Port]; Host Details : local > host is: "DataNode_Host/Datanode_IP"; destination host is: > "NameNode_Host":Port; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773) > at org.apache.hadoop.ipc.Client.call(Client.java:1474) > at org.apache.hadoop.ipc.Client.call(Client.java:1407) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) > at com.sun.proxy.$Proxy13.registerDatanode(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:126) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:793) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.reRegister(BPServiceActor.java:926) > at > org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:604) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:898) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:711) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:864) > at java.lang.Thread.run(Thread.java:745) > {code} > The un-catched IOException breaks BPServiceActor#register, and the Block > Report can not be sent immediately. > {code} > /** >* Register one bp with the corresponding NameNode >* >* The bpDatanode needs to register with the namenode on startup in order >* 1) to report which
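The fix direction discussed in this thread, i.e. treating any IOException from the register RPC as retriable instead of letting it escape the narrow EOFException/SocketTimeoutException catch, can be sketched as follows. This is a hedged illustration, not the HDFS-12749 patch: the `Registrar` interface, `registerWithRetry` helper, and `MAX_ATTEMPTS` constant are hypothetical stand-ins for BPServiceActor's real collaborators.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: retry registration on *any* IOException, not only
// EOFException/SocketTimeoutException, so an overloaded NameNode does not
// break the register loop. Names are illustrative, not BPServiceActor's.
class RegisterRetrySketch {
    static final int MAX_ATTEMPTS = 3;

    interface Registrar {
        // stands in for bpNamenode.registerDatanode(newBpRegistration)
        void registerOnce() throws IOException;
    }

    // Returns the attempt number that succeeded, or 0 if all attempts failed.
    static int registerWithRetry(Registrar r) {
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            try {
                r.registerOnce();
                return attempt;
            } catch (IOException e) {
                // broad catch: the real code would sleepAndLogInterrupts(1000, ...)
            }
        }
        return 0; // gave up; caller decides whether to re-initialize and retry later
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        int attempt = registerWithRetry(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new IOException("NN busy with block reports");
            }
        });
        System.out.println("registered on attempt " + attempt);
    }
}
```

The point of the broad catch is only that a transient failure must keep the loop alive; whether to also re-initialize the BlockPool context on re-registration is the separate issue raised above.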
[jira] [Commented] (HDFS-13119) RBF: Manage unavailable clusters
[ https://issues.apache.org/jira/browse/HDFS-13119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364194#comment-16364194 ] genericqa commented on HDFS-13119: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 44s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 1s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 35m 9s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 30s{color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 19m 50s{color} | {color:red} branch has errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 46s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 29s{color} | {color:red} hadoop-hdfs in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 9s{color} | {color:green} hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 0 unchanged - 433 fixed = 0 total (was 433) {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 9s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 3s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 0m 10s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 11s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 11s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 10s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:blue}0{color} | {color:blue} asflicense {color} | {color:blue} 0m 10s{color} | {color:blue} ASF License check generated no output? 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 68m 38s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | HDFS-13119 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12910583/HDFS-13119.003.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml | | uname | Linux 8a9d3b12e662 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 60971b8 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | mvninstall | https://builds.apache.org/job/PreCommit-HDFS-Build/23065/artifact/out/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs.txt | | mvnsite | https://builds.apache.org/job/PreCommit-HDFS-Build/23065/artifact/out/patch-mvnsite-hadoop-hdfs-project_hadoop-hdfs.txt | | findbugs |
[jira] [Updated] (HDFS-13147) Support -c argument for DFS command head and tail
[ https://issues.apache.org/jira/browse/HDFS-13147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jianfei Jiang updated HDFS-13147: - Status: Patch Available (was: In Progress) Patch 003: fixes the related test case failure. > Support -c argument for DFS command head and tail > - > > Key: HDFS-13147 > URL: https://issues.apache.org/jira/browse/HDFS-13147 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Jianfei Jiang >Assignee: Jianfei Jiang >Priority: Minor > Attachments: HDFS-13147.001.patch, HDFS-13147.002.patch, > HDFS-13147.003.patch > > > The offset of the {{head}} and {{tail}} commands is hard-coded as 1024 bytes. The goal > is to let the user specify the offset, as in the Linux commands, > making them more flexible. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
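The behaviour the ticket asks for, a user-supplied byte count instead of the hard-coded 1024, can be illustrated with a small parser in the spirit of GNU `head -c`. This is a hedged sketch; the class and method names are hypothetical and not the code in the attached patches.

```java
// Hypothetical sketch of parsing a -c byte-count argument with optional
// K/M suffixes, similar in spirit to GNU head -c; not the HDFS-13147 patch.
class ByteCountArg {
    static long parse(String s) {
        long multiplier = 1L;
        String digits = s;
        if (s.endsWith("K") || s.endsWith("k")) {
            multiplier = 1024L;
            digits = s.substring(0, s.length() - 1);
        } else if (s.endsWith("M") || s.endsWith("m")) {
            multiplier = 1024L * 1024L;
            digits = s.substring(0, s.length() - 1);
        }
        long count = Long.parseLong(digits); // throws NumberFormatException on junk
        if (count < 0) {
            throw new IllegalArgumentException("byte count must be non-negative: " + s);
        }
        return count * multiplier;
    }

    public static void main(String[] args) {
        // "1K" reproduces the old hard-coded default of 1024 bytes
        System.out.println(parse("1K"));
    }
}
```

A command like `hdfs dfs -tail -c 2M <file>` would then read the last two mebibytes instead of the fixed kilobyte (syntax shown here is illustrative).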
[jira] [Updated] (HDFS-13147) Support -c argument for DFS command head and tail
[ https://issues.apache.org/jira/browse/HDFS-13147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jianfei Jiang updated HDFS-13147: - Attachment: HDFS-13147.003.patch > Support -c argument for DFS command head and tail > - > > Key: HDFS-13147 > URL: https://issues.apache.org/jira/browse/HDFS-13147 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Jianfei Jiang >Assignee: Jianfei Jiang >Priority: Minor > Attachments: HDFS-13147.001.patch, HDFS-13147.002.patch, > HDFS-13147.003.patch > > > The offset of command {{head}} and {{tail}} is hard coded as 1024 bytes. Goal > to improve it that the offset can be specified by user like Linux commands. > Then the commands will be more flexible. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13147) Support -c argument for DFS command head and tail
[ https://issues.apache.org/jira/browse/HDFS-13147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jianfei Jiang updated HDFS-13147: - Status: In Progress (was: Patch Available) > Support -c argument for DFS command head and tail > - > > Key: HDFS-13147 > URL: https://issues.apache.org/jira/browse/HDFS-13147 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Jianfei Jiang >Assignee: Jianfei Jiang >Priority: Minor > Attachments: HDFS-13147.001.patch, HDFS-13147.002.patch > > > The offset of command {{head}} and {{tail}} is hard coded as 1024 bytes. Goal > to improve it that the offset can be specified by user like Linux commands. > Then the commands will be more flexible. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13119) RBF: Manage unavailable clusters
[ https://issues.apache.org/jira/browse/HDFS-13119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364058#comment-16364058 ] Yiqun Lin commented on HDFS-13119: -- Thanks for the review, [~elgoiri]. {quote}Otherwise, we could just do: {noformat} if (isClusterUnAvailable(nsId) && retryCount > 0) { throw new IOException("No namenode available under nameservice " + nsId, ioe); } {noformat} Then, the default logic takes care of the first retry. {quote} Actually the default logic won't take care of the first retry. Here we use the retry policy {{FailoverOnNetworkExceptionRetry}}; it first jumps into the {{RetryDecision.FAILOVER_AND_RETRY}} logic and throws {{StandbyException}}. In the failover retry, the retry count is passed as 0 again. > RBF: Manage unavailable clusters > > > Key: HDFS-13119 > URL: https://issues.apache.org/jira/browse/HDFS-13119 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Íñigo Goiri >Assignee: Yiqun Lin >Priority: Major > Attachments: HDFS-13119.001.patch, HDFS-13119.002.patch, > HDFS-13119.003.patch > > > When a federated cluster has one of its subclusters down, operations that run > in every subcluster ({{RouterRpcClient#invokeAll()}}) may take all the RPC > connections. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
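The subtlety in this exchange, that with FailoverOnNetworkExceptionRetry the first failover retry arrives with a retry count of 0, so a guard of the form `isClusterUnAvailable(nsId) && retryCount > 0` never fires on that retry, can be seen in a tiny sketch. The method below is a hypothetical stand-in for the check quoted above, not RouterRpcClient code.

```java
// Hypothetical stand-in for the unavailable-cluster guard quoted above.
// With FailoverOnNetworkExceptionRetry, the first failover retry is invoked
// with retryCount == 0 again, so a retryCount > 0 guard skips that retry.
class UnavailableClusterGuard {
    static boolean shouldGiveUp(boolean clusterUnavailable, int retryCount) {
        return clusterUnavailable && retryCount > 0;
    }

    public static void main(String[] args) {
        // first failover retry: retryCount comes back as 0, so the guard does not fire
        System.out.println(shouldGiveUp(true, 0));
    }
}
```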
[jira] [Comment Edited] (HDFS-13119) RBF: Manage unavailable clusters
[ https://issues.apache.org/jira/browse/HDFS-13119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364058#comment-16364058 ] Yiqun Lin edited comment on HDFS-13119 at 2/14/18 1:51 PM: --- Thanks for the review, [~elgoiri]. {quote}Otherwise, we could just do: {noformat} if (isClusterUnAvailable(nsId) && retryCount > 0) { throw new IOException("No namenode available under nameservice " + nsId, ioe); } {noformat} Then, the default logic takes care of the first retry. {quote} Actually the default logic won't take care of the first retry. Here we use the retry policy {{FailoverOnNetworkExceptionRetry}}; it first jumps into the {{RetryDecision.FAILOVER_AND_RETRY}} logic and throws {{StandbyException}}. In the failover retry, the retry count is passed as 0 again. Attach the new patch to fix some warnings. was (Author: linyiqun): Thanks for the review, [~elgoiri]. {quote}Otherwise, we could just do: {noformat} if (isClusterUnAvailable(nsId) && retryCount > 0) { throw new IOException("No namenode available under nameservice " + nsId, ioe); } {noformat} Then, the default logic takes care of the first retry. {quote} Actually the default logic won't takes care of the first retry. Here we use the retry policy {{FailoverOnNetworkExceptionRetry}}, it will firstly jump into logic of {{RetryDecision.FAILOVER_AND_RETRY}} and throw {{StandbyException}}. In the failover rerty, the retry count is passing as 0 again. > RBF: Manage unavailable clusters > > > Key: HDFS-13119 > URL: https://issues.apache.org/jira/browse/HDFS-13119 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Íñigo Goiri >Assignee: Yiqun Lin >Priority: Major > Attachments: HDFS-13119.001.patch, HDFS-13119.002.patch, > HDFS-13119.003.patch > > > When a federated cluster has one of its subclusters down, operations that run > in every subcluster ({{RouterRpcClient#invokeAll()}}) may take all the RPC > connections. 
[jira] [Updated] (HDFS-13119) RBF: Manage unavailable clusters
[ https://issues.apache.org/jira/browse/HDFS-13119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yiqun Lin updated HDFS-13119: - Attachment: HDFS-13119.003.patch > RBF: Manage unavailable clusters > > > Key: HDFS-13119 > URL: https://issues.apache.org/jira/browse/HDFS-13119 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Íñigo Goiri >Assignee: Yiqun Lin >Priority: Major > Attachments: HDFS-13119.001.patch, HDFS-13119.002.patch, > HDFS-13119.003.patch > > > When a federated cluster has one of the subcluster down, operations that run > in every subcluster ({{RouterRpcClient#invokeAll()}}) may take all the RPC > connections. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica
[ https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364040#comment-16364040 ] genericqa commented on HDFS-11187: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 5s{color} | {color:red} Docker failed to build yetus/hadoop:tp-17701. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-11187 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12910582/HDFS-11187-branch-2.004.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/23064/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Optimize disk access for last partial chunk checksum of Finalized replica > - > > Key: HDFS-11187 > URL: https://issues.apache.org/jira/browse/HDFS-11187 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Wei-Chiu Chuang >Assignee: Gabor Bota >Priority: Major > Fix For: 3.1.0, 3.0.2 > > Attachments: HDFS-11187-branch-2.001.patch, > HDFS-11187-branch-2.002.patch, HDFS-11187-branch-2.003.patch, > HDFS-11187-branch-2.004.patch, HDFS-11187.001.patch, HDFS-11187.002.patch, > HDFS-11187.003.patch, HDFS-11187.004.patch, HDFS-11187.005.patch > > > The patch at HDFS-11160 ensures BlockSender reads the correct version of > metafile when there are concurrent writers. > However, the implementation is not optimal, because it must always read the > last partial chunk checksum from disk while holding FsDatasetImpl lock for > every reader. It is possible to optimize this by keeping an up-to-date > version of last partial checksum in-memory and reduce disk access. 
> I am separating the optimization into a new jira, because maintaining the > state of in-memory checksum requires a lot more work. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
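The optimization HDFS-11187 describes, keeping an up-to-date copy of the last partial chunk checksum in memory so readers avoid a disk read of the meta file, can be sketched as a small holder class. Everything here (class and method names) is an assumption for illustration, not the FsDatasetImpl implementation.

```java
import java.util.Arrays;

// Illustrative holder for the last partial chunk's checksum, kept in memory
// on the replica and refreshed on append/finalize. Names are assumptions,
// not the actual FsDatasetImpl/FinalizedReplica code.
class CachedPartialChunkChecksum {
    private byte[] lastPartialChunkChecksum; // null if the block ends on a chunk boundary

    // Writer path: called when the partial last chunk changes (e.g. on append).
    synchronized void update(byte[] checksum) {
        lastPartialChunkChecksum = (checksum == null) ? null : checksum.clone();
    }

    // Reader path: served from memory, replacing a seek into the on-disk meta file.
    synchronized byte[] read() {
        return (lastPartialChunkChecksum == null) ? null : lastPartialChunkChecksum.clone();
    }

    public static void main(String[] args) {
        CachedPartialChunkChecksum cache = new CachedPartialChunkChecksum();
        cache.update(new byte[] {1, 2, 3, 4});
        System.out.println(Arrays.toString(cache.read()));
    }
}
```

The hard part the ticket alludes to is not the holder itself but keeping it consistent with concurrent appenders, which is why the state maintenance "requires a lot more work".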
[jira] [Updated] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica
[ https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Bota updated HDFS-11187: -- Status: Patch Available (was: Open) Added fix for FsDatasetImpl#append as requested. > Optimize disk access for last partial chunk checksum of Finalized replica > - > > Key: HDFS-11187 > URL: https://issues.apache.org/jira/browse/HDFS-11187 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Wei-Chiu Chuang >Assignee: Gabor Bota >Priority: Major > Fix For: 3.1.0, 3.0.2 > > Attachments: HDFS-11187-branch-2.001.patch, > HDFS-11187-branch-2.002.patch, HDFS-11187-branch-2.003.patch, > HDFS-11187-branch-2.004.patch, HDFS-11187.001.patch, HDFS-11187.002.patch, > HDFS-11187.003.patch, HDFS-11187.004.patch, HDFS-11187.005.patch > > > The patch at HDFS-11160 ensures BlockSender reads the correct version of > metafile when there are concurrent writers. > However, the implementation is not optimal, because it must always read the > last partial chunk checksum from disk while holding FsDatasetImpl lock for > every reader. It is possible to optimize this by keeping an up-to-date > version of last partial checksum in-memory and reduce disk access. > I am separating the optimization into a new jira, because maintaining the > state of in-memory checksum requires a lot more work. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica
[ https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Bota updated HDFS-11187: -- Status: Open (was: Patch Available) > Optimize disk access for last partial chunk checksum of Finalized replica > - > > Key: HDFS-11187 > URL: https://issues.apache.org/jira/browse/HDFS-11187 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Wei-Chiu Chuang >Assignee: Gabor Bota >Priority: Major > Fix For: 3.1.0, 3.0.2 > > Attachments: HDFS-11187-branch-2.001.patch, > HDFS-11187-branch-2.002.patch, HDFS-11187-branch-2.003.patch, > HDFS-11187-branch-2.004.patch, HDFS-11187.001.patch, HDFS-11187.002.patch, > HDFS-11187.003.patch, HDFS-11187.004.patch, HDFS-11187.005.patch > > > The patch at HDFS-11160 ensures BlockSender reads the correct version of > metafile when there are concurrent writers. > However, the implementation is not optimal, because it must always read the > last partial chunk checksum from disk while holding FsDatasetImpl lock for > every reader. It is possible to optimize this by keeping an up-to-date > version of last partial checksum in-memory and reduce disk access. > I am separating the optimization into a new jira, because maintaining the > state of in-memory checksum requires a lot more work. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica
[ https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Bota updated HDFS-11187: -- Attachment: HDFS-11187-branch-2.004.patch > Optimize disk access for last partial chunk checksum of Finalized replica > - > > Key: HDFS-11187 > URL: https://issues.apache.org/jira/browse/HDFS-11187 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Wei-Chiu Chuang >Assignee: Gabor Bota >Priority: Major > Fix For: 3.1.0, 3.0.2 > > Attachments: HDFS-11187-branch-2.001.patch, > HDFS-11187-branch-2.002.patch, HDFS-11187-branch-2.003.patch, > HDFS-11187-branch-2.004.patch, HDFS-11187.001.patch, HDFS-11187.002.patch, > HDFS-11187.003.patch, HDFS-11187.004.patch, HDFS-11187.005.patch > > > The patch at HDFS-11160 ensures BlockSender reads the correct version of > metafile when there are concurrent writers. > However, the implementation is not optimal, because it must always read the > last partial chunk checksum from disk while holding FsDatasetImpl lock for > every reader. It is possible to optimize this by keeping an up-to-date > version of last partial checksum in-memory and reduce disk access. > I am separating the optimization into a new jira, because maintaining the > state of in-memory checksum requires a lot more work. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13142) Define and Implement a DiffList Interface to store and manage SnapshotDiffs
[ https://issues.apache.org/jira/browse/HDFS-13142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDFS-13142: --- Attachment: HDFS-13142.002.patch > Define and Implement a DiffList Interface to store and manage SnapshotDiffs > --- > > Key: HDFS-13142 > URL: https://issues.apache.org/jira/browse/HDFS-13142 > Project: Hadoop HDFS > Issue Type: Improvement > Components: snapshots >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDFS-13142.001.patch, HDFS-13142.002.patch > > > The InodeDiffList class contains a generic List to store snapshotDiffs. The > generic List interface is bulky; to store and manage snapshotDiffs, we > need only a few specific methods. > This JIRA proposes to define a new interface, DiffList, which > will be used to store and manage snapshotDiffs. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
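The shape of the proposed interface can be sketched as follows. This is a hypothetical illustration of the idea — a narrow contract with only the operations snapshot-diff management needs, instead of the full java.util.List surface — and the method names here are illustrative, not necessarily those in the patch:

```java
// Hypothetical sketch: a minimal DiffList contract plus the simplest
// ArrayList-backed implementation. Not the actual HDFS-13142 patch.
import java.util.ArrayList;
import java.util.List;

interface DiffList<T> {
    boolean addLast(T diff);   // diffs are only ever appended in snapshot order
    T get(int index);
    T remove(int index);
    int size();
}

class DiffListByArrayList<T> implements DiffList<T> {
    private final List<T> diffs;

    DiffListByArrayList(int initialCapacity) {
        this.diffs = new ArrayList<>(initialCapacity);
    }

    public boolean addLast(T diff) { return diffs.add(diff); }
    public T get(int index)        { return diffs.get(index); }
    public T remove(int index)     { return diffs.remove(index); }
    public int size()              { return diffs.size(); }
}
```

Narrowing the contract this way leaves room to later swap in a representation better suited to snapshot diffs without touching callers.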
[jira] [Commented] (HDFS-13142) Define and Implement a DiffList Interface to store and manage SnapshotDiffs
[ https://issues.apache.org/jira/browse/HDFS-13142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363869#comment-16363869 ] Shashikant Banerjee commented on HDFS-13142: Thanks [~szetszwo] for the review. Patch v2 fixes the checkstyle warnings. > Define and Implement a DiffList Interface to store and manage SnapshotDiffs > --- > > Key: HDFS-13142 > URL: https://issues.apache.org/jira/browse/HDFS-13142 > Project: Hadoop HDFS > Issue Type: Improvement > Components: snapshots >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDFS-13142.001.patch, HDFS-13142.002.patch > > > The InodeDiffList class contains a generic List to store snapshotDiffs. The > generic List interface is bulky; to store and manage snapshotDiffs, we > need only a few specific methods. > This JIRA proposes to define a new interface, DiffList, which > will be used to store and manage snapshotDiffs. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13108) Ozone: OzoneFileSystem: Simplified url schema for Ozone File System
[ https://issues.apache.org/jira/browse/HDFS-13108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363850#comment-16363850 ] Steve Loughran commented on HDFS-13108: --- Where is the documentation of the URI? h3. OzoneFileSystem L95 can the pattern be made static? If it is only used in initialize, it can be a local var. L308. Use Preconditions.checkArgument; include the URL in the error message built up. L433. What if the path doesn't have a parent? h3. TestOzoneFileInterfaces L43. the imports are out of order w.r.t. the Hadoop rules. Can it be fixed now, before any merge? L44. If you do a static import of Assert, there is no need to use \{{Assert.}} in front of every assertion. L98. Have init declare it throws Exception; no need to catch & rethrow URI syntax lossily. L127. Only need to use the \{{this.}} prefix in the ctor. L141 use IOUtils to close all of these; if for some reason you can't, check for each one being null first. L150. Do a cast, not an assert, so that something meaningful is thrown. I have a strict "veto all patches where AssertTrue/AssertFalse don't include an error message" policy, and as I've been invited to comment, you've just encountered it. Sorry. As ClassCastException is meaningful, avoid the problem by not bothering with the assert. > Ozone: OzoneFileSystem: Simplified url schema for Ozone File System > --- > > Key: HDFS-13108 > URL: https://issues.apache.org/jira/browse/HDFS-13108 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Elek, Marton >Assignee: Elek, Marton >Priority: Major > Attachments: HDFS-13108-HDFS-7240.001.patch, > HDFS-13108-HDFS-7240.002.patch, HDFS-13108-HDFS-7240.003.patch > > > A. Current state > > 1. The datanode host / bucket / volume should be defined in the defaultFS (eg. > o3://datanode:9864/test/bucket1) > 2. The root file system points to the bucket (eg. 
'dfs -ls /' lists all the > keys from bucket1) > It works very well, but there are some limitations. > B. Problem one > The current code doesn't support fully qualified locations. For example 'dfs > -ls o3://datanode:9864/test/bucket1/dir1' is not working. > C. Problem two > I tried to fix the previous problem, but it's not trivial. The biggest > problem is that there is a Path.makeQualified call which can transform an > unqualified URL into a qualified URL. This is part of Path.java, so it's > common to all the Hadoop file systems. > In the current implementation it qualifies a URL by keeping the scheme > (eg. o3://) and authority (eg. datanode:9864) from the defaultFS and using > the relative path as the end of the qualified URL. For example: > makeQualified(defaultUri=o3://datanode:9864/test/bucket1, path=dir1/file) will > return o3://datanode:9864/dir1/file, which is obviously wrong (the correct result would > be o3://datanode:9864/test/bucket1/dir1/file). I tried to do a workaround > using a custom makeQualified in the Ozone code, and it worked from the > command line but couldn't work with Spark, which uses the Hadoop API and the > original makeQualified path. > D. Solution > We should support makeQualified calls, so we can use any path in the > defaultFS. > > I propose to use a simplified schema such as o3://bucket.volume/ > This is similar to the s3a format, where the pattern is s3a://bucket.region/ > We don't need to set the hostname of the datanode (or ksm in case of service > discovery), but it would be configurable with additional Hadoop configuration > values such as fs.o3.bucket.bucketname.volumename.address=http://datanode:9864 > (this is how s3a works today, as far as I know). > We also need to define restrictions for the volume names (in our case they > should not include a dot any more). > ps: some Spark output > 2018-02-03 18:43:04 WARN Client:66 - Neither spark.yarn.jars nor > spark.yarn.archive is set, falling back to uploading libraries under > SPARK_HOME. 
> 2018-02-03 18:43:05 INFO Client:54 - Uploading resource > file:/tmp/spark-03119be0-9c3d-440c-8e9f-48c692412ab5/__spark_libs__244044896784490.zip > -> > o3://datanode:9864/user/hadoop/.sparkStaging/application_1517611085375_0001/__spark_libs__244044896784490.zip > My default fs was o3://datanode:9864/test/bucket1, but spark qualified the > name of the home directory. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
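The qualification problem described above can be reproduced with plain java.net.URI, since RFC 3986 resolution behaves the same way the description attributes to Path.makeQualified: scheme and authority are kept from the base, while an absolute path replaces the whole path component. This is an illustration only — QualifyDemo is a hypothetical helper, not the actual Hadoop Path code:

```java
// Illustrates why qualification loses the volume/bucket when they live in the
// path component, and why moving them into the authority (o3://bucket.volume/)
// sidesteps the problem. QualifyDemo is an invented name for this sketch.
import java.net.URI;

class QualifyDemo {
    // Mimics qualification that keeps only scheme + authority of the default FS
    // and replaces the path — the behavior described in "Problem two" above.
    static URI qualify(URI defaultFs, String absolutePath) {
        return defaultFs.resolve(absolutePath);
    }
}
```

With the path-style default FS, `qualify(o3://datanode:9864/test/bucket1/, "/dir1/file")` yields `o3://datanode:9864/dir1/file` — volume and bucket are silently dropped. With the proposed authority-style schema, `qualify(o3://bucket1.test/, "/dir1/file")` yields `o3://bucket1.test/dir1/file`, and nothing is lost.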
[jira] [Commented] (HDFS-13136) Avoid taking FSN lock while doing group member lookup for FSD permission check
[ https://issues.apache.org/jira/browse/HDFS-13136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363842#comment-16363842 ] Tsz Wo Nicholas Sze commented on HDFS-13136: +1 the 001 patch looks good. Thanks for fixing all the methods. The failed tests seem unrelated. Please take a look. > Avoid taking FSN lock while doing group member lookup for FSD permission check > -- > > Key: HDFS-13136 > URL: https://issues.apache.org/jira/browse/HDFS-13136 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Major > Attachments: HDFS-13136.001.patch > > > The Namenode has an FSN lock and an FSD lock. Most of the namenode operations need to > take the FSN lock first and then the FSD lock. The permission check is done via > FSPermissionChecker at the FSD layer, assuming the FSN lock is taken. > The FSPermissionChecker constructor invokes callerUgi.getGroups(), which can > sometimes take seconds. There are external cache schemes such as SSSD and > internal cache schemes for group lookup. However, the delay could still occur > during cache refresh, which causes severe FSN lock contention and > unresponsive namenode issues. > Checking the current code, we found that getBlockLocations(..) did it right > but some methods such as getFileInfo(..) and getContentSummary(..) did it wrong. > This ticket is opened to ensure the group lookup for the permission checker happens > outside the FSN lock. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
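The shape of the fix described above — resolve the caller's groups before acquiring the namesystem lock, so a slow lookup cannot stall every other operation — can be sketched with a plain ReentrantReadWriteLock. This is a hypothetical illustration, not the actual FSNamesystem/FSPermissionChecker code; all names are invented:

```java
// Illustrative sketch only: the potentially slow group lookup (seconds with a
// cold SSSD-style cache) runs with no lock held; the lock-protected section
// then consults only the already-resolved groups.
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

class PermissionCheckOutsideLock {
    private final ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock();

    String getFileInfo(Supplier<List<String>> groupLookup) {
        // Slow call happens here, outside the lock.
        List<String> groups = groupLookup.get();
        fsnLock.readLock().lock();
        try {
            // Inside the lock: permission check against pre-resolved groups,
            // then the metadata read. No blocking lookups remain in here.
            return groups.contains("hadoop") ? "fileInfo" : "denied";
        } finally {
            fsnLock.readLock().unlock();
        }
    }
}
```

Ordering the expensive lookup before the lock acquisition is the whole fix; the permission logic itself is unchanged.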
[jira] [Commented] (HDFS-13142) Define and Implement a DiffList Interface to store and manage SnapshotDiffs
[ https://issues.apache.org/jira/browse/HDFS-13142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363825#comment-16363825 ] Tsz Wo Nicholas Sze commented on HDFS-13142: Thanks [~shashikant], the patch looks good. There are a few checkstyle warnings. Could you fix them? > Define and Implement a DiffList Interface to store and manage SnapshotDiffs > --- > > Key: HDFS-13142 > URL: https://issues.apache.org/jira/browse/HDFS-13142 > Project: Hadoop HDFS > Issue Type: Improvement > Components: snapshots >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDFS-13142.001.patch > > > The InodeDiffList class contains a generic List to store snapshotDiffs. The > generic List interface is bulky; to store and manage snapshotDiffs, we > need only a few specific methods. > This JIRA proposes to define a new interface, DiffList, which > will be used to store and manage snapshotDiffs. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13040) Kerberized inotify client fails despite kinit properly
[ https://issues.apache.org/jira/browse/HDFS-13040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363771#comment-16363771 ] Istvan Fajth commented on HDFS-13040: - Hi [~xiaochen], I have looked into the proposal; a few questions/suggestions on the test code if I may: - in the test code's initKerberizedCluster method you set the log level for KerberosAuthenticator and UGI to DEBUG; I am not sure it is necessary, and if you decide to remove that, you don't need to make the LOG public in those classes. - also in the initKerberizedCluster method, the basedir is set based on the TestDFSInotifyEventInputStream class, which probably remained there due to my mistake, when I put the test into a separate class to upload just this test to this JIRA. - After the Thread.sleep in the test code's testWithKerberizedCluster method there is a log.error that I believe also remained there from my code, and I believe it should not be error level. - Class doc says: Class for Kerberized test cases for \{@link TestDFSInotifyEventInputStream} — should this link to just DFSInotifyEventInputStream instead of the test class? > Kerberized inotify client fails despite kinit properly > -- > > Key: HDFS-13040 > URL: https://issues.apache.org/jira/browse/HDFS-13040 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.0 > Environment: Kerberized, HA cluster, iNotify client, CDH5.10.2 >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Major > Attachments: HDFS-13040.001.patch, HDFS-13040.02.patch, > HDFS-13040.03.patch, HDFS-13040.half.test.patch, > TestDFSInotifyEventInputStreamKerberized.java, TransactionReader.java > > > This issue is similar to HDFS-10799. > HDFS-10799 turned out to be a client-side issue where the client is responsible > for actively renewing the Kerberos ticket. > However, we found that in a slightly different setup, even if the client has valid Kerberos > credentials, inotify still fails. 
> Suppose client uses principal h...@example.com, > namenode 1 uses server principal hdfs/nn1.example@example.com > namenode 2 uses server principal hdfs/nn2.example@example.com > *After Namenodes starts for longer than kerberos ticket lifetime*, the client > fails with the following error: > {noformat} > 18/01/19 11:23:02 WARN security.UserGroupInformation: > PriviledgedActionException as:h...@gce.cloudera.com (auth:KERBEROS) > cause:org.apache.hadoop.ipc.RemoteException(java.io.IOException): We > encountered an error reading > https://nn2.example.com:8481/getJournal?jid=ns1=8662=-60%3A353531113%3A0%3Acluster3, > > https://nn1.example.com:8481/getJournal?jid=ns1=8662=-60%3A353531113%3A0%3Acluster3. > During automatic edit log failover, we noticed that all of the remaining > edit log streams are shorter than the current one! The best remaining edit > log ends at transaction 8683, but we thought we could read up to transaction > 8684. If you continue, metadata will be lost forever! > at > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:213) > at > org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.readOp(NameNodeRpcServer.java:1701) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getEditsFromTxid(NameNodeRpcServer.java:1763) > at > org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getEditsFromTxid(AuthorizationProviderProxyClientProtocol.java:1011) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getEditsFromTxid(ClientNamenodeProtocolServerSideTranslatorPB.java:1490) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617) 
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2216) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2212) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2210) > {noformat} > Typically if NameNode has an expired Kerberos ticket, the error handling for > the typical edit log tailing would let NameNode to relogin with its own > Kerberos principal.
[jira] [Commented] (HDFS-13147) Support -c argument for DFS command head and tail
[ https://issues.apache.org/jira/browse/HDFS-13147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363754#comment-16363754 ] genericqa commented on HDFS-13147: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 28s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 11s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 6s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 48s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 52s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 12m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 8s{color} | {color:green} root: The patch generated 0 new + 175 unchanged - 2 fixed = 175 total (was 177) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 1s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 0s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 43s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 52s{color} | {color:red} hadoop-common in the patch failed. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}123m 51s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 39s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}222m 43s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.http.TestHttpServerWithSpengo | | | hadoop.cli.TestCLI | | | hadoop.log.TestLogLevel | | | hadoop.security.token.delegation.web.TestWebDelegationToken | | | hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics | | | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | HDFS-13147 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12910512/HDFS-13147.002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux e107ea7dde9d 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build
[jira] [Updated] (HDFS-13113) Use Log.*(Object, Throwable) overload to log exceptions
[ https://issues.apache.org/jira/browse/HDFS-13113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HDFS-13113: -- Status: Patch Available (was: Open) > Use Log.*(Object, Throwable) overload to log exceptions > --- > > Key: HDFS-13113 > URL: https://issues.apache.org/jira/browse/HDFS-13113 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, namenode, nfs >Affects Versions: 2.4.0 >Reporter: Steve Loughran >Assignee: Andras Bokor >Priority: Major > Attachments: HADOOP-10571.05.patch, HADOOP-10571.07.patch > > > FYI, In HADOOP-10571, [~boky01] is going to clean up a lot of the log > statements, including some in Datanode and elsewhere. > I'm provisionally +1 on that, but want to run it on the standalone tests > (Yetus has already done them), and give the HDFS developers warning of a > change which is going to touch their codebase. > If anyone doesn't want the logging improvements, now is your chance to say so -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13113) Use Log.*(Object, Throwable) overload to log exceptions
[ https://issues.apache.org/jira/browse/HDFS-13113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HDFS-13113: -- Attachment: HADOOP-10571.07.patch > Use Log.*(Object, Throwable) overload to log exceptions > --- > > Key: HDFS-13113 > URL: https://issues.apache.org/jira/browse/HDFS-13113 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, namenode, nfs >Affects Versions: 2.4.0 >Reporter: Steve Loughran >Assignee: Andras Bokor >Priority: Major > Attachments: HADOOP-10571.05.patch, HADOOP-10571.07.patch > > > FYI, In HADOOP-10571, [~boky01] is going to clean up a lot of the log > statements, including some in Datanode and elsewhere. > I'm provisionally +1 on that, but want to run it on the standalone tests > (Yetus has already done them), and give the HDFS developers warning of a > change which is going to touch their codebase. > If anyone doesn't want the logging improvements, now is your chance to say so -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13113) Use Log.*(Object, Throwable) overload to log exceptions
[ https://issues.apache.org/jira/browse/HDFS-13113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HDFS-13113: -- Status: Open (was: Patch Available) > Use Log.*(Object, Throwable) overload to log exceptions > --- > > Key: HDFS-13113 > URL: https://issues.apache.org/jira/browse/HDFS-13113 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, namenode, nfs >Affects Versions: 2.4.0 >Reporter: Steve Loughran >Assignee: Andras Bokor >Priority: Major > Attachments: HADOOP-10571.05.patch > > > FYI, In HADOOP-10571, [~boky01] is going to clean up a lot of the log > statements, including some in Datanode and elsewhere. > I'm provisionally +1 on that, but want to run it on the standalone tests > (Yetus has already done them), and give the HDFS developers warning of a > change which is going to touch their codebase. > If anyone doesn't want the logging improvements, now is your chance to say so -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13110) [SPS]: Reduce the number of APIs in NamenodeProtocol used by external satisfier
[ https://issues.apache.org/jira/browse/HDFS-13110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363735#comment-16363735 ] Rakesh R commented on HDFS-13110: - Addressed all of Uma's comments above. [~daryn], [~umamaheswararao], [~surendrasingh]. Appreciate reviews, thanks! > [SPS]: Reduce the number of APIs in NamenodeProtocol used by external > satisfier > --- > > Key: HDFS-13110 > URL: https://issues.apache.org/jira/browse/HDFS-13110 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Rakesh R >Assignee: Rakesh R >Priority: Major > Attachments: HDFS-13110-HDFS-10285-00.patch, > HDFS-13110-HDFS-10285-01.patch, HDFS-13110-HDFS-10285-02.patch, > HDFS-13110-HDFS-10285-03.patch, HDFS-13110-HDFS-10285-04.patch > > > This task is to address the following comments from [~daryn]. Please refer to > HDFS-10285 for a more detailed discussion. > *Comment-10)* > {quote} > NamenodeProtocolTranslatorPB > Most of the api changes appear unnecessary. > IntraSPSNameNodeContext#getFileInfo swallows all IOEs, based on assumption > that any and all IOEs means FNF which probably isn’t the intention during rpc > exceptions. > {quote} > *Comment-13)* > {quote} > StoragePolicySatisfier > It appears to make back-to-back calls to hasLowRedundancyBlocks and > getFileInfo for every file. Haven’t fully grokked the code, but if low > redundancy is not the common case, then it shouldn’t be called unless/until > needed. It looks like files that are under replicated are re-queued again? > {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13040) Kerberized inotify client fails despite kinit properly
[ https://issues.apache.org/jira/browse/HDFS-13040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363687#comment-16363687 ] genericqa commented on HDFS-13040: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 34s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 31s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 55s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 31s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 17s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 6s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 12m 6s{color} | {color:red} root generated 2 new + 1234 unchanged - 0 fixed = 1236 total (was 1234) {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 1s{color} | {color:green} root: The patch generated 0 new + 150 unchanged - 2 fixed = 150 total (was 152) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 35s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 1s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 4s{color} | {color:green} hadoop-auth in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 0s{color} | {color:red} hadoop-common in the patch failed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}135m 9s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 35s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}234m 38s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.log.TestLogLevel | | | hadoop.http.TestHttpServerWithSpengo | | | hadoop.security.token.delegation.web.TestWebDelegationToken | | | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA | | | hadoop.hdfs.web.TestWebHdfsTimeouts | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure | | | hadoop.hdfs.TestDFSInotifyEventInputStreamKerberized | | | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | HDFS-13040 | | JIRA Patch URL |
[jira] [Commented] (HDFS-13052) WebHDFS: Add support for snapshot diff
[ https://issues.apache.org/jira/browse/HDFS-13052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363648#comment-16363648 ] Lokesh Jain commented on HDFS-13052: [~xyao] the checkstyle issues appear in NamenodeWebHdfsMethods, and I think for readability purposes we should not remove them. The patch follows the syntax structure defined in the NamenodeWebHdfsMethods class. I have resubmitted the v7 patch to trigger Jenkins. > WebHDFS: Add support for snapshot diff > -- > > Key: HDFS-13052 > URL: https://issues.apache.org/jira/browse/HDFS-13052 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Attachments: HDFS-13052.001.patch, HDFS-13052.002.patch, > HDFS-13052.003.patch, HDFS-13052.004.patch, HDFS-13052.005.patch, > HDFS-13052.006.patch, HDFS-13052.007.patch > > > This JIRA aims to implement the snapshot diff operation for the WebHDFS filesystem. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org