[jira] [Commented] (HDFS-15036) Active NameNode should not silently fail the image transfer
[ https://issues.apache.org/jira/browse/HDFS-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085175#comment-17085175 ]

Jonathan Hung commented on HDFS-15036:
--------------------------------------

Pushed to branch-2.10.

> Active NameNode should not silently fail the image transfer
> -----------------------------------------------------------
>
>              Key: HDFS-15036
>              URL: https://issues.apache.org/jira/browse/HDFS-15036
>          Project: Hadoop HDFS
>       Issue Type: Bug
>       Components: namenode
> Affects Versions: 2.10.0
>         Reporter: Konstantin Shvachko
>         Assignee: Chen Liang
>         Priority: Major
>          Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1
>
>      Attachments: HDFS-15036.001.patch, HDFS-15036.002.patch,
> HDFS-15036.003.patch
>
>
> Image transfer from Standby NameNode to Active silently fails on Active,
> without any logging and not notifying the receiver side.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15036) Active NameNode should not silently fail the image transfer
[ https://issues.apache.org/jira/browse/HDFS-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16998561#comment-16998561 ]

Chen Liang commented on HDFS-15036:
-----------------------------------

[~Jim_Brennan] I filed https://issues.apache.org/jira/browse/INFRA-19581, but haven't gotten an update from the Infra folks yet.
[jira] [Commented] (HDFS-15036) Active NameNode should not silently fail the image transfer
[ https://issues.apache.org/jira/browse/HDFS-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16998526#comment-16998526 ]

Jim Brennan commented on HDFS-15036:
------------------------------------

[~shv], [~jhung] was branch-2 actually deleted? I can still see it, and this commit is still there.
[jira] [Commented] (HDFS-15036) Active NameNode should not silently fail the image transfer
[ https://issues.apache.org/jira/browse/HDFS-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16996000#comment-16996000 ]

Chen Liang commented on HDFS-15036:
-----------------------------------

Oops! Did not realize it's already deleted, guess I missed the messages... will work on deleting it again...
[jira] [Commented] (HDFS-15036) Active NameNode should not silently fail the image transfer
[ https://issues.apache.org/jira/browse/HDFS-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16995976#comment-16995976 ]

Konstantin Shvachko commented on HDFS-15036:
--------------------------------------------

[~vagarychen] we should commit to branch-2.10. branch-2 was deleted as per discussion on hdfs-dev.
[jira] [Commented] (HDFS-15036) Active NameNode should not silently fail the image transfer
[ https://issues.apache.org/jira/browse/HDFS-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16995038#comment-16995038 ]

Chen Liang commented on HDFS-15036:
-----------------------------------

Thanks [~shv]! I've committed to trunk and branch-2.
[jira] [Commented] (HDFS-15036) Active NameNode should not silently fail the image transfer
[ https://issues.apache.org/jira/browse/HDFS-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16994950#comment-16994950 ]

Hudson commented on HDFS-15036:
-------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17758 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17758/])
HDFS-15036. Active NameNode should not silently fail the image transfer. (cliang: rev 65c4660bcd897e139fc175ca438cff75ec0c6be8)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ImageServlet.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/StandbyCheckpointer.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestRollingUpgrade.java
[jira] [Commented] (HDFS-15036) Active NameNode should not silently fail the image transfer
[ https://issues.apache.org/jira/browse/HDFS-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16993972#comment-16993972 ]

Konstantin Shvachko commented on HDFS-15036:
--------------------------------------------

+1 on v03 patch. TestFsck failure is tracked under HDFS-15038. And the checkstyle warning is bogus.
[jira] [Commented] (HDFS-15036) Active NameNode should not silently fail the image transfer
[ https://issues.apache.org/jira/browse/HDFS-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16993325#comment-16993325 ]

Hadoop QA commented on HDFS-15036:
----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 49s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 1s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 23m 35s | trunk passed |
| +1 | compile | 1m 13s | trunk passed |
| +1 | checkstyle | 1m 2s | trunk passed |
| +1 | mvnsite | 1m 31s | trunk passed |
| +1 | shadedclient | 17m 50s | branch has no errors when building and testing our client artifacts. |
| -1 | findbugs | 2m 49s | hadoop-hdfs-project/hadoop-hdfs in trunk has 1 extant Findbugs warnings. |
| +1 | javadoc | 1m 30s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 15s | the patch passed |
| +1 | compile | 1m 10s | the patch passed |
| +1 | javac | 1m 10s | the patch passed |
| -0 | checkstyle | 0m 41s | hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 88 unchanged - 0 fixed = 89 total (was 88) |
| +1 | mvnsite | 1m 20s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 15m 6s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 19s | the patch passed |
| +1 | javadoc | 1m 9s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 107m 35s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 36s | The patch does not generate ASF License warnings. |
| | | 180m 45s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.TestFsck |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-15036 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12988486/HDFS-15036.003.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 32b29ff6bfad 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / c2e9783 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| findbugs | https://builds.apache.org/job/PreCommit-HDFS-Build/28499/artifact/out/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/28499/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt |
| unit |
[jira] [Commented] (HDFS-15036) Active NameNode should not silently fail the image transfer
[ https://issues.apache.org/jira/browse/HDFS-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16993224#comment-16993224 ]

Chen Liang commented on HDFS-15036:
-----------------------------------

Thanks for the review [~shv], uploaded v03 patch.
[jira] [Commented] (HDFS-15036) Active NameNode should not silently fail the image transfer
[ https://issues.apache.org/jira/browse/HDFS-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16993150#comment-16993150 ]

Konstantin Shvachko commented on HDFS-15036:
--------------------------------------------

Looks good. Minor things:
# Typo in {{doCheckpoint()}}. Removed -is- in:
{code}// by the other node. This could happen if{code}
# Should use parameterized logging:
{code}LOG.info("Image upload rejected by the other NameNode: {}", uploadResult);{code}
[jira] [Commented] (HDFS-15036) Active NameNode should not silently fail the image transfer
[ https://issues.apache.org/jira/browse/HDFS-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16993133#comment-16993133 ]

Hadoop QA commented on HDFS-15036:
----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 43s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 21m 7s | trunk passed |
| +1 | compile | 1m 9s | trunk passed |
| +1 | checkstyle | 0m 46s | trunk passed |
| +1 | mvnsite | 1m 15s | trunk passed |
| +1 | shadedclient | 15m 17s | branch has no errors when building and testing our client artifacts. |
| -1 | findbugs | 2m 34s | hadoop-hdfs-project/hadoop-hdfs in trunk has 1 extant Findbugs warnings. |
| +1 | javadoc | 1m 23s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 11s | the patch passed |
| +1 | compile | 0m 58s | the patch passed |
| +1 | javac | 0m 58s | the patch passed |
| -0 | checkstyle | 0m 40s | hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 88 unchanged - 0 fixed = 89 total (was 88) |
| +1 | mvnsite | 1m 8s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 13m 50s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 29s | the patch passed |
| +1 | javadoc | 1m 14s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 103m 17s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 38s | The patch does not generate ASF License warnings. |
| | | 169m 24s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.qjournal.client.TestQuorumJournalManager |
| | hadoop.hdfs.server.datanode.TestBPOfferService |
| | hadoop.hdfs.TestFileAppend2 |
| | hadoop.hdfs.server.namenode.TestFsck |
| | hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA |
| | hadoop.hdfs.qjournal.client.TestQJMWithFaults |
| | hadoop.hdfs.TestWriteRead |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-15036 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12988468/HDFS-15036.002.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 21686e70fb56 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 875a3e9 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| findbugs |
[jira] [Commented] (HDFS-15036) Active NameNode should not silently fail the image transfer
[ https://issues.apache.org/jira/browse/HDFS-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16993033#comment-16993033 ]

Chen Liang commented on HDFS-15036:
-----------------------------------

Thanks for taking a look [~shv]! Posted the v002 patch. The failed tests all passed in my local run.
[jira] [Commented] (HDFS-15036) Active NameNode should not silently fail the image transfer
[ https://issues.apache.org/jira/browse/HDFS-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992155#comment-16992155 ]

Hadoop QA commented on HDFS-15036:
----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 47s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 20m 23s | trunk passed |
| +1 | compile | 1m 0s | trunk passed |
| +1 | checkstyle | 0m 44s | trunk passed |
| +1 | mvnsite | 1m 11s | trunk passed |
| +1 | shadedclient | 14m 43s | branch has no errors when building and testing our client artifacts. |
| -1 | findbugs | 2m 17s | hadoop-hdfs-project/hadoop-hdfs in trunk has 1 extant Findbugs warnings. |
| +1 | javadoc | 1m 20s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 10s | the patch passed |
| +1 | compile | 0m 57s | the patch passed |
| +1 | javac | 0m 57s | the patch passed |
| +1 | checkstyle | 0m 39s | the patch passed |
| +1 | mvnsite | 1m 0s | the patch passed |
| +1 | whitespace | 0m 1s | The patch has no whitespace issues. |
| +1 | shadedclient | 13m 31s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 20s | the patch passed |
| +1 | javadoc | 1m 10s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 99m 20s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 32s | The patch does not generate ASF License warnings. |
| | | 162m 55s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeErasureCodingMetrics |
| | hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy |
| | hadoop.hdfs.server.namenode.TestNamenodeCapacityReport |
| | hadoop.hdfs.TestReconstructStripedFile |
| | hadoop.hdfs.server.namenode.TestFsck |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-15036 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12988378/HDFS-15036.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 0e77d17e1e66 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / dc66de7 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| findbugs | https://builds.apache.org/job/PreCommit-HDFS-Build/28488/artifact/out/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html |
| unit |
[jira] [Commented] (HDFS-15036) Active NameNode should not silently fail the image transfer
[ https://issues.apache.org/jira/browse/HDFS-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992138#comment-16992138 ]

Konstantin Shvachko commented on HDFS-15036:
--------------------------------------------

Good investigation and findings [~vagarychen].
# Could you add a comment explaining that {{ImageServlet}} should not reject images other than checkpoints.
# I am still concerned about the "silent" part. Should we add some logging, so that next time we could see what happened on both nodes.
[jira] [Commented] (HDFS-15036) Active NameNode should not silently fail the image transfer
[ https://issues.apache.org/jira/browse/HDFS-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992029#comment-16992029 ]

Chen Liang commented on HDFS-15036:
-----------------------------------

[~csun] np, sure, thanks for asking :) . Assigning to myself then.
[jira] [Commented] (HDFS-15036) Active NameNode should not silently fail the image transfer
[ https://issues.apache.org/jira/browse/HDFS-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992018#comment-16992018 ]

Chao Sun commented on HDFS-15036:
---------------------------------

[~vagarychen] sorry for grabbing this JIRA too soon :) Since you have done much study on this, do you want to take this JIRA instead?
[jira] [Commented] (HDFS-15036) Active NameNode should not silently fail the image transfer
[ https://issues.apache.org/jira/browse/HDFS-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991998#comment-16991998 ]

Chen Liang commented on HDFS-15036:
-----------------------------------

I spent some time debugging this, and I think I have found the cause.

HDFS-12979 introduced logic that rejects an image upload request if the image being uploaded is not far enough ahead of the previous image. This prevents multiple SbNs from uploading images to the ANN too frequently. The rejection is considered correct behavior, so nothing is logged to indicate an error (this is the "silent" part). Both ANN and SbN simply ignore it and proceed.

A side effect of this change, however, appears to be that during RU the rollback image also has to go through this check and can be rejected as well. If this happens, the SbN proceeds assuming the upload is done, while the ANN proceeds without having received the rollback image. The upload has silently failed in this case.

The check that rejects the upload is in {{ImageServlet}}. In my earlier test I simply commented out the whole block below, and the issue seemed to go away. I think the proper fix is probably just to add a new condition so that the rejection applies only to regular image uploads, like the marked line in the following code snippet. I have not actually tested this change, though:
{code}
if (checkRecentImageEnable &&
    NameNodeFile.IMAGE.equals(parsedParams.getNameNodeFile()) && // <--- this should fix the issue
    timeDelta < checkpointPeriod &&
    txid - lastCheckpointTxid < checkpointTxnCount) {
  // only when at least one of two conditions are met we accept
  // a new fsImage
  // 1. most recent image's txid is too far behind
  // 2. last checkpoint time was too old
  response.sendError(HttpServletResponse.SC_CONFLICT,
      "Most recent checkpoint is neither too far behind in "
      + "txid, nor too old. New txnid cnt is "
      + (txid - lastCheckpointTxid)
      + ", expecting at least " + checkpointTxnCount
      + " unless too long since last upload.");
  return null;
}
{code}

> Active NameNode should not silently fail the image transfer
> -----------------------------------------------------------
>
>                 Key: HDFS-15036
>                 URL: https://issues.apache.org/jira/browse/HDFS-15036
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.10.0
>            Reporter: Konstantin Shvachko
>            Assignee: Chao Sun
>            Priority: Major
>
> Image transfer from Standby NameNode to Active silently fails on Active,
> without any logging and not notifying the receiver side.
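Editorial sketch: the condition proposed in the comment above can be isolated as a small predicate, which makes it easier to see that a rollback image is never subject to the "too recent" rejection. This is a minimal illustration only, not the code from the attached patches. The names {{ImageKind}}, {{RecentImageCheck}}, and {{shouldReject}} are hypothetical; {{ImageKind}} merely stands in for {{NameNodeFile}}, and the parameters mirror the variables in the snippet above.

```java
/** Hypothetical stand-in for NameNodeFile; only the two kinds discussed here. */
enum ImageKind { IMAGE, IMAGE_ROLLBACK }

/** Hypothetical helper isolating the upload-rejection condition. */
final class RecentImageCheck {
  private RecentImageCheck() {}

  /**
   * Returns true when an upload should be rejected as "too recent".
   * Only regular images are checked; any other kind (for example a
   * rollback image) is never rejected by this predicate.
   */
  static boolean shouldReject(boolean checkRecentImageEnable, ImageKind kind,
      long timeDelta, long checkpointPeriod,
      long txid, long lastCheckpointTxid, long checkpointTxnCount) {
    return checkRecentImageEnable
        && kind == ImageKind.IMAGE                          // the proposed extra condition
        && timeDelta < checkpointPeriod                     // last checkpoint is not too old
        && txid - lastCheckpointTxid < checkpointTxnCount;  // not enough new transactions
  }
}
```

With identical timing and txid arguments, {{shouldReject}} returns true for {{ImageKind.IMAGE}} and false for {{ImageKind.IMAGE_ROLLBACK}}, which is the behavior the proposed fix is after.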
[jira] [Commented] (HDFS-15036) Active NameNode should not silently fail the image transfer
[ https://issues.apache.org/jira/browse/HDFS-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990155#comment-16990155 ]

Konstantin Shvachko commented on HDFS-15036:
--------------------------------------------

This can happen during checkpointing or while preparing for a rolling upgrade. We observed it during a rolling upgrade, when the Standby was reporting:
_"Rollback image has been created. Proceed to upgrade daemons."_
while the Active still reported:
_" Rollback image has not been created."_

In the ANN logs I see that it started receiving the image:
{code:java}
2019-12-05 23:14:56,328 INFO org.apache.hadoop.hdfs.server.namenode.ImageServlet: ImageServlet allowing checkpointer: hdfs/active.namenode.com
{code}
But the ANN did not print anything related to the image transfer afterwards, and the transferred image is missing from its storage directory. The ANN log message comes from {{isValidRequestor()}}, called by {{ImageServlet.doPut()}}.

The SBN log indicates that the image was fully and successfully transferred to the ANN:
{code:java}
2019-12-05 23:22:29,526 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Sending fileName: /hdfs-storage-dir/current/fsimage_rollback_00773999609, fileSize: 1889021016. Sent total: 1889021016 bytes. Size of last segment intended to send: -1 bytes.
{code}
The SBN log message comes from {{TransferFsImage.copyFileToStream()}}.

Looking at the code in {{ImageServlet.doPut()}}, I see that one of the methods it calls is {{Util.receiveFile()}}. If an exception is thrown inside the while-loop that reads from the input (socket) stream and writes to the output (image file) stream, it passes through a series of {{finally}} sections without being caught, logged, or reported to the sender.

We should:
# Catch and log any exceptions occurring there
# Notify SBN about the error, so that it could retry the transfer

> Active NameNode should not silently fail the image transfer
> -----------------------------------------------------------
>
>                 Key: HDFS-15036
>                 URL: https://issues.apache.org/jira/browse/HDFS-15036
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.10.0
>            Reporter: Konstantin Shvachko
>            Assignee: Chao Sun
>            Priority: Major
>
> Image transfer from Standby NameNode to Active silently fails on Active,
> without any logging and not notifying the receiver side.
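Editorial sketch: the two steps listed in the comment above (catch and log the exception, then notify the sender) might look roughly like the following. This is an illustration built on plain {{java.io}} streams and {{java.util.logging}}, not the actual {{Util.receiveFile()}} or {{ImageServlet}} code. {{ImageReceiveSketch}}, {{receive}}, and {{ErrorReporter}} are hypothetical names; {{ErrorReporter}} stands in for whatever mechanism returns an error to the sender, such as an HTTP error response.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch of a receive loop that does not fail silently. */
final class ImageReceiveSketch {
  private static final Logger LOG =
      Logger.getLogger(ImageReceiveSketch.class.getName());

  /** Hypothetical callback standing in for an error response to the sender. */
  interface ErrorReporter {
    void sendError(int status, String message);
  }

  private ImageReceiveSketch() {}

  /**
   * Copies everything from in to out. If the copy fails, the exception is
   * logged and reported to the sender rather than being dropped.
   * Returns true only when the whole stream was copied.
   */
  static boolean receive(InputStream in, OutputStream out, ErrorReporter reporter) {
    byte[] buf = new byte[8192];
    try {
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
      out.flush();
      return true;
    } catch (IOException e) {
      // Step 1: log the failure on the receiving side.
      LOG.log(Level.SEVERE, "Image transfer failed on the receiving side", e);
      // Step 2: tell the sender, so that it can retry the transfer.
      reporter.sendError(500, "Image transfer failed: " + e.getMessage());
      return false;
    }
  }
}
```

The boolean return value lets the caller decide whether to finalize the received file or discard a partial one; in a servlet, the reporter would presumably delegate to the response object.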