[jira] [Commented] (HDFS-15562) StandbyCheckpointer will do checkpoint repeatedly while connecting observer/active namenode failed

2022-05-18 Thread ZanderXu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17538930#comment-17538930
 ] 

ZanderXu commented on HDFS-15562:
-

[~shv] [~aihuaxu] are you still following up on this issue? In our production 
cluster, one NameService contains multiple ObserverNameNodes. We stopped one 
Observer that we planned to take offline, and that caused the standby to run 
checkpoints abnormally.

bq. We may add a logic for the Checkpointer to not re-create an image if it was 
created recently
`lastCheckpointTime` already exists, but it is not updated when an exception 
occurs.
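
To make the `lastCheckpointTime` point concrete, here is a minimal sketch of one way to separate "an image was saved locally" from "the image was uploaded everywhere". It is only an illustration under assumptions: `CheckpointSchedule` and all of its methods are hypothetical names and are not the actual StandbyCheckpointer code.

```java
/**
 * Hypothetical sketch, NOT the real StandbyCheckpointer.
 * Tracks image creation time separately from upload success, so a failed
 * upload leads to an upload retry instead of a brand-new checkpoint.
 */
final class CheckpointSchedule {
    private final long periodMs;       // assumed minimum time between new images
    private long lastImageSavedMs;     // when an image was last saved locally
    private boolean uploadPending;     // true while some upload has not succeeded

    CheckpointSchedule(long periodMs, long startMs) {
        this.periodMs = periodMs;
        this.lastImageSavedMs = startMs;
        this.uploadPending = false;
    }

    /** True only when the last locally saved image is older than the period. */
    boolean shouldCreateImage(long nowMs) {
        return nowMs - lastImageSavedMs >= periodMs;
    }

    /** True when the existing image still has to be sent to some namenode. */
    boolean shouldRetryUpload() {
        return uploadPending;
    }

    /** Record the save time even if the uploads that follow end up failing. */
    void onImageSaved(long nowMs) {
        lastImageSavedMs = nowMs;
        uploadPending = true;
    }

    /** Clear the pending flag only when every upload succeeded. */
    void onUploadFinished(boolean allSucceeded) {
        uploadPending = !allSucceeded;
    }
}
```

With this split, a failed transfer leaves `shouldRetryUpload()` true while `shouldCreateImage()` stays false until the period has elapsed again, so the standby would resend the existing image rather than rebuild it.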

bq. we see transfers fail once in a while, so just ignoring image transfer 
failures isn't right.
The Standby can upload the latest fsImage to as many namenodes as possible. 
If the upload to an abnormal namenode still fails after the Standby has 
retried several times, it should be fine for the Standby to simply ignore 
that namenode.
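
The retry-then-ignore idea could look roughly like the sketch below. This is an assumption-laden illustration, not a proposed patch: `ImageUploader`, `BoundedUpload`, and the `maxAttempts` parameter are hypothetical names, and the real image transfer code is not shown.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for sending the latest image to one namenode. */
interface ImageUploader {
    boolean upload(String target);   // returns true when the transfer succeeded
}

/** Hypothetical helper: bounded retries per target, then give up on that target. */
final class BoundedUpload {
    private BoundedUpload() {
    }

    /**
     * Tries every target up to maxAttempts times and returns the targets
     * that never succeeded, so the caller can log them and move on.
     */
    static List<String> uploadToAll(List<String> targets,
                                    ImageUploader uploader,
                                    int maxAttempts) {
        List<String> failed = new ArrayList<>();
        for (String target : targets) {
            boolean ok = false;
            for (int attempt = 1; attempt <= maxAttempts && !ok; attempt++) {
                try {
                    ok = uploader.upload(target);
                } catch (RuntimeException e) {
                    ok = false;   // an exception counts as one failed attempt
                }
            }
            if (!ok) {
                failed.add(target);
            }
        }
        return failed;
    }
}
```

A stopped Observer would then cost at most `maxAttempts` transfer attempts per checkpoint, while healthy namenodes still receive the image on the first successful try.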

[~shv] [~ferhui] [~hexiaoqiao] do you have any good ideas about this? I would 
be happy to work on it.

BTW, do we need a mechanism to actively trigger a checkpoint?

 

 

> StandbyCheckpointer will do checkpoint repeatedly while connecting 
> observer/active namenode failed
> --
>
> Key: HDFS-15562
> URL: https://issues.apache.org/jira/browse/HDFS-15562
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: SunHao
> Assignee: Aihua Xu
> Priority: Major
> Labels: pull-request-available
> Attachments: HDFS-15562.patch
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> We find that the standby namenode performs checkpoints over and over when 
> connecting to the observer/active namenode fails.
> StandbyCheckpointer does not update “lastCheckpointTime” when uploading the 
> new fsimage to the other namenodes fails, so the standby namenode keeps 
> checkpointing repeatedly.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15562) StandbyCheckpointer will do checkpoint repeatedly while connecting observer/active namenode failed

2020-11-16 Thread Aihua Xu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17233178#comment-17233178
 ] 

Aihua Xu commented on HDFS-15562:
-

Thanks [~shv] for your comment. When I get time, I will focus on not recreating 
the image if there is a recent one. 




[jira] [Commented] (HDFS-15562) StandbyCheckpointer will do checkpoint repeatedly while connecting observer/active namenode failed

2020-11-09 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17228786#comment-17228786
 ] 

Konstantin Shvachko commented on HDFS-15562:


Hey guys, I think generally the checkpointer should persist until the 
checkpoint completes and the image is transferred. With large images we see 
transfers fail once in a while, so just ignoring image transfer failures isn't 
right.
I understand that with multiple ObserverNodes some of them can be down.
We already have logic for the ActiveNN and ObserverNodes to reject an image if 
they already have one that is recent enough. So frequent checkpoints should not 
overwhelm the active or the Observers.
We may add logic for the Checkpointer to not re-create an image if one was 
created recently. But this does not seem to be a big concern.
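
For readers unfamiliar with the receiver-side rejection mentioned above, the following is a rough sketch of what such a check might look like. It is an assumption for illustration only: `ImageAcceptPolicy`, its thresholds, and its method are hypothetical and do not reproduce the actual NameNode code.

```java
/**
 * Hypothetical receiver-side policy, NOT the actual NameNode logic.
 * A namenode accepts an offered image only if its current image is old
 * enough, or the offered image is far enough ahead in transaction id.
 */
final class ImageAcceptPolicy {
    private final long minIntervalMs;  // assumed minimum age of the current image
    private final long minTxnGap;      // assumed minimum txid advance to accept early

    ImageAcceptPolicy(long minIntervalMs, long minTxnGap) {
        this.minIntervalMs = minIntervalMs;
        this.minTxnGap = minTxnGap;
    }

    boolean shouldAccept(long nowMs, long lastImageTimeMs,
                         long lastImageTxId, long offeredTxId) {
        if (offeredTxId <= lastImageTxId) {
            return false;              // the offered image contains nothing new
        }
        boolean oldEnough = nowMs - lastImageTimeMs >= minIntervalMs;
        boolean farEnough = offeredTxId - lastImageTxId >= minTxnGap;
        return oldEnough || farEnough;
    }
}
```

Under a policy like this, repeated uploads from the standby are cheap for the receivers, because an image that arrives too soon after the previous one is rejected up front.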




[jira] [Commented] (HDFS-15562) StandbyCheckpointer will do checkpoint repeatedly while connecting observer/active namenode failed

2020-11-04 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226420#comment-17226420
 ] 

Wei-Chiu Chuang commented on HDFS-15562:


[~cliang] [~shv]  do you think you can help Aihua with the review?




[jira] [Commented] (HDFS-15562) StandbyCheckpointer will do checkpoint repeatedly while connecting observer/active namenode failed

2020-11-03 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225803#comment-17225803
 ] 

Hadoop QA commented on HDFS-15562:
--

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 1m 10s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 | | 0m 0s | test4tests | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests || ||
| +1 | mvninstall | 36m 3s | | trunk passed |
| +1 | compile | 1m 20s | | trunk passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 |
| +1 | compile | 1m 9s | | trunk passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 |
| +1 | checkstyle | 0m 48s | | trunk passed |
| +1 | mvnsite | 1m 18s | | trunk passed |
| +1 | shadedclient | 18m 43s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 53s | | trunk passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 |
| +1 | javadoc | 1m 31s | | trunk passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 |
| 0 | spotbugs | 3m 11s | | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 3m 9s | | trunk passed |
|| || || || Patch Compile Tests || ||
| +1 | mvninstall | 1m 17s | | the patch passed |
| +1 | compile | 1m 14s | | the patch passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 |
| +1 | javac | 1m 14s | | the patch passed |
| +1 | compile | 1m 5s | | the patch passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 |
| +1 | javac | 1m 5s | | the patch passed |
| -0 | checkstyle | 0m 47s | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2430/1/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt | hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 35 unchanged - 0 fixed = 37 total (was 35) |
| +1 | mvnsite | 1m 10s | | the patch passed |
| +1 | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 | shadedclient | 16m 13s | | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 48s | 

[jira] [Commented] (HDFS-15562) StandbyCheckpointer will do checkpoint repeatedly while connecting observer/active namenode failed

2020-10-29 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17223389#comment-17223389
 ] 

Hadoop QA commented on HDFS-15562:
--

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 2m 11s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests || ||
| +1 | mvninstall | 41m 40s | | trunk passed |
| +1 | compile | 1m 42s | | trunk passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 |
| +1 | compile | 1m 30s | | trunk passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 |
| +1 | checkstyle | 1m 2s | | trunk passed |
| +1 | mvnsite | 1m 42s | | trunk passed |
| +1 | shadedclient | 19m 33s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 53s | | trunk passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 |
| +1 | javadoc | 1m 19s | | trunk passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 |
| 0 | spotbugs | 3m 13s | | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 3m 10s | | trunk passed |
|| || || || Patch Compile Tests || ||
| +1 | mvninstall | 1m 10s | | the patch passed |
| +1 | compile | 1m 13s | | the patch passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 |
| +1 | javac | 1m 13s | | the patch passed |
| +1 | compile | 1m 5s | | the patch passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 |
| +1 | javac | 1m 5s | | the patch passed |
| +1 | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 | checkstyle | 0m 40s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt|https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/277/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt] | hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 34 unchanged - 0 fixed = 36 total (was 34) |
| +1 | mvnsite | 1m 10s | | the patch passed |
| +1 | shadedclient | 18m 51s | | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 49s | | the patch passed with JDK Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 |
| +1 | javadoc | 1m 16s | | the patch passed with JDK Private Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 |
| +1 | findbugs | 3m 13s | | the patch passed |
|| || 

[jira] [Commented] (HDFS-15562) StandbyCheckpointer will do checkpoint repeatedly while connecting observer/active namenode failed

2020-10-29 Thread Aihua Xu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17223287#comment-17223287
 ] 

Aihua Xu commented on HDFS-15562:
-

[~weichiu], [~csun] Can you help review the change? Thanks.



[jira] [Commented] (HDFS-15562) StandbyCheckpointer will do checkpoint repeatedly while connecting observer/active namenode failed

2020-10-03 Thread Aihua Xu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17206838#comment-17206838
 ] 

Aihua Xu commented on HDFS-15562:
-

[~aswqazxsd] I will take a look. If you can provide more details, such as the 
version and a stack trace, that would be helpful.




[jira] [Commented] (HDFS-15562) StandbyCheckpointer will do checkpoint repeatedly while connecting observer/active namenode failed

2020-09-09 Thread jianghua zhu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17192719#comment-17192719
 ] 

jianghua zhu commented on HDFS-15562:
-

[~aswqazxsd], can you tell us which version you are using?

 

 
