[jira] [Commented] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-06-23 Thread Lisheng Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870797#comment-16870797
 ] 

Lisheng Sun commented on HDFS-14313:


[~jojochuang] Thanks for your comment .I add synchronized to 
FsDatasetImpl#deepCopyReplica.
{code:java}
 @Override
  public synchronized Set deepCopyReplica(String bpid)
  throws IOException {
Set replicas =
new HashSet<>(volumeMap.replicas(bpid) == null ? Collections.EMPTY_SET
: volumeMap.replicas(bpid));
return replicas;
  }
{code}
I don't use FsDatasetImpl#datasetLock , Because FsDatasetImpl#addBlockPool with 
datasetLock call FsDatasetImpl#deepCopyReplica in another Thread.

Please continue to help review code. Please correct me if I am wrong. Thanks.

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch
>
>
> There are two ways of DU/DF getting used space that are insufficient.
>  #  Running DU across lots of disks is very expensive and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk sharing by multiple datanode or 
> other servers.
>  Getting hdfs used space from  FsDatasetImpl#volumeMap#ReplicaInfos in memory 
> is very small and accurate. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-06-23 Thread Lisheng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14313:
---
Attachment: HDFS-14313.003.patch

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch
>
>
> There are two ways of DU/DF getting used space that are insufficient.
>  #  Running DU across lots of disks is very expensive and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk sharing by multiple datanode or 
> other servers.
>  Getting hdfs used space from  FsDatasetImpl#volumeMap#ReplicaInfos in memory 
> is very small and accurate. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12914) Block report leases cause missing blocks until next report

2019-06-23 Thread Wei-Chiu Chuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-12914:
---
Fix Version/s: 3.0.4

> Block report leases cause missing blocks until next report
> --
>
> Key: HDFS-12914
> URL: https://issues.apache.org/jira/browse/HDFS-12914
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.8.0, 2.9.2
>Reporter: Daryn Sharp
>Assignee: Santosh Marella
>Priority: Critical
> Fix For: 3.0.4, 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-12914-branch-2.001.patch, 
> HDFS-12914-trunk.00.patch, HDFS-12914-trunk.01.patch, HDFS-12914.005.patch, 
> HDFS-12914.006.patch, HDFS-12914.007.patch, HDFS-12914.008.patch, 
> HDFS-12914.009.patch, HDFS-12914.branch-3.1.001.patch, 
> HDFS-12914.branch-3.1.002.patch, HDFS-12914.branch-3.2.patch, 
> HDFS-12914.utfix.patch
>
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for 
> conditions such as "unknown datanode", "not in pending set", "lease has 
> expired", wrong lease id, etc.  Lease rejection does not throw an exception.  
> It returns false which bubbles up to  {{NameNodeRpcServer#blockReport}} and 
> interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected from an invalid lease becomes 
> active with _no blocks_.  A replication storm ensues possibly causing DNs to 
> temporarily go dead (HDFS-12645), leading to more FBR lease rejections on 
> re-registration.  The cluster will have many "missing blocks" until the DNs 
> next FBR is sent and/or forced.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12914) Block report leases cause missing blocks until next report

2019-06-23 Thread Wei-Chiu Chuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-12914:
---
Fix Version/s: 3.1.3

> Block report leases cause missing blocks until next report
> --
>
> Key: HDFS-12914
> URL: https://issues.apache.org/jira/browse/HDFS-12914
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.8.0, 2.9.2
>Reporter: Daryn Sharp
>Assignee: Santosh Marella
>Priority: Critical
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-12914-branch-2.001.patch, 
> HDFS-12914-trunk.00.patch, HDFS-12914-trunk.01.patch, HDFS-12914.005.patch, 
> HDFS-12914.006.patch, HDFS-12914.007.patch, HDFS-12914.008.patch, 
> HDFS-12914.009.patch, HDFS-12914.branch-3.1.001.patch, 
> HDFS-12914.branch-3.1.002.patch, HDFS-12914.branch-3.2.patch, 
> HDFS-12914.utfix.patch
>
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for 
> conditions such as "unknown datanode", "not in pending set", "lease has 
> expired", wrong lease id, etc.  Lease rejection does not throw an exception.  
> It returns false which bubbles up to  {{NameNodeRpcServer#blockReport}} and 
> interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected from an invalid lease becomes 
> active with _no blocks_.  A replication storm ensues possibly causing DNs to 
> temporarily go dead (HDFS-12645), leading to more FBR lease rejections on 
> re-registration.  The cluster will have many "missing blocks" until the DNs 
> next FBR is sent and/or forced.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Issue Comment Deleted] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-06-23 Thread Lisheng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14313:
---
Comment: was deleted

(was: [~jojochuang] Thanks for your comments.  I add lock in 
FsDatasetImpl#deepCopyReplica.
{code:java}
@Override
public Set deepCopyReplica(String bpid)
throws IOException {
  try (AutoCloseableLock lock = datasetLock.acquire()) {
Set replicas =
new HashSet<>(volumeMap.replicas(bpid) == null ? Collections.EMPTY_SET
: volumeMap.replicas(bpid));
return replicas;
  }
}
{code}

Please continue to help review this issue. Thank you.
)

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch
>
>
> There are two ways of DU/DF getting used space that are insufficient.
>  #  Running DU across lots of disks is very expensive and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk sharing by multiple datanode or 
> other servers.
>  Getting hdfs used space from  FsDatasetImpl#volumeMap#ReplicaInfos in memory 
> is very small and accurate. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12914) Block report leases cause missing blocks until next report

2019-06-23 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870775#comment-16870775
 ] 

Wei-Chiu Chuang commented on HDFS-12914:


+1 for the branch-3.1 002 patch. Other than the TestDiskBalancer test, failed 
tests don't reproduce for me.

> Block report leases cause missing blocks until next report
> --
>
> Key: HDFS-12914
> URL: https://issues.apache.org/jira/browse/HDFS-12914
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.8.0, 2.9.2
>Reporter: Daryn Sharp
>Assignee: Santosh Marella
>Priority: Critical
> Fix For: 3.3.0, 3.2.1
>
> Attachments: HDFS-12914-branch-2.001.patch, 
> HDFS-12914-trunk.00.patch, HDFS-12914-trunk.01.patch, HDFS-12914.005.patch, 
> HDFS-12914.006.patch, HDFS-12914.007.patch, HDFS-12914.008.patch, 
> HDFS-12914.009.patch, HDFS-12914.branch-3.1.001.patch, 
> HDFS-12914.branch-3.1.002.patch, HDFS-12914.branch-3.2.patch, 
> HDFS-12914.utfix.patch
>
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for 
> conditions such as "unknown datanode", "not in pending set", "lease has 
> expired", wrong lease id, etc.  Lease rejection does not throw an exception.  
> It returns false which bubbles up to  {{NameNodeRpcServer#blockReport}} and 
> interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected from an invalid lease becomes 
> active with _no blocks_.  A replication storm ensues possibly causing DNs to 
> temporarily go dead (HDFS-12645), leading to more FBR lease rejections on 
> re-registration.  The cluster will have many "missing blocks" until the DNs 
> next FBR is sent and/or forced.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-06-23 Thread Lisheng Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870773#comment-16870773
 ] 

Lisheng Sun commented on HDFS-14313:


[~jojochuang] Thanks for your comments.  I add lock in 
FsDatasetImpl#deepCopyReplica.
{code:java}
@Override
public Set deepCopyReplica(String bpid)
throws IOException {
  try (AutoCloseableLock lock = datasetLock.acquire()) {
Set replicas =
new HashSet<>(volumeMap.replicas(bpid) == null ? Collections.EMPTY_SET
: volumeMap.replicas(bpid));
return replicas;
  }
}
{code}

Please continue to help review this issue. Thank you.


> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch
>
>
> There are two ways of DU/DF getting used space that are insufficient.
>  #  Running DU across lots of disks is very expensive and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk sharing by multiple datanode or 
> other servers.
>  Getting hdfs used space from  FsDatasetImpl#volumeMap#ReplicaInfos in memory 
> is very small and accurate. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-06-23 Thread Lisheng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14313:
---
Attachment: HDFS-14313.002.patch

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch
>
>
> There are two ways of DU/DF getting used space that are insufficient.
>  #  Running DU across lots of disks is very expensive and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk sharing by multiple datanode or 
> other servers.
>  Getting hdfs used space from  FsDatasetImpl#volumeMap#ReplicaInfos in memory 
> is very small and accurate. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14590) [SBN Read] Add the document link to the top page

2019-06-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870739#comment-16870739
 ] 

Hadoop QA commented on HDFS-14590:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
31m 13s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 47s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
37s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 47m  0s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | HDFS-14590 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12972676/HDFS-14590.002.patch |
| Optional Tests |  dupname  asflicense  mvnsite  xml  |
| uname | Linux ac0b78de9ef9 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / b28ddb2 |
| maven | version: Apache Maven 3.3.9 |
| Max. process+thread count | 318 (vs. ulimit of 1) |
| modules | C: hadoop-project U: hadoop-project |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27047/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> [SBN Read] Add the document link to the top page
> 
>
> Key: HDFS-14590
> URL: https://issues.apache.org/jira/browse/HDFS-14590
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
> Attachments: HDFS-14590.001.patch, HDFS-14590.002.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14593) RBF: RouterAdmin should be able to remove expired routers from Routers Information

2019-06-23 Thread Takanobu Asanuma (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870729#comment-16870729
 ] 

Takanobu Asanuma commented on HDFS-14593:
-

Thanks for your comment, [~crh].

IIUC, currently users can't remove old routers, which are replaced or whose 
hostnames are changed, from RouterState(Router Information page). I propose 
that RouterAdmin can remove a router from RouterState when its status is 
EXPIRED.

> RBF: RouterAdmin should be able to remove expired routers from Routers 
> Information
> --
>
> Key: HDFS-14593
> URL: https://issues.apache.org/jira/browse/HDFS-14593
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: rbf
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
>
> Currently, any router seems to exist in the Router Information eternally.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14590) [SBN Read] Add the document link to the top page

2019-06-23 Thread Takanobu Asanuma (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870719#comment-16870719
 ] 

Takanobu Asanuma commented on HDFS-14590:
-

Thanks for your review, [~ayushtkn]. I agree with you. Updated the patch.

> [SBN Read] Add the document link to the top page
> 
>
> Key: HDFS-14590
> URL: https://issues.apache.org/jira/browse/HDFS-14590
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
> Attachments: HDFS-14590.001.patch, HDFS-14590.002.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14590) [SBN Read] Add the document link to the top page

2019-06-23 Thread Takanobu Asanuma (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-14590:

Attachment: HDFS-14590.002.patch

> [SBN Read] Add the document link to the top page
> 
>
> Key: HDFS-14590
> URL: https://issues.apache.org/jira/browse/HDFS-14590
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
> Attachments: HDFS-14590.001.patch, HDFS-14590.002.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-12914) Block report leases cause missing blocks until next report

2019-06-23 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870714#comment-16870714
 ] 

Wei-Chiu Chuang edited comment on HDFS-12914 at 6/24/19 12:36 AM:
--

Looks like HDFS-12487 breaks the TestDiskBalancer test. Filed HDFS-14599 for 
that.


was (Author: jojochuang):
Looks like HDFS-12487 breaks the getBlockToCopy test. Filed HDFS-14599 for that.

> Block report leases cause missing blocks until next report
> --
>
> Key: HDFS-12914
> URL: https://issues.apache.org/jira/browse/HDFS-12914
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.8.0, 2.9.2
>Reporter: Daryn Sharp
>Assignee: Santosh Marella
>Priority: Critical
> Fix For: 3.3.0, 3.2.1
>
> Attachments: HDFS-12914-branch-2.001.patch, 
> HDFS-12914-trunk.00.patch, HDFS-12914-trunk.01.patch, HDFS-12914.005.patch, 
> HDFS-12914.006.patch, HDFS-12914.007.patch, HDFS-12914.008.patch, 
> HDFS-12914.009.patch, HDFS-12914.branch-3.1.001.patch, 
> HDFS-12914.branch-3.1.002.patch, HDFS-12914.branch-3.2.patch, 
> HDFS-12914.utfix.patch
>
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for 
> conditions such as "unknown datanode", "not in pending set", "lease has 
> expired", wrong lease id, etc.  Lease rejection does not throw an exception.  
> It returns false which bubbles up to  {{NameNodeRpcServer#blockReport}} and 
> interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected from an invalid lease becomes 
> active with _no blocks_.  A replication storm ensues possibly causing DNs to 
> temporarily go dead (HDFS-12645), leading to more FBR lease rejections on 
> re-registration.  The cluster will have many "missing blocks" until the DNs 
> next FBR is sent and/or forced.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14599) HDFS-12487 breaks test TestDiskBalancer.testDiskBalancerWithFedClusterWithOneNameServiceEmpty

2019-06-23 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created HDFS-14599:
--

 Summary: HDFS-12487 breaks test 
TestDiskBalancer.testDiskBalancerWithFedClusterWithOneNameServiceEmpty
 Key: HDFS-14599
 URL: https://issues.apache.org/jira/browse/HDFS-14599
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: diskbalancer
Affects Versions: 3.3.0, 3.2.1, 3.1.3
Reporter: Wei-Chiu Chuang


It looks like HDFS-12487 changes the error message expected by 
{{TestDiskBalancer#testDiskBalancerWithFedClusterWithOneNameServiceEmpty}}.

The test expects error "There are no blocks in the blockPool" but after 
HDFS-12487, it returns error string "NextBlock call returned null.No valid 
block to copy."

Probably the simplest approach to fix it is to update the expected error string.

Thoughts? [~bharatviswa] you crafted the test in HDFS-13715. Should we update 
the expected error string, or revert HDFS-12487?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-12914) Block report leases cause missing blocks until next report

2019-06-23 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870714#comment-16870714
 ] 

Wei-Chiu Chuang edited comment on HDFS-12914 at 6/24/19 12:36 AM:
--

Looks like HDFS-12487 breaks the getBlockToCopy test. Filed HDFS-14599 for that.


was (Author: jojochuang):
Looks like HDFS-12487 breaks the getBlockToCopy test.

> Block report leases cause missing blocks until next report
> --
>
> Key: HDFS-12914
> URL: https://issues.apache.org/jira/browse/HDFS-12914
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.8.0, 2.9.2
>Reporter: Daryn Sharp
>Assignee: Santosh Marella
>Priority: Critical
> Fix For: 3.3.0, 3.2.1
>
> Attachments: HDFS-12914-branch-2.001.patch, 
> HDFS-12914-trunk.00.patch, HDFS-12914-trunk.01.patch, HDFS-12914.005.patch, 
> HDFS-12914.006.patch, HDFS-12914.007.patch, HDFS-12914.008.patch, 
> HDFS-12914.009.patch, HDFS-12914.branch-3.1.001.patch, 
> HDFS-12914.branch-3.1.002.patch, HDFS-12914.branch-3.2.patch, 
> HDFS-12914.utfix.patch
>
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for 
> conditions such as "unknown datanode", "not in pending set", "lease has 
> expired", wrong lease id, etc.  Lease rejection does not throw an exception.  
> It returns false which bubbles up to  {{NameNodeRpcServer#blockReport}} and 
> interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected from an invalid lease becomes 
> active with _no blocks_.  A replication storm ensues possibly causing DNs to 
> temporarily go dead (HDFS-12645), leading to more FBR lease rejections on 
> re-registration.  The cluster will have many "missing blocks" until the DNs 
> next FBR is sent and/or forced.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12914) Block report leases cause missing blocks until next report

2019-06-23 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870714#comment-16870714
 ] 

Wei-Chiu Chuang commented on HDFS-12914:


Looks like HDFS-12487 breaks the getBlockToCopy test.

> Block report leases cause missing blocks until next report
> --
>
> Key: HDFS-12914
> URL: https://issues.apache.org/jira/browse/HDFS-12914
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.8.0, 2.9.2
>Reporter: Daryn Sharp
>Assignee: Santosh Marella
>Priority: Critical
> Fix For: 3.3.0, 3.2.1
>
> Attachments: HDFS-12914-branch-2.001.patch, 
> HDFS-12914-trunk.00.patch, HDFS-12914-trunk.01.patch, HDFS-12914.005.patch, 
> HDFS-12914.006.patch, HDFS-12914.007.patch, HDFS-12914.008.patch, 
> HDFS-12914.009.patch, HDFS-12914.branch-3.1.001.patch, 
> HDFS-12914.branch-3.1.002.patch, HDFS-12914.branch-3.2.patch, 
> HDFS-12914.utfix.patch
>
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for 
> conditions such as "unknown datanode", "not in pending set", "lease has 
> expired", wrong lease id, etc.  Lease rejection does not throw an exception.  
> It returns false which bubbles up to  {{NameNodeRpcServer#blockReport}} and 
> interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected from an invalid lease becomes 
> active with _no blocks_.  A replication storm ensues possibly causing DNs to 
> temporarily go dead (HDFS-12645), leading to more FBR lease rejections on 
> re-registration.  The cluster will have many "missing blocks" until the DNs 
> next FBR is sent and/or forced.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12914) Block report leases cause missing blocks until next report

2019-06-23 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870712#comment-16870712
 ] 

Wei-Chiu Chuang commented on HDFS-12914:


Filed  HDFS-14598 for the findbugs warning.

> Block report leases cause missing blocks until next report
> --
>
> Key: HDFS-12914
> URL: https://issues.apache.org/jira/browse/HDFS-12914
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.8.0, 2.9.2
>Reporter: Daryn Sharp
>Assignee: Santosh Marella
>Priority: Critical
> Fix For: 3.3.0, 3.2.1
>
> Attachments: HDFS-12914-branch-2.001.patch, 
> HDFS-12914-trunk.00.patch, HDFS-12914-trunk.01.patch, HDFS-12914.005.patch, 
> HDFS-12914.006.patch, HDFS-12914.007.patch, HDFS-12914.008.patch, 
> HDFS-12914.009.patch, HDFS-12914.branch-3.1.001.patch, 
> HDFS-12914.branch-3.1.002.patch, HDFS-12914.branch-3.2.patch, 
> HDFS-12914.utfix.patch
>
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for 
> conditions such as "unknown datanode", "not in pending set", "lease has 
> expired", wrong lease id, etc.  Lease rejection does not throw an exception.  
> It returns false which bubbles up to  {{NameNodeRpcServer#blockReport}} and 
> interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected from an invalid lease becomes 
> active with _no blocks_.  A replication storm ensues possibly causing DNs to 
> temporarily go dead (HDFS-12645), leading to more FBR lease rejections on 
> re-registration.  The cluster will have many "missing blocks" until the DNs 
> next FBR is sent and/or forced.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14598) Findbugs warning caused by HDFS-12487

2019-06-23 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created HDFS-14598:
--

 Summary: Findbugs warning caused by HDFS-12487
 Key: HDFS-14598
 URL: https://issues.apache.org/jira/browse/HDFS-14598
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: diskbalancer
Reporter: Wei-Chiu Chuang


https://builds.apache.org/job/PreCommit-HDFS-Build/27038/artifact/out/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html

{noformat}
Redundant nullcheck of block, which is known to be non-null in 
org.apache.hadoop.hdfs.server.datanode.DiskBalancer$DiskBalancerMover.getBlockToCopy(FsVolumeSpi$BlockIterator,
 DiskBalancerWorkItem)
Bug type RCN_REDUNDANT_NULLCHECK_OF_NONNULL_VALUE (click for details) 
In class org.apache.hadoop.hdfs.server.datanode.DiskBalancer$DiskBalancerMover
In method 
org.apache.hadoop.hdfs.server.datanode.DiskBalancer$DiskBalancerMover.getBlockToCopy(FsVolumeSpi$BlockIterator,
 DiskBalancerWorkItem)
Value loaded from block
Return value of 
org.apache.hadoop.hdfs.server.datanode.fsdataset.FsVolumeSpi$BlockIterator.nextBlock()
 of type org.apache.hadoop.hdfs.protocol.ExtendedBlock
Redundant null check at DiskBalancer.java:[line 912]
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14586) Trash missing delete the folder which near timeout checkpoint

2019-06-23 Thread maobaolong (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870692#comment-16870692
 ] 

maobaolong commented on HDFS-14586:
---

[~hexiaoqiao] i have another viewpoint, we need to keep checkpoint  generated 
timestamp to measure something, so he do the check when delete, i think the two 
ways are both correct, but this way can keep the timestamp. 

> Trash missing delete the folder which near timeout checkpoint
> -
>
> Key: HDFS-14586
> URL: https://issues.apache.org/jira/browse/HDFS-14586
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hu yongfa
>Assignee: hu yongfa
>Priority: Major
> Attachments: HDFS-14586.001.patch
>
>
> when trash timeout checkpoint coming, trash will delete the old folder first, 
> then create a new checkpoint folder.
> as the delete action may spend a long time, such as 2 minutes, so the new 
> checkpoint folder created late.
> at the next trash timeout checkpoint, trash will skip delete the new 
> checkpoint folder, because the new checkpoint folder is 
> less than a checkpoint interval.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11242) Add refresh cluster network topology operation to dfs admin

2019-06-23 Thread Lars Francke (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-11242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870634#comment-16870634
 ] 

Lars Francke commented on HDFS-11242:
-

I'm afraid I still don't fully understand.


Either way [~reidchan] I think this is a good change and desperately needed. 
Are you willing to rebase and bring up-to-date?
If you like I can ask on the mailing list about some review support before you 
spent time on this.

> Add refresh cluster network topology operation to dfs admin
> ---
>
> Key: HDFS-11242
> URL: https://issues.apache.org/jira/browse/HDFS-11242
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Reid Chan
>Priority: Minor
> Attachments: HDFS-11242.002.patch, HDFS-11242.patch
>
>
> The network topology and dns to switch mapping are initialized at the start 
> of the namenode.
> If admin wants to change the topology because of new datanodes added, he has 
> to stop and restart namenode(s), otherwise those new added datanodes are 
> squeezed under /default-rack.
> It is a low frequency operation, but it should be operated appropriately, so 
> dfs admin should take the responsibility.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14429) Block remain in COMMITTED but not COMPLETE cause by Decommission

2019-06-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870607#comment-16870607
 ] 

Hadoop QA commented on HDFS-14429:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 21m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
32s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
50s{color} | {color:green} branch-2 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
46s{color} | {color:green} branch-2 passed with JDK v1.8.0_212 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
31s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
57s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
52s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
7s{color} | {color:green} branch-2 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
43s{color} | {color:green} branch-2 passed with JDK v1.8.0_212 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed with JDK v1.8.0_212 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 26s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 20 new + 159 unchanged - 0 fixed = 179 total (was 159) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed with JDK v1.8.0_212 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 71m 16s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}125m 42s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys |
|   | hadoop.hdfs.server.namenode.ha.TestEditLogTailer |
|   | hadoop.hdfs.server.namenode.TestNamenodeCapacityReport |
|   | hadoop.hdfs.server.balancer.TestBalancerRPCDelay |
|   | hadoop.hdfs.server.datanode.TestDirectoryScanner |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:da67579 |
| JIRA Issue | HDFS-14429 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12972620/HDFS-14429.branch-2.01.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 2b1a32e77838 3.13.0-153-generic #203-Ubuntu SMP 

[jira] [Commented] (HDFS-14429) Block remain in COMMITTED but not COMPLETE cause by Decommission

2019-06-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870593#comment-16870593
 ] 

Hadoop QA commented on HDFS-14429:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
39s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 33s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
58s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs in trunk has 1 extant 
Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 38s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 18 new + 135 unchanged - 0 fixed = 153 total (was 135) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 44s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}106m 57s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
33s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}165m 42s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.diskbalancer.TestDiskBalancer |
|   | hadoop.hdfs.server.datanode.TestDirectoryScanner |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | HDFS-14429 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12972616/HDFS-14429.02.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 2d54115f24c2 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / b28ddb2 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
| findbugs | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27045/artifact/out/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html
 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27045/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit | 

[jira] [Updated] (HDFS-14429) Block remain in COMMITTED but not COMPLETE cause by Decommission

2019-06-23 Thread Yicong Cai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yicong Cai updated HDFS-14429:
--
Attachment: HDFS-14429.branch-2.01.patch

> Block remain in COMMITTED but not COMPLETE cause by Decommission
> 
>
> Key: HDFS-14429
> URL: https://issues.apache.org/jira/browse/HDFS-14429
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.2
>Reporter: Yicong Cai
>Assignee: Yicong Cai
>Priority: Major
> Attachments: HDFS-14429.01.patch, HDFS-14429.02.patch, 
> HDFS-14429.branch-2.01.patch
>
>
> In the following scenario, the Block will remain in the COMMITTED but not 
> COMPLETE state and cannot be closed properly:
>  # Client writes Block(bk1) to three data nodes (dn1/dn2/dn3).
>  # bk1 has been completely written to three data nodes, and the data node 
> succeeds FinalizeBlock, joins IBR and waits to report to NameNode.
>  # The client commits bk1 after receiving the ACK.
>  # When the DN has not been reported to the IBR, all three nodes dn1/dn2/dn3 
> enter Decommissioning.
>  # The DN reports the IBR, but the block cannot be completed normally.
>  
> Then it will lead to the following related exceptions:
> {panel:title=Exception}
> 2019-04-02 13:40:31,882 INFO namenode.FSNamesystem 
> (FSNamesystem.java:checkBlocksComplete(2790)) - BLOCK* 
> blk_4313483521_3245321090 is COMMITTED but not COMPLETE(numNodes= 3 >= 
> minimum = 1) in file xxx
> 2019-04-02 13:40:31,882 INFO ipc.Server (Server.java:logException(2650)) - 
> IPC Server handler 499 on 8020, call Call#122552 Retry#0 
> org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from xxx:47615
> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not 
> replicated yet: xxx
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:171)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2579)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:846)
>  at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:510)
>  at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:503)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:871)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:817)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:415)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2606)
> {panel}
> And will cause the scenario described in HDFS-12747
> The root cause is that addStoredBlock does not consider the case where the 
> replications are in Decommission.
> This problem needs to be fixed like HDFS-11499.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14429) Block remain in COMMITTED but not COMPLETE cause by Decommission

2019-06-23 Thread Yicong Cai (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870579#comment-16870579
 ] 

Yicong Cai commented on HDFS-14429:
---

Provided branch-2 [^HDFS-14429.branch-2.01.patch] and trunck 
[^HDFS-14429.02.patch] [~jojochuang]

> Block remain in COMMITTED but not COMPLETE cause by Decommission
> 
>
> Key: HDFS-14429
> URL: https://issues.apache.org/jira/browse/HDFS-14429
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.2
>Reporter: Yicong Cai
>Assignee: Yicong Cai
>Priority: Major
> Attachments: HDFS-14429.01.patch, HDFS-14429.02.patch, 
> HDFS-14429.branch-2.01.patch
>
>
> In the following scenario, the Block will remain in the COMMITTED but not 
> COMPLETE state and cannot be closed properly:
>  # Client writes Block(bk1) to three data nodes (dn1/dn2/dn3).
>  # bk1 has been completely written to three data nodes, and the data node 
> succeeds FinalizeBlock, joins IBR and waits to report to NameNode.
>  # The client commits bk1 after receiving the ACK.
>  # When the DN has not been reported to the IBR, all three nodes dn1/dn2/dn3 
> enter Decommissioning.
>  # The DN reports the IBR, but the block cannot be completed normally.
>  
> Then it will lead to the following related exceptions:
> {panel:title=Exception}
> 2019-04-02 13:40:31,882 INFO namenode.FSNamesystem 
> (FSNamesystem.java:checkBlocksComplete(2790)) - BLOCK* 
> blk_4313483521_3245321090 is COMMITTED but not COMPLETE(numNodes= 3 >= 
> minimum = 1) in file xxx
> 2019-04-02 13:40:31,882 INFO ipc.Server (Server.java:logException(2650)) - 
> IPC Server handler 499 on 8020, call Call#122552 Retry#0 
> org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from xxx:47615
> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not 
> replicated yet: xxx
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:171)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2579)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:846)
>  at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:510)
>  at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:503)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:871)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:817)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:415)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2606)
> {panel}
> And will cause the scenario described in HDFS-12747
> The root cause is that addStoredBlock does not consider the case where the 
> replications are in Decommission.
> This problem needs to be fixed like HDFS-11499.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14429) Block remain in COMMITTED but not COMPLETE cause by Decommission

2019-06-23 Thread Yicong Cai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yicong Cai updated HDFS-14429:
--
Target Version/s: 2.10.0, 3.3.0, 2.9.3  (was: 3.3.0, 2.9.3)

> Block remain in COMMITTED but not COMPLETE cause by Decommission
> 
>
> Key: HDFS-14429
> URL: https://issues.apache.org/jira/browse/HDFS-14429
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.2
>Reporter: Yicong Cai
>Assignee: Yicong Cai
>Priority: Major
> Attachments: HDFS-14429.01.patch, HDFS-14429.02.patch, 
> HDFS-14429.branch-2.01.patch
>
>
> In the following scenario, the Block will remain in the COMMITTED but not 
> COMPLETE state and cannot be closed properly:
>  # Client writes Block(bk1) to three data nodes (dn1/dn2/dn3).
>  # bk1 has been completely written to three data nodes, and the data node 
> succeeds FinalizeBlock, joins IBR and waits to report to NameNode.
>  # The client commits bk1 after receiving the ACK.
>  # When the DN has not been reported to the IBR, all three nodes dn1/dn2/dn3 
> enter Decommissioning.
>  # The DN reports the IBR, but the block cannot be completed normally.
>  
> Then it will lead to the following related exceptions:
> {panel:title=Exception}
> 2019-04-02 13:40:31,882 INFO namenode.FSNamesystem 
> (FSNamesystem.java:checkBlocksComplete(2790)) - BLOCK* 
> blk_4313483521_3245321090 is COMMITTED but not COMPLETE(numNodes= 3 >= 
> minimum = 1) in file xxx
> 2019-04-02 13:40:31,882 INFO ipc.Server (Server.java:logException(2650)) - 
> IPC Server handler 499 on 8020, call Call#122552 Retry#0 
> org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from xxx:47615
> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not 
> replicated yet: xxx
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:171)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2579)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:846)
>  at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:510)
>  at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:503)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:871)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:817)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:415)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2606)
> {panel}
> And will cause the scenario described in HDFS-12747
> The root cause is that addStoredBlock does not consider the case where the 
> replications are in Decommission.
> This problem needs to be fixed like HDFS-11499.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11242) Add refresh cluster network topology operation to dfs admin

2019-06-23 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-11242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870574#comment-16870574
 ] 

He Xiaoqiao commented on HDFS-11242:


Thanks [~larsfrancke] for your doubt and the detailed comments.
For the situation you said above, I think the proper way is:
a. clean CachedDNSToSwitchMapping#cache complete or just clean one host. (which 
do not support currently).
b. fix the bug via Consul (sorry I am not familiar with that).
c. stop DataNode and better to wait for proper time (630s by default which 
namenode consider datanode dead) then start DataNode.
Please correct me if something wrong.

I want to state that change network topology / rack aware script is very risky. 
We should change it cautiously. especially for large cluster. FYI.

> Add refresh cluster network topology operation to dfs admin
> ---
>
> Key: HDFS-11242
> URL: https://issues.apache.org/jira/browse/HDFS-11242
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Reid Chan
>Priority: Minor
> Attachments: HDFS-11242.002.patch, HDFS-11242.patch
>
>
> The network topology and dns to switch mapping are initialized at the start 
> of the namenode.
> If admin wants to change the topology because of new datanodes added, he has 
> to stop and restart namenode(s), otherwise those new added datanodes are 
> squeezed under /default-rack.
> It is a low frequency operation, but it should be operated appropriately, so 
> dfs admin should take the responsibility.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11242) Add refresh cluster network topology operation to dfs admin

2019-06-23 Thread Lars Francke (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-11242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870566#comment-16870566
 ] 

Lars Francke commented on HDFS-11242:
-

I think I'm getting a clearer picture now but (and there's always a chance I'm 
mistaken) I believe what you're suggesting does not work.

Focusing on your second point: This is what's not currently possible!

Because the script is only ever called _once_ and the result is then cached. 
Look at CachedDNSToSwitchMapping.

So if you have a flexible rack aware script (which we do) but have a bug in it 
(e.g. reading racks from Consul but there's an error in the mapping) you have 
no way of fixing that error without restarting the NameNode because the bad 
result is cached.

Regarding your first point: Thanks for the pointer. After a restart a fixed 
rack awareness script would be called anyway so this wouldn't make any 
difference.

In short: I believe the patch & solution that [~reidchan] proposed is a very 
valid use-case.

> Add refresh cluster network topology operation to dfs admin
> ---
>
> Key: HDFS-11242
> URL: https://issues.apache.org/jira/browse/HDFS-11242
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Reid Chan
>Priority: Minor
> Attachments: HDFS-11242.002.patch, HDFS-11242.patch
>
>
> The network topology and dns to switch mapping are initialized at the start 
> of the namenode.
> If admin wants to change the topology because of new datanodes added, he has 
> to stop and restart namenode(s), otherwise those new added datanodes are 
> squeezed under /default-rack.
> It is a low frequency operation, but it should be operated appropriately, so 
> dfs admin should take the responsibility.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12914) Block report leases cause missing blocks until next report

2019-06-23 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870558#comment-16870558
 ] 

He Xiaoqiao commented on HDFS-12914:


I found failed unit tests also failed before patch. Any changes for branch-3.1 
recently?

> Block report leases cause missing blocks until next report
> --
>
> Key: HDFS-12914
> URL: https://issues.apache.org/jira/browse/HDFS-12914
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.8.0, 2.9.2
>Reporter: Daryn Sharp
>Assignee: Santosh Marella
>Priority: Critical
> Fix For: 3.3.0, 3.2.1
>
> Attachments: HDFS-12914-branch-2.001.patch, 
> HDFS-12914-trunk.00.patch, HDFS-12914-trunk.01.patch, HDFS-12914.005.patch, 
> HDFS-12914.006.patch, HDFS-12914.007.patch, HDFS-12914.008.patch, 
> HDFS-12914.009.patch, HDFS-12914.branch-3.1.001.patch, 
> HDFS-12914.branch-3.1.002.patch, HDFS-12914.branch-3.2.patch, 
> HDFS-12914.utfix.patch
>
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for 
> conditions such as "unknown datanode", "not in pending set", "lease has 
> expired", wrong lease id, etc.  Lease rejection does not throw an exception.  
> It returns false which bubbles up to  {{NameNodeRpcServer#blockReport}} and 
> interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected from an invalid lease becomes 
> active with _no blocks_.  A replication storm ensues possibly causing DNs to 
> temporarily go dead (HDFS-12645), leading to more FBR lease rejections on 
> re-registration.  The cluster will have many "missing blocks" until the DNs 
> next FBR is sent and/or forced.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11242) Add refresh cluster network topology operation to dfs admin

2019-06-23 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-11242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870557#comment-16870557
 ] 

He Xiaoqiao commented on HDFS-11242:


Thanks [~larsfrancke] for your quick response. IIUC, namenode will check if 
replications of one block has enough racks based on PlacementPolicy when 
namenode restart or failover. {{BlockManager#postponedMisreplicatedBlocks}} 
could be one data structure which is related. Please double check and correct 
me if something wrong.
As mentioned above, I prefer to design flexible rack aware script (refer to 
https://hadoop.apache.org/docs/r3.2.0/hadoop-project-dist/hadoop-common/RackAwareness.html),
 rather than have to specify proper rack when add new nodes.
Thanks again.

> Add refresh cluster network topology operation to dfs admin
> ---
>
> Key: HDFS-11242
> URL: https://issues.apache.org/jira/browse/HDFS-11242
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Reid Chan
>Priority: Minor
> Attachments: HDFS-11242.002.patch, HDFS-11242.patch
>
>
> The network topology and dns to switch mapping are initialized at the start 
> of the namenode.
> If admin wants to change the topology because of new datanodes added, he has 
> to stop and restart namenode(s), otherwise those new added datanodes are 
> squeezed under /default-rack.
> It is a low frequency operation, but it should be operated appropriately, so 
> dfs admin should take the responsibility.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14429) Block remain in COMMITTED but not COMPLETE cause by Decommission

2019-06-23 Thread Yicong Cai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yicong Cai updated HDFS-14429:
--
Attachment: HDFS-14429.02.patch

> Block remain in COMMITTED but not COMPLETE cause by Decommission
> 
>
> Key: HDFS-14429
> URL: https://issues.apache.org/jira/browse/HDFS-14429
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.2
>Reporter: Yicong Cai
>Assignee: Yicong Cai
>Priority: Major
> Attachments: HDFS-14429.01.patch, HDFS-14429.02.patch
>
>
> In the following scenario, the Block will remain in the COMMITTED but not 
> COMPLETE state and cannot be closed properly:
>  # Client writes Block(bk1) to three data nodes (dn1/dn2/dn3).
>  # bk1 has been completely written to three data nodes, and the data node 
> succeeds FinalizeBlock, joins IBR and waits to report to NameNode.
>  # The client commits bk1 after receiving the ACK.
>  # When the DN has not been reported to the IBR, all three nodes dn1/dn2/dn3 
> enter Decommissioning.
>  # The DN reports the IBR, but the block cannot be completed normally.
>  
> Then it will lead to the following related exceptions:
> {panel:title=Exception}
> 2019-04-02 13:40:31,882 INFO namenode.FSNamesystem 
> (FSNamesystem.java:checkBlocksComplete(2790)) - BLOCK* 
> blk_4313483521_3245321090 is COMMITTED but not COMPLETE(numNodes= 3 >= 
> minimum = 1) in file xxx
> 2019-04-02 13:40:31,882 INFO ipc.Server (Server.java:logException(2650)) - 
> IPC Server handler 499 on 8020, call Call#122552 Retry#0 
> org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from xxx:47615
> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not 
> replicated yet: xxx
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:171)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2579)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:846)
>  at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:510)
>  at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:503)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:871)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:817)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:415)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2606)
> {panel}
> And will cause the scenario described in HDFS-12747
> The root cause is that addStoredBlock does not consider the case where the 
> replications are in Decommission.
> This problem needs to be fixed like HDFS-11499.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14429) Block remain in COMMITTED but not COMPLETE cause by Decommission

2019-06-23 Thread Yicong Cai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yicong Cai updated HDFS-14429:
--
Attachment: (was: HDFS-14429.02.patch)

> Block remain in COMMITTED but not COMPLETE cause by Decommission
> 
>
> Key: HDFS-14429
> URL: https://issues.apache.org/jira/browse/HDFS-14429
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.2
>Reporter: Yicong Cai
>Assignee: Yicong Cai
>Priority: Major
> Attachments: HDFS-14429.01.patch
>
>
> In the following scenario, the Block will remain in the COMMITTED but not 
> COMPLETE state and cannot be closed properly:
>  # Client writes Block(bk1) to three data nodes (dn1/dn2/dn3).
>  # bk1 has been completely written to three data nodes, and the data node 
> succeeds FinalizeBlock, joins IBR and waits to report to NameNode.
>  # The client commits bk1 after receiving the ACK.
>  # When the DN has not been reported to the IBR, all three nodes dn1/dn2/dn3 
> enter Decommissioning.
>  # The DN reports the IBR, but the block cannot be completed normally.
>  
> Then it will lead to the following related exceptions:
> {panel:title=Exception}
> 2019-04-02 13:40:31,882 INFO namenode.FSNamesystem 
> (FSNamesystem.java:checkBlocksComplete(2790)) - BLOCK* 
> blk_4313483521_3245321090 is COMMITTED but not COMPLETE(numNodes= 3 >= 
> minimum = 1) in file xxx
> 2019-04-02 13:40:31,882 INFO ipc.Server (Server.java:logException(2650)) - 
> IPC Server handler 499 on 8020, call Call#122552 Retry#0 
> org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from xxx:47615
> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not 
> replicated yet: xxx
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:171)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2579)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:846)
>  at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:510)
>  at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:503)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:871)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:817)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:415)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2606)
> {panel}
> And will cause the scenario described in HDFS-12747
> The root cause is that addStoredBlock does not consider the case where the 
> replications are in Decommission.
> This problem needs to be fixed like HDFS-11499.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14429) Block remain in COMMITTED but not COMPLETE cause by Decommission

2019-06-23 Thread Yicong Cai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yicong Cai updated HDFS-14429:
--
Attachment: HDFS-14429.02.patch

> Block remain in COMMITTED but not COMPLETE cause by Decommission
> 
>
> Key: HDFS-14429
> URL: https://issues.apache.org/jira/browse/HDFS-14429
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.2
>Reporter: Yicong Cai
>Assignee: Yicong Cai
>Priority: Major
> Attachments: HDFS-14429.01.patch, HDFS-14429.02.patch
>
>
> In the following scenario, the Block will remain in the COMMITTED but not 
> COMPLETE state and cannot be closed properly:
>  # Client writes Block(bk1) to three data nodes (dn1/dn2/dn3).
>  # bk1 has been completely written to three data nodes, and the data node 
> succeeds FinalizeBlock, joins IBR and waits to report to NameNode.
>  # The client commits bk1 after receiving the ACK.
>  # When the DN has not been reported to the IBR, all three nodes dn1/dn2/dn3 
> enter Decommissioning.
>  # The DN reports the IBR, but the block cannot be completed normally.
>  
> Then it will lead to the following related exceptions:
> {panel:title=Exception}
> 2019-04-02 13:40:31,882 INFO namenode.FSNamesystem 
> (FSNamesystem.java:checkBlocksComplete(2790)) - BLOCK* 
> blk_4313483521_3245321090 is COMMITTED but not COMPLETE(numNodes= 3 >= 
> minimum = 1) in file xxx
> 2019-04-02 13:40:31,882 INFO ipc.Server (Server.java:logException(2650)) - 
> IPC Server handler 499 on 8020, call Call#122552 Retry#0 
> org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from xxx:47615
> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not 
> replicated yet: xxx
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:171)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2579)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:846)
>  at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:510)
>  at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:503)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:871)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:817)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:415)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2606)
> {panel}
> And will cause the scenario described in HDFS-12747
> The root cause is that addStoredBlock does not consider the case where the 
> replications are in Decommission.
> This problem needs to be fixed like HDFS-11499.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11242) Add refresh cluster network topology operation to dfs admin

2019-06-23 Thread Lars Francke (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-11242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870546#comment-16870546
 ] 

Lars Francke commented on HDFS-11242:
-

Specifically: What [~reidchan] can currently already be done but requires a 
restart, so the same "risk" already exists.

Does HDFS start moving data when the topolgy changes? I always was under the 
assumption that it only happens on a balance operation.

> Add refresh cluster network topology operation to dfs admin
> ---
>
> Key: HDFS-11242
> URL: https://issues.apache.org/jira/browse/HDFS-11242
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Reid Chan
>Priority: Minor
> Attachments: HDFS-11242.002.patch, HDFS-11242.patch
>
>
> The network topology and dns to switch mapping are initialized at the start 
> of the namenode.
> If admin wants to change the topology because of new datanodes added, he has 
> to stop and restart namenode(s), otherwise those new added datanodes are 
> squeezed under /default-rack.
> It is a low frequency operation, but it should be operated appropriately, so 
> dfs admin should take the responsibility.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11242) Add refresh cluster network topology operation to dfs admin

2019-06-23 Thread Lars Francke (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-11242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870544#comment-16870544
 ] 

Lars Francke commented on HDFS-11242:
-

[~hexiaoqiao] I don't understand. Can you be more specific?

The usecase I see is simple: You add new nodes but forget to specify the proper 
rack, the only way to fix it now is to restart the NameNodes which is not a 
good experience.

I haven't looked at the patch in detail but all I need is a way to call 
reloadCachedMappings on the DNSToSwitchMapping.

We can change the ScriptBasedMapping to not extend CachedDNSToSwitchMapping or 
to change CachedDNSToSwitchMapping to periodically invalidate all entries or 
something like that.

> Add refresh cluster network topology operation to dfs admin
> ---
>
> Key: HDFS-11242
> URL: https://issues.apache.org/jira/browse/HDFS-11242
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Reid Chan
>Priority: Minor
> Attachments: HDFS-11242.002.patch, HDFS-11242.patch
>
>
> The network topology and dns to switch mapping are initialized at the start 
> of the namenode.
> If admin wants to change the topology because of new datanodes added, he has 
> to stop and restart namenode(s), otherwise those new added datanodes are 
> squeezed under /default-rack.
> It is a low frequency operation, but it should be operated appropriately, so 
> dfs admin should take the responsibility.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11242) Add refresh cluster network topology operation to dfs admin

2019-06-23 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-11242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870542#comment-16870542
 ] 

He Xiaoqiao commented on HDFS-11242:


It looks like a good idea in some ways. But I do not think we should do that, 
since it could bring huge data transfer risk and destroy stability of cluster 
if we provide tools to refresh network topology dynamically. IMO, we should 
design more flexible rack aware script and avoid to change network topology.

> Add refresh cluster network topology operation to dfs admin
> ---
>
> Key: HDFS-11242
> URL: https://issues.apache.org/jira/browse/HDFS-11242
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Reid Chan
>Priority: Minor
> Attachments: HDFS-11242.002.patch, HDFS-11242.patch
>
>
> The network topology and dns to switch mapping are initialized at the start 
> of the namenode.
> If admin wants to change the topology because of new datanodes added, he has 
> to stop and restart namenode(s), otherwise those new added datanodes are 
> squeezed under /default-rack.
> It is a low frequency operation, but it should be operated appropriately, so 
> dfs admin should take the responsibility.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14577) RBF: FederationUtil#newInstance should allow constructor without context

2019-06-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870543#comment-16870543
 ] 

Hadoop QA commented on HDFS-14577:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
52s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} HDFS-13891 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
23s{color} | {color:green} HDFS-13891 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
52s{color} | {color:green} HDFS-13891 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
30s{color} | {color:green} HDFS-13891 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
51s{color} | {color:green} HDFS-13891 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 59s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
21s{color} | {color:green} HDFS-13891 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
44s{color} | {color:green} HDFS-13891 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m  1s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 28m 21s{color} 
| {color:red} hadoop-hdfs-rbf in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 94m 28s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.federation.router.TestRouterWithSecureStartup |
|   | hadoop.hdfs.server.federation.security.TestRouterHttpDelegationToken |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | HDFS-14577 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12972008/HDFS-14577-HDFS-13891.001.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 3e4b94d9553a 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | HDFS-13891 / 02597b6 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27044/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27044/testReport/ |
| Max. process+thread count | 1585 (vs. ulimit of 1) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: 

[jira] [Commented] (HDFS-11242) Add refresh cluster network topology operation to dfs admin

2019-06-23 Thread Reid Chan (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-11242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870532#comment-16870532
 ] 

Reid Chan commented on HDFS-11242:
--

Yes, only if any committer or member can review or comment, most importantly 
willing to have it in HDFS, otherwise it will just get no notice and set aside 
as it was.

> Add refresh cluster network topology operation to dfs admin
> ---
>
> Key: HDFS-11242
> URL: https://issues.apache.org/jira/browse/HDFS-11242
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Reid Chan
>Priority: Minor
> Attachments: HDFS-11242.002.patch, HDFS-11242.patch
>
>
> The network topology and dns to switch mapping are initialized at the start 
> of the namenode.
> If admin wants to change the topology because of new datanodes added, he has 
> to stop and restart namenode(s), otherwise those new added datanodes are 
> squeezed under /default-rack.
> It is a low frequency operation, but it should be operated appropriately, so 
> dfs admin should take the responsibility.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11242) Add refresh cluster network topology operation to dfs admin

2019-06-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-11242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870486#comment-16870486
 ] 

Hadoop QA commented on HDFS-11242:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  6s{color} 
| {color:red} HDFS-11242 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-11242 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12843182/HDFS-11242.002.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27043/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Add refresh cluster network topology operation to dfs admin
> ---
>
> Key: HDFS-11242
> URL: https://issues.apache.org/jira/browse/HDFS-11242
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Reid Chan
>Priority: Minor
> Attachments: HDFS-11242.002.patch, HDFS-11242.patch
>
>
> The network topology and dns to switch mapping are initialized at the start 
> of the namenode.
> If admin wants to change the topology because of new datanodes added, he has 
> to stop and restart namenode(s), otherwise those new added datanodes are 
> squeezed under /default-rack.
> It is a low frequency operation, but it should be operated appropriately, so 
> dfs admin should take the responsibility.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11242) Add refresh cluster network topology operation to dfs admin

2019-06-23 Thread Lars Francke (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-11242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870483#comment-16870483
 ] 

Lars Francke commented on HDFS-11242:
-

This would be super useful to have.

[~reidchan] are you by any chance looking to revive this patch?

> Add refresh cluster network topology operation to dfs admin
> ---
>
> Key: HDFS-11242
> URL: https://issues.apache.org/jira/browse/HDFS-11242
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Reid Chan
>Priority: Minor
> Attachments: HDFS-11242.002.patch, HDFS-11242.patch
>
>
> The network topology and dns to switch mapping are initialized at the start 
> of the namenode.
> If admin wants to change the topology because of new datanodes added, he has 
> to stop and restart namenode(s), otherwise those new added datanodes are 
> squeezed under /default-rack.
> It is a low frequency operation, but it should be operated appropriately, so 
> dfs admin should take the responsibility.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org