[jira] [Updated] (HDFS-13115) In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes have been deleted

2018-05-26 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-13115:
-
Fix Version/s: (was: 3.0.1)
   3.0.3

> In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes 
> have been deleted 
> ---
>
> Key: HDFS-13115
> URL: https://issues.apache.org/jira/browse/HDFS-13115
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 3.0.3
>
> Attachments: HDFS-13115.001.patch, HDFS-13115.002.patch
>
>
> In LeaseManager, 
> {code}
>  private synchronized INode[] getINodesWithLease() {
>     List<INode> inodes = new ArrayList<>(leasesById.size());
> INode currentINode;
> for (long inodeId : leasesById.keySet()) {
>   currentINode = fsnamesystem.getFSDirectory().getInode(inodeId);
>   // A file with an active lease could get deleted, or its
>   // parent directories could get recursively deleted.
>   if (currentINode != null &&
>   currentINode.isFile() &&
>   !fsnamesystem.isFileDeleted(currentINode.asFile())) {
> inodes.add(currentINode);
>   }
> }
> return inodes.toArray(new INode[0]);
>   }
> {code}
> we can see that, given an {{inodeId}}, 
> {{fsnamesystem.getFSDirectory().getInode(inodeId)}} can return null; the 
> reason is explained in the comment in the code above.
> HDFS-12985 root-caused and fixed one such case, but we are still seeing a 
> NullPointerException from FSNamesystem:
> {code}
>   public long getCompleteBlocksTotal() {
> // Calculate number of blocks under construction
> long numUCBlocks = 0;
> readLock();
> try {
>   numUCBlocks = leaseManager.getNumUnderConstructionBlocks(); <=== here
>   return getBlocksTotal() - numUCBlocks;
> } finally {
>   readUnlock();
> }
>   }
> {code}
> The exception happens when the inode for the given inodeId has been removed; 
> see the LeaseManager code below:
> {code}
>   synchronized long getNumUnderConstructionBlocks() {
>     assert this.fsnamesystem.hasReadLock() : "The FSNamesystem read lock wasn't"
>         + "acquired before counting under construction blocks";
> long numUCBlocks = 0;
> for (Long id : getINodeIdWithLeases()) {
>       final INodeFile cons = fsnamesystem.getFSDirectory().getInode(id).asFile(); <=== here
>   Preconditions.checkState(cons.isUnderConstruction());
>   BlockInfo[] blocks = cons.getBlocks();
>   if(blocks == null)
> continue;
>   for(BlockInfo b : blocks) {
> if(!b.isComplete())
>   numUCBlocks++;
>   }
> }
> LOG.info("Number of blocks under construction: " + numUCBlocks);
> return numUCBlocks;
>   }
> {code}
> Creating this jira to add a check for whether the inode has been removed, as 
> a safeguard to avoid the NullPointerException.
> It looks like after the inodeId is returned by {{getINodeIdWithLeases()}}, it 
> gets deleted from the FSDirectory map.
> Ideally we should find out what deleted it, as in HDFS-12985, but it seems 
> reasonable to add a safeguard here, like other code in the code base that 
> calls {{fsnamesystem.getFSDirectory().getInode(id)}}.
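The proposed safeguard can be sketched as below. This is a minimal, self-contained model, not the actual patch: the `Block` and `INodeFile` records and the plain `Map` are hypothetical stand-ins for Hadoop's `BlockInfo`, `INodeFile`, and `FSDirectory` inode map. The point is only that the lookup may come back null for a deleted inode, and the loop should skip that id instead of dereferencing it.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LeaseCountSketch {

    // Stand-in for BlockInfo: only completeness matters here.
    record Block(boolean complete) {}

    // Stand-in for INodeFile: an under-construction flag plus its blocks.
    record INodeFile(boolean underConstruction, Block[] blocks) {}

    static long countUnderConstructionBlocks(Map<Long, INodeFile> inodeMap,
                                             Iterable<Long> leasedIds) {
        long numUCBlocks = 0;
        for (Long id : leasedIds) {
            INodeFile cons = inodeMap.get(id);
            // Safeguard: the inode may have been deleted after its id was
            // collected; skip it rather than hit a NullPointerException.
            if (cons == null || !cons.underConstruction()) {
                continue;
            }
            Block[] blocks = cons.blocks();
            if (blocks == null) {
                continue;
            }
            for (Block b : blocks) {
                if (!b.complete()) {
                    numUCBlocks++;
                }
            }
        }
        return numUCBlocks;
    }

    public static void main(String[] args) {
        Map<Long, INodeFile> inodes = new HashMap<>();
        inodes.put(1L, new INodeFile(true,
                new Block[] { new Block(false), new Block(true) }));
        // Id 2 was deleted from the map after its lease id was collected;
        // without the null check this lookup would throw.
        long uc = countUnderConstructionBlocks(inodes, List.of(1L, 2L));
        System.out.println("Number of blocks under construction: " + uc);
        // prints: Number of blocks under construction: 1
    }
}
```

The same shape applies to the real method: guard the result of `fsnamesystem.getFSDirectory().getInode(id)` before calling `asFile()`, consistent with how `getINodesWithLease()` already handles the deleted-inode case.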



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13115) In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes have been deleted

2018-02-07 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-13115:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.1
   2.10.0
   3.1.0
   Status: Resolved  (was: Patch Available)

Thanks [~billyean] and [~jojochuang] for the review. I committed to trunk, 
3.0.1, branch-2.

 


[jira] [Updated] (HDFS-13115) In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes have been deleted

2018-02-06 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-13115:
-
Attachment: HDFS-13115.002.patch


[jira] [Updated] (HDFS-13115) In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes have been deleted

2018-02-06 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-13115:
-
Attachment: HDFS-13115.001.patch


[jira] [Updated] (HDFS-13115) In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes have been deleted

2018-02-06 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-13115:
-
Attachment: (was: HADOOP-13115.001.patch)


[jira] [Updated] (HDFS-13115) In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes have been deleted

2018-02-06 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-13115:
---
Target Version/s: 3.1.0, 2.10.0


[jira] [Updated] (HDFS-13115) In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes have been deleted

2018-02-06 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-13115:
-
Status: Patch Available  (was: Open)


[jira] [Updated] (HDFS-13115) In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes have been deleted

2018-02-06 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-13115:
-
Attachment: HADOOP-13115.001.patch


[jira] [Updated] (HDFS-13115) In getNumUnderConstructionBlocks(), ignore the inodeIds for which the inodes have been deleted

2018-02-06 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-13115:
-
Summary: In getNumUnderConstructionBlocks(), ignore the inodeIds for which 
the inodes have been deleted   (was: Handle deleted inode in 
getNumUnderConstructionBlocks())
