[jira] [Updated] (HDFS-13693) Remove unnecessary search in INodeDirectory.addChild during image loading

2019-07-18 Thread Lisheng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-13693:
---
Attachment: HDFS-13693-005.patch

> Remove unnecessary search in INodeDirectory.addChild during image loading
> -
>
> Key: HDFS-13693
> URL: https://issues.apache.org/jira/browse/HDFS-13693
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: zhouyingchao
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-13693-001.patch, HDFS-13693-002.patch, 
> HDFS-13693-003.patch, HDFS-13693-004.patch, HDFS-13693-005.patch
>
>
> In FSImageFormatPBINode.loadINodeDirectorySection, all child INodes are added 
> to their parent INode's map one by one. The adding procedure searches for a 
> position in the parent's map and then inserts the child at that position. 
> However, during image loading, the search is unnecessary, since the insert 
> position should always be at the end of the map given the order in which the 
> INodes are serialized on disk.
> Testing this patch against an fsimage of a 70PB cluster (200 million files and 
> 300 million blocks), the image loading time was reduced from 1210 seconds to 
> 1138 seconds, a reduction of roughly 6%.
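
For illustration, a minimal sketch of the idea (the names below are hypothetical and are not the real INodeDirectory API): during fsimage loading the children arrive already sorted, so the binary search used by the normal addChild path can be skipped and each child is appended at the tail.

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative sketch only, not the HDFS-13693 patch itself.
class SortedChildren {
  private final List<String> children = new ArrayList<>();

  // Normal runtime path: binary-search for the insertion point.
  boolean addChild(String name) {
    int pos = Collections.binarySearch(children, name);
    if (pos >= 0) {
      return false;               // duplicate name, nothing to insert
    }
    children.add(-pos - 1, name);
    return true;
  }

  // Image-loading path: callers guarantee ascending order, so just append.
  void addChildAtLoadTime(String name) {
    children.add(name);
  }
}
{code}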



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13693) Remove unnecessary search in INodeDirectory.addChild during image loading

2019-07-18 Thread Lisheng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-13693:
---
Attachment: HDFS-13693-004.patch

> Remove unnecessary search in INodeDirectory.addChild during image loading
> -
>
> Key: HDFS-13693
> URL: https://issues.apache.org/jira/browse/HDFS-13693
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: zhouyingchao
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-13693-001.patch, HDFS-13693-002.patch, 
> HDFS-13693-003.patch, HDFS-13693-004.patch, HDFS-13693-005.patch
>
>
> In FSImageFormatPBINode.loadINodeDirectorySection, all child INodes are added 
> to their parent INode's map one by one. The adding procedure searches for a 
> position in the parent's map and then inserts the child at that position. 
> However, during image loading, the search is unnecessary, since the insert 
> position should always be at the end of the map given the order in which the 
> INodes are serialized on disk.
> Testing this patch against an fsimage of a 70PB cluster (200 million files and 
> 300 million blocks), the image loading time was reduced from 1210 seconds to 
> 1138 seconds, a reduction of roughly 6%.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13693) Remove unnecessary search in INodeDirectory.addChild during image loading

2019-07-18 Thread Lisheng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-13693:
---
Attachment: (was: HDFS-13693-004.patch)

> Remove unnecessary search in INodeDirectory.addChild during image loading
> -
>
> Key: HDFS-13693
> URL: https://issues.apache.org/jira/browse/HDFS-13693
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: zhouyingchao
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-13693-001.patch, HDFS-13693-002.patch, 
> HDFS-13693-003.patch, HDFS-13693-004.patch, HDFS-13693-005.patch
>
>
> In FSImageFormatPBINode.loadINodeDirectorySection, all child INodes are added 
> to their parent INode's map one by one. The adding procedure searches for a 
> position in the parent's map and then inserts the child at that position. 
> However, during image loading, the search is unnecessary, since the insert 
> position should always be at the end of the map given the order in which the 
> INodes are serialized on disk.
> Testing this patch against an fsimage of a 70PB cluster (200 million files and 
> 300 million blocks), the image loading time was reduced from 1210 seconds to 
> 1138 seconds, a reduction of roughly 6%.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13693) Remove unnecessary search in INodeDirectory.addChild during image loading

2019-07-18 Thread Lisheng Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888483#comment-16888483
 ] 

Lisheng Sun commented on HDFS-13693:


[~jojochuang] [~hexiaoqiao] I have updated the patch and added some comments. 
Could you help review it? Thank you.

> Remove unnecessary search in INodeDirectory.addChild during image loading
> -
>
> Key: HDFS-13693
> URL: https://issues.apache.org/jira/browse/HDFS-13693
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: zhouyingchao
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-13693-001.patch, HDFS-13693-002.patch, 
> HDFS-13693-003.patch, HDFS-13693-004.patch, HDFS-13693-005.patch
>
>
> In FSImageFormatPBINode.loadINodeDirectorySection, all child INodes are added 
> to their parent INode's map one by one. The adding procedure searches for a 
> position in the parent's map and then inserts the child at that position. 
> However, during image loading, the search is unnecessary, since the insert 
> position should always be at the end of the map given the order in which the 
> INodes are serialized on disk.
> Testing this patch against an fsimage of a 70PB cluster (200 million files and 
> 300 million blocks), the image loading time was reduced from 1210 seconds to 
> 1138 seconds, a reduction of roughly 6%.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-07-21 Thread Lisheng Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889667#comment-16889667
 ] 

Lisheng Sun commented on HDFS-14313:


Thank you [~linyiqun] for your careful review. I have a few questions to discuss 
with you.

1. I don't use the component-specific impl class in the common module. The change 
in GetSpaceUsed in the common module is only there so that subclasses can inherit 
from it. The change in CommonConfigurationKeys in the common module is for printing 
the threshold time; it should be moved to DFSConfigKeys, which is more appropriate.

2. There is already a switch used for this control, GetSpaceUsed#Builder#CLASSNAME_KEY:
{code:java}
static final String CLASSNAME_KEY = "fs.getspaceused.classname";
{code}
If we add enableFSCachingGetSpace as you suggest, there will be two switches, which 
is more confusing for the user.
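
For illustration, selecting the cached implementation through that single existing switch could look like the sketch below (only the key fs.getspaceused.classname is confirmed above; the fully qualified class name is assumed here):

{code:java}
import org.apache.hadoop.conf.Configuration;

// Illustrative sketch: one switch is enough to pick the GetSpaceUsed implementation.
public class SelectGetSpaceUsedImpl {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Assumed fully qualified name of the new implementation, for illustration only.
    conf.set("fs.getspaceused.classname",
        "org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed");
    System.out.println(conf.get("fs.getspaceused.classname"));
  }
}
{code}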

3.
{quote}Even though deepCopyReplica is only used by another thread, I still 
prefer to let it be an atomic operation in case this will be used in other 
places in the future. Can you add datasetLock here?
{quote}
FsDatasetImpl#addBlockPool holds the datasetLock and calls FsVolumeList#addBlockPool:
{code:java}
@Override
public void addBlockPool(String bpid, Configuration conf)
throws IOException {
  LOG.info("Adding block pool " + bpid);
  try (AutoCloseableLock lock = datasetLock.acquire()) {
volumes.addBlockPool(bpid, conf);
volumeMap.initBlockPool(bpid);
  }
  volumes.getAllVolumesMap(bpid, volumeMap, ramDiskReplicaTracker);
}
{code}
The call chain is FsVolumeList#addBlockPool -> FsVolumeImpl#addBlockPool -> new 
BlockPoolSlice -> FsDatasetImpl#deepCopyReplica. If deepCopyReplica also took the 
datasetLock, it would deadlock. So Collections.unmodifiableSet is used instead, so 
that the replica info cannot be modified outside:
{code:java}
void addBlockPool(final String bpid, final Configuration conf) throws 
IOException {
  long totalStartTime = Time.monotonicNow();
  final Map<FsVolumeImpl, IOException> unhealthyDataDirs =
      new ConcurrentHashMap<>();
  List<Thread> blockPoolAddingThreads = new ArrayList<>();
  for (final FsVolumeImpl v : volumes) {
Thread t = new Thread() {
  public void run() {
try (FsVolumeReference ref = v.obtainReference()) {
  FsDatasetImpl.LOG.info("Scanning block pool " + bpid +
  " on volume " + v + "...");
  long startTime = Time.monotonicNow();
  v.addBlockPool(bpid, conf);
  long timeTaken = Time.monotonicNow() - startTime;
  FsDatasetImpl.LOG.info("Time taken to scan block pool " + bpid +
  " on " + v + ": " + timeTaken + "ms");
} catch (ClosedChannelException e) {
  // ignore.
} catch (IOException ioe) {
  FsDatasetImpl.LOG.info("Caught exception while scanning " + v +
  ". Will throw later.", ioe);
  unhealthyDataDirs.put(v, ioe);
}
  }
};
blockPoolAddingThreads.add(t);
t.start();
  }
{code}
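
For illustration, a minimal sketch (not the actual patch code; the field and method names are made up) of how deepCopyReplica can hand out an unmodifiable snapshot without taking the datasetLock:

{code:java}
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only: the caller gets a stable, read-only copy of the replica
// set, so no lock is needed and the addBlockPool call chain above cannot deadlock.
class ReplicaSnapshot {
  private final Map<String, Set<String>> replicasByBlockPool = new ConcurrentHashMap<>();

  Set<String> deepCopyReplica(String bpid) {
    Set<String> replicas =
        replicasByBlockPool.getOrDefault(bpid, Collections.emptySet());
    // Copy first, then wrap, so later changes to the live map are not visible.
    return Collections.unmodifiableSet(new HashSet<>(replicas));
  }
}
{code}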
 4. According to your suggestion, I will modify the UT to use a real minicluster 
and add a comparison test that uses FSCachingGetUsed and the default DU way 
respectively.

Please correct me if I am wrong. Thanks [~linyiqun] again.

 

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch
>
>
> There are two existing ways of getting used space, DU and DF, and both are insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple datanodes or 
> other servers.
>  Getting HDFS used space from the FsDatasetImpl#volumeMap#ReplicaInfos in memory 
> is very cheap and accurate.
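
For illustration, a minimal sketch of the idea behind the change (hypothetical names, not the real FsDatasetImpl API): the used space is computed by summing the sizes of the replicas that the datanode already tracks in memory, instead of spawning du/df.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only: an in-memory "du" over the tracked replicas.
class InMemoryUsedSpace {
  // block id -> bytes on disk (block data plus metadata), maintained by the datanode.
  private final Map<Long, Long> replicaSizes = new ConcurrentHashMap<>();

  void addReplica(long blockId, long bytesOnDisk) {
    replicaSizes.put(blockId, bytesOnDisk);
  }

  // Used space computed from memory, no external du/df process involved.
  long getUsed() {
    long used = 0;
    for (long bytes : replicaSizes.values()) {
      used += bytes;
    }
    return used;
  }
}
{code}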



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13693) Remove unnecessary search in INodeDirectory.addChild during image loading

2019-07-21 Thread Lisheng Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889669#comment-16889669
 ] 

Lisheng Sun commented on HDFS-13693:


[~hexiaoqiao] has given +1 for the patch in Jira and [~jojochuang] has given +1 for the PR.

[~jojochuang], [~xkrogen], [~elgoiri], [~ayushtkn], would you mind taking another 
review? Thank you.

> Remove unnecessary search in INodeDirectory.addChild during image loading
> -
>
> Key: HDFS-13693
> URL: https://issues.apache.org/jira/browse/HDFS-13693
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: zhouyingchao
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-13693-001.patch, HDFS-13693-002.patch, 
> HDFS-13693-003.patch, HDFS-13693-004.patch, HDFS-13693-005.patch
>
>
> In FSImageFormatPBINode.loadINodeDirectorySection, all child INodes are added 
> to their parent INode's map one by one. The adding procedure searches for a 
> position in the parent's map and then inserts the child at that position. 
> However, during image loading, the search is unnecessary, since the insert 
> position should always be at the end of the map given the order in which the 
> INodes are serialized on disk.
> Testing this patch against an fsimage of a 70PB cluster (200 million files and 
> 300 million blocks), the image loading time was reduced from 1210 seconds to 
> 1138 seconds, a reduction of roughly 6%.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13693) Remove unnecessary search in INodeDirectory.addChild during image loading

2019-07-21 Thread Lisheng Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889696#comment-16889696
 ] 

Lisheng Sun commented on HDFS-13693:


Thank you [~ayushtkn] for your review. I have deleted the related PR, and from now on 
I will only keep the v5 patch for this issue. Thanks [~ayushtkn] again.

> Remove unnecessary search in INodeDirectory.addChild during image loading
> -
>
> Key: HDFS-13693
> URL: https://issues.apache.org/jira/browse/HDFS-13693
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: zhouyingchao
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-13693-001.patch, HDFS-13693-002.patch, 
> HDFS-13693-003.patch, HDFS-13693-004.patch, HDFS-13693-005.patch
>
>
> In FSImageFormatPBINode.loadINodeDirectorySection, all child INodes are added 
> to their parent INode's map one by one. The adding procedure searches for a 
> position in the parent's map and then inserts the child at that position. 
> However, during image loading, the search is unnecessary, since the insert 
> position should always be at the end of the map given the order in which the 
> INodes are serialized on disk.
> Testing this patch against an fsimage of a 70PB cluster (200 million files and 
> 300 million blocks), the image loading time was reduced from 1210 seconds to 
> 1138 seconds, a reduction of roughly 6%.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13693) Remove unnecessary search in INodeDirectory.addChild during image loading

2019-07-21 Thread Lisheng Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889701#comment-16889701
 ] 

Lisheng Sun commented on HDFS-13693:


Thank you [~ayushtkn] for your suggestion. I will pay attention to it next time.

> Remove unnecessary search in INodeDirectory.addChild during image loading
> -
>
> Key: HDFS-13693
> URL: https://issues.apache.org/jira/browse/HDFS-13693
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: zhouyingchao
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-13693-001.patch, HDFS-13693-002.patch, 
> HDFS-13693-003.patch, HDFS-13693-004.patch, HDFS-13693-005.patch
>
>
> In FSImageFormatPBINode.loadINodeDirectorySection, all child INodes are added 
> to their parent INode's map one by one. The adding procedure searches for a 
> position in the parent's map and then inserts the child at that position. 
> However, during image loading, the search is unnecessary, since the insert 
> position should always be at the end of the map given the order in which the 
> INodes are serialized on disk.
> Testing this patch against an fsimage of a 70PB cluster (200 million files and 
> 300 million blocks), the image loading time was reduced from 1210 seconds to 
> 1138 seconds, a reduction of roughly 6%.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13693) Remove unnecessary search in INodeDirectory.addChild during image loading

2019-07-22 Thread Lisheng Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890228#comment-16890228
 ] 

Lisheng Sun commented on HDFS-13693:


hi [~ayushtkn], if you see no problem with this patch, could you help merge it to 
trunk? Thank you.

> Remove unnecessary search in INodeDirectory.addChild during image loading
> -
>
> Key: HDFS-13693
> URL: https://issues.apache.org/jira/browse/HDFS-13693
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: zhouyingchao
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-13693-001.patch, HDFS-13693-002.patch, 
> HDFS-13693-003.patch, HDFS-13693-004.patch, HDFS-13693-005.patch
>
>
> In FSImageFormatPBINode.loadINodeDirectorySection, all child INodes are added 
> to their parent INode's map one by one. The adding procedure searches for a 
> position in the parent's map and then inserts the child at that position. 
> However, during image loading, the search is unnecessary, since the insert 
> position should always be at the end of the map given the order in which the 
> INodes are serialized on disk.
> Testing this patch against an fsimage of a 70PB cluster (200 million files and 
> 300 million blocks), the image loading time was reduced from 1210 seconds to 
> 1138 seconds, a reduction of roughly 6%.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-07-22 Thread Lisheng Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890237#comment-16890237
 ] 

Lisheng Sun commented on HDFS-14313:


hi [~linyiqun], would you mind continuing to review this patch? Thank you.

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch
>
>
> There are two existing ways of getting used space, DU and DF, and both are insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple datanodes or 
> other servers.
>  Getting HDFS used space from the FsDatasetImpl#volumeMap#ReplicaInfos in memory 
> is very cheap and accurate.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13693) Remove unnecessary search in INodeDirectory.addChild during image loading

2019-07-23 Thread Lisheng Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890943#comment-16890943
 ] 

Lisheng Sun commented on HDFS-13693:


 
{code:java}
[ERROR] Failed to execute goal 
org.apache.hadoop:hadoop-maven-plugins:3.3.0-SNAPSHOT:protoc (compile-protoc) 
on project hadoop-common: org.apache.maven.plugin.MojoExecutionException: 
protoc version is 'libprotoc 3.0.0', expected version is '2.5.0' -> [Help 1]
{code}
I think this failure is unrelated to this patch. From the output log, it looks 
like a protoc version problem.

 

> Remove unnecessary search in INodeDirectory.addChild during image loading
> -
>
> Key: HDFS-13693
> URL: https://issues.apache.org/jira/browse/HDFS-13693
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: zhouyingchao
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-13693-001.patch, HDFS-13693-002.patch, 
> HDFS-13693-003.patch, HDFS-13693-004.patch, HDFS-13693-005.patch
>
>
> In FSImageFormatPBINode.loadINodeDirectorySection, all child INodes are added 
> to their parent INode's map one by one. The adding procedure searches for a 
> position in the parent's map and then inserts the child at that position. 
> However, during image loading, the search is unnecessary, since the insert 
> position should always be at the end of the map given the order in which the 
> INodes are serialized on disk.
> Testing this patch against an fsimage of a 70PB cluster (200 million files and 
> 300 million blocks), the image loading time was reduced from 1210 seconds to 
> 1138 seconds, a reduction of roughly 6%.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13693) Remove unnecessary search in INodeDirectory.addChild during image loading

2019-07-23 Thread Lisheng Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890943#comment-16890943
 ] 

Lisheng Sun edited comment on HDFS-13693 at 7/23/19 12:19 PM:
--

 
{code:java}
[ERROR] Failed to execute goal 
org.apache.hadoop:hadoop-maven-plugins:3.3.0-SNAPSHOT:protoc (compile-protoc) 
on project hadoop-common: org.apache.maven.plugin.MojoExecutionException: 
protoc version is 'libprotoc 3.0.0', expected version is '2.5.0' -> [Help 1]
{code}
I think this failure is unrelated to this patch. From the output log, it looks 
like a protoc version problem.

 


was (Author: leosun08):
 
{code:java}
[ERROR] Failed to execute goal 
org.apache.hadoop:hadoop-maven-plugins:3.3.0-SNAPSHOT:protoc (compile-protoc) 
on project hadoop-common: org.apache.maven.plugin.MojoExecutionException: 
protoc version is 'libprotoc 3.0.0', expected version is '2.5.0' -> [Help 1]
{code}
I think this failure is unrelated to this patch. From the output log,it should 
be proto version problem. 

 

> Remove unnecessary search in INodeDirectory.addChild during image loading
> -
>
> Key: HDFS-13693
> URL: https://issues.apache.org/jira/browse/HDFS-13693
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: zhouyingchao
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-13693-001.patch, HDFS-13693-002.patch, 
> HDFS-13693-003.patch, HDFS-13693-004.patch, HDFS-13693-005.patch
>
>
> In FSImageFormatPBINode.loadINodeDirectorySection, all child INodes are added 
> to their parent INode's map one by one. The adding procedure searches for a 
> position in the parent's map and then inserts the child at that position. 
> However, during image loading, the search is unnecessary, since the insert 
> position should always be at the end of the map given the order in which the 
> INodes are serialized on disk.
> Testing this patch against an fsimage of a 70PB cluster (200 million files and 
> 300 million blocks), the image loading time was reduced from 1210 seconds to 
> 1138 seconds, a reduction of roughly 6%.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-13594) the lock of ShortCircuitCache is hold while close the ShortCircuitReplica

2019-07-24 Thread Lisheng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun reassigned HDFS-13594:
--

Assignee: Lisheng Sun

> the lock of ShortCircuitCache is hold while close the ShortCircuitReplica  
> ---
>
> Key: HDFS-13594
> URL: https://issues.apache.org/jira/browse/HDFS-13594
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.2
>Reporter: Gang Xie
>Assignee: Lisheng Sun
>Priority: Minor
> Attachments: no_hdfs.svg
>
>
> When profiling short-circuit reads, we found that the ShortCircuitCache lock is a hot 
> spot. After looking into the code, we found that when a BlockReaderLocal is closed, it 
> tries to trimEvictionMaps, and several ShortCircuitReplicas are closed while 
> the lock is held. This slows down the close of the BlockReaderLocal, and, worse, 
> it blocks other allocations of new ShortCircuitReplicas.
> An idea to avoid this is to close the replicas asynchronously. I will do a 
> prototype and measure the performance.
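
For illustration, a minimal sketch of the async-close idea (hypothetical names, not the real ShortCircuitCache code): evicted entries are collected while the lock is held, but the expensive close() calls happen on a background thread after the lock is released.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch only: trimming under the lock, closing outside it.
class AsyncEvictingCache {
  private final ReentrantLock lock = new ReentrantLock();
  private final List<AutoCloseable> evictable = new ArrayList<>();
  private final ExecutorService closer = Executors.newSingleThreadExecutor();

  void trimEvictionMaps(int keep) {
    List<AutoCloseable> toClose = new ArrayList<>();
    lock.lock();
    try {
      while (evictable.size() > keep) {
        toClose.add(evictable.remove(evictable.size() - 1));
      }
    } finally {
      lock.unlock();
    }
    // Close on a background thread so other allocations are not blocked.
    closer.execute(() -> {
      for (AutoCloseable c : toClose) {
        try {
          c.close();
        } catch (Exception e) {
          // log and continue in a real implementation
        }
      }
    });
  }
}
{code}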



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-13564) PreAllocator for DfsClientShm

2019-07-24 Thread Lisheng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun reassigned HDFS-13564:
--

Assignee: Lisheng Sun

> PreAllocator for DfsClientShm
> -
>
> Key: HDFS-13564
> URL: https://issues.apache.org/jira/browse/HDFS-13564
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.2
>Reporter: Gang Xie
>Assignee: Lisheng Sun
>Priority: Minor
> Fix For: 3.0.2
>
>
> When we did a stress test against Short-Circuit Local Reads, we found a 
> bottleneck: allocating a new DfsClientShm blocks a lot of slot allocations 
> on it.
> Currently, there are 128 slots per shm, which means that at most 128 reads could 
> be blocked by the shm allocation. Especially when stressed, the domain socket 
> communication to the datanode gets slow, and the datanode could also be in GC, which 
> could take some hundreds of milliseconds to allocate one shm and, in turn, delay the 
> reads. This is bad for latency-sensitive services like HBase.
> I'm working on the prototype and will upload the code and test results later.
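
For illustration, a minimal sketch of a pre-allocator (hypothetical names, not the real DfsClientShmManager API): a background thread keeps one spare segment ready, so a reader that needs a new one does not pay the slow allocation while other slot requests wait.

{code:java}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative sketch only: keep one pre-allocated segment on standby.
class ShmPreAllocator<T> {
  interface Factory<T> {
    T allocate() throws Exception;   // the slow path, e.g. a round trip to the datanode
  }

  private final BlockingQueue<T> spare = new ArrayBlockingQueue<>(1);
  private final ExecutorService refill = Executors.newSingleThreadExecutor();
  private final Factory<T> factory;

  ShmPreAllocator(Factory<T> factory) {
    this.factory = factory;
    refillAsync();
  }

  // Fast path if a spare segment is ready; otherwise fall back to a direct allocation.
  T take() throws Exception {
    T shm = spare.poll();
    refillAsync();
    return shm != null ? shm : factory.allocate();
  }

  private void refillAsync() {
    refill.execute(() -> {
      try {
        spare.offer(factory.allocate());
      } catch (Exception e) {
        // a real implementation would log and retry
      }
    });
  }
}
{code}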



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-07-25 Thread Lisheng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14313:
---
Attachment: HDFS-14313.008.patch

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch
>
>
> There are two existing ways of getting used space, DU and DF, and both are insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple datanodes or 
> other servers.
>  Getting HDFS used space from the FsDatasetImpl#volumeMap#ReplicaInfos in memory 
> is very cheap and accurate.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-07-25 Thread Lisheng Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16892921#comment-16892921
 ] 

Lisheng Sun edited comment on HDFS-14313 at 7/25/19 4:24 PM:
-

Thank you [~linyiqun] for your review. I have updated the patch per your comments. 
Could you help review it? Thank you.


was (Author: leosun08):
Thank [~linyiqun] for your review. I have update the patch as your comments. 
Could you help review it? Thank you.

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch
>
>
> There are two existing ways of getting used space, DU and DF, and both are insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple datanodes or 
> other servers.
>  Getting HDFS used space from the FsDatasetImpl#volumeMap#ReplicaInfos in memory 
> is very cheap and accurate.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-07-25 Thread Lisheng Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16892921#comment-16892921
 ] 

Lisheng Sun commented on HDFS-14313:


Thank you [~linyiqun] for your review. I have updated the patch per your comments. 
Could you help review it? Thank you.

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch
>
>
> There are two existing ways of getting used space, DU and DF, and both are insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple datanodes or 
> other servers.
>  Getting HDFS used space from the FsDatasetImpl#volumeMap#ReplicaInfos in memory 
> is very cheap and accurate.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-07-27 Thread Lisheng Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16894465#comment-16894465
 ] 

Lisheng Sun commented on HDFS-14313:


hi [~linyiqun] [~jojochuang], do you have time to review this patch? Thank you a lot.

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch
>
>
> There are two existing ways of getting used space, DU and DF, and both are insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple datanodes or 
> other servers.
>  Getting HDFS used space from the FsDatasetImpl#volumeMap#ReplicaInfos in memory 
> is very cheap and accurate.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-7868) Use proper blocksize to choose target for blocks

2019-07-27 Thread Lisheng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-7868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun reassigned HDFS-7868:
-

Assignee: Lisheng Sun  (was: zhouyingchao)

> Use proper blocksize to choose target for blocks
> 
>
> Key: HDFS-7868
> URL: https://issues.apache.org/jira/browse/HDFS-7868
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: zhouyingchao
>Assignee: Lisheng Sun
>Priority: Major
>  Labels: BB2015-05-TBR
> Attachments: HDFS-7868-001.patch
>
>
> In BlockPlacementPolicyDefault.java:isGoodTarget, the passed-in blockSize is 
> used to determine whether there is enough room for a new block on a datanode. 
> However, in two conditions the blockSize might not be proper for the purpose: 
> (a) the passed-in block size is just the size of the last block of a file, 
> which might be very small (e.g., when called from 
> BlockManager.ReplicationWork.chooseTargets); (b) the file might have been 
> created with a smaller blocksize.
> In these conditions, the calculated scheduledSize might be smaller than the 
> actual value, which might ultimately lead to subsequent failures of writing or 
> replication.
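
For illustration, a small sketch of the effect (hypothetical numbers and a simplified check, not the real BlockPlacementPolicyDefault code): scheduling with the tiny last-block size makes an almost-full node look acceptable, while the file's configured block size rejects it.

{code:java}
// Illustrative sketch only: simplified remaining-space check.
class TargetCheck {
  static boolean isGoodTarget(long remainingBytes, int scheduledBlocks, long blockSize) {
    long scheduledSize = (long) scheduledBlocks * blockSize;
    return remainingBytes - scheduledSize >= blockSize;
  }

  public static void main(String[] args) {
    long remaining = 300L * 1024 * 1024;   // 300 MB free on the datanode
    long lastBlock = 1L * 1024 * 1024;     // passed-in size: a 1 MB last block
    long configured = 128L * 1024 * 1024;  // the file's real block size

    // The last-block size says the node is fine; the real block size says it is not.
    System.out.println(isGoodTarget(remaining, 2, lastBlock));   // true
    System.out.println(isGoodTarget(remaining, 2, configured));  // false
  }
}
{code}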



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-07-28 Thread Lisheng Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16894893#comment-16894893
 ] 

Lisheng Sun edited comment on HDFS-14313 at 7/29/19 3:11 AM:
-

Thanx [~linyiqun] for your review.

*FSCachingGetSpaceUsed*
{quote}Line 53: Add the final keyword for the FsVolumeImpl variable.
 Line 54: Add the final keyword too.
{quote}
The values of volume and bpid are assigned via set*, so the final keyword can't be 
added to these two variables.
{quote}Line 75: We don't pass the config to use the threshold time now, still 
we need to override this method? If don't need, the change made in
 GetSpaceUsed can also be reverted.
{quote}
The new volume and bpid variables of FSCachingGetSpaceUsed are added for the HDFS 
module, so ReplicaCachingGetSpaceUsed's constructor parameter must be 
FSCachingGetSpaceUsed#Builder, and FSCachingGetSpaceUsed#build() can't be removed.
*TestReplicaCachingGetSpaceUsed*
{quote}Line 69: As I have mentioned before, can we have an additional 
comparison for the DU impl class? The most of lines can be reused for these two 
getused impl class. Just passing different key value with restart the mini 
cluster and comparing the used space.
{quote}

The space used by the DU impl includes:

├── current
 │   ├── BP-1876464514-10.239.56.179-1564369203299
 │   │   ├── current
 │   │   │   ├── VERSION
 │   │   │   ├── finalized
 │   │   │   │   └── subdir0
 │   │   │   │   └── subdir0
 │   │   │   │   ├── blk_1073741825
 │   │   │   │   └── blk_1073741825_1001.meta
 │   │   │   └── rbw
 │   │   ├── scanner.cursor
 │   │   └── tmp
 │   └── VERSION
 └── in_use.lock

The space used by the ReplicaCachingGetSpaceUsed impl includes:

├── blk_1073741825
 └── blk_1073741825_1001.meta

The space used by the DU impl includes all directory sizes and other files such as 
VERSION, in_use.lock, and so on.

The space used reported by the DU impl must therefore be greater than that reported by 
the ReplicaCachingGetSpaceUsed impl, and the ReplicaCachingGetSpaceUsed value is more 
accurate. So is it necessary to add a comparison for the DU impl class?

Please correct me if I am wrong. Thanks [~linyiqun] again.


was (Author: leosun08):
Thanx [~linyiqun] for your review.

*FSCachingGetSpaceUsed*
{quote}Line 53: Add the final keyword for the FsVolumeImpl variable.
 Line 54: Add the final keyword too.
{quote}
The value of volume and bpid is assignment by set*, so don't add the final 
keyword for these two variable.
{quote}Line 75: We don't pass the config to use the threshold time now, still 
we need to override this method? If don't need, the change made in
 GetSpaceUsed can also be reverted.
{quote}
Add the new variable volume and bpid of FSCachingGetSpaceUsed for HDFS module,so
 ReplicaCachingGetSpaceUsed‘s Constructor parameter must be 
FSCachingGetSpaceUsed#Builder and don't remove FSCachingGetSpaceUsed#build().
 * TestReplicaCachingGetSpaceUsed*
{quote}Line 69: As I have mentioned before, can we have an additional 
comparison for the DU impl class? The most of lines can be reused for these two 
getused impl class. Just passing different key value with restart the mini 
cluster and comparing the used space.
{quote}

 get space used by DU impl include

├── current
│   ├── BP-1876464514-10.239.56.179-1564369203299
│   │   ├── current
│   │   │   ├── VERSION
│   │   │   ├── finalized
│   │   │   │   └── subdir0
│   │   │   │   └── subdir0
│   │   │   │   ├── blk_1073741825
│   │   │   │   └── blk_1073741825_1001.meta
│   │   │   └── rbw
│   │   ├── scanner.cursor
│   │   └── tmp
│   └── VERSION
└── in_use.lock

get space used by ReplicaCachingGetSpaceUsed impl include

├── blk_1073741825
└── blk_1073741825_1001.meta

 get space used by DU impl include all directories size, other files such as 
VERSION, in_use.lock and so on

Get space used by DU impl must be greater than by ReplicaCachingGetSpaceUsed 
impl. get space used by ReplicaCachingGetSpaceUsed impl is more accurate. so is 
it necessary to add comparison for the DU impl class? 

Please correct me if I was wrong. Thank [~linyiqun] again.

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch
>
>
> There are two ways of DU/DF getting used space that are insuf

[jira] [Commented] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-07-28 Thread Lisheng Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16894893#comment-16894893
 ] 

Lisheng Sun commented on HDFS-14313:


Thanx [~linyiqun] for your review.

*FSCachingGetSpaceUsed*
{quote}Line 53: Add the final keyword for the FsVolumeImpl variable.
 Line 54: Add the final keyword too.
{quote}
The values of volume and bpid are assigned via set*, so the final keyword can't be 
added to these two variables.
{quote}Line 75: We don't pass the config to use the threshold time now, still 
we need to override this method? If don't need, the change made in
 GetSpaceUsed can also be reverted.
{quote}
The new volume and bpid variables of FSCachingGetSpaceUsed are added for the HDFS 
module, so ReplicaCachingGetSpaceUsed's constructor parameter must be 
FSCachingGetSpaceUsed#Builder, and FSCachingGetSpaceUsed#build() can't be removed.
*TestReplicaCachingGetSpaceUsed*
{quote}Line 69: As I have mentioned before, can we have an additional 
comparison for the DU impl class? The most of lines can be reused for these two 
getused impl class. Just passing different key value with restart the mini 
cluster and comparing the used space.
{quote}

The space used by the DU impl includes:

├── current
│   ├── BP-1876464514-10.239.56.179-1564369203299
│   │   ├── current
│   │   │   ├── VERSION
│   │   │   ├── finalized
│   │   │   │   └── subdir0
│   │   │   │   └── subdir0
│   │   │   │   ├── blk_1073741825
│   │   │   │   └── blk_1073741825_1001.meta
│   │   │   └── rbw
│   │   ├── scanner.cursor
│   │   └── tmp
│   └── VERSION
└── in_use.lock

The space used by the ReplicaCachingGetSpaceUsed impl includes:

├── blk_1073741825
└── blk_1073741825_1001.meta

The space used by the DU impl includes all directory sizes and other files such as 
VERSION, in_use.lock, and so on.

The space used reported by the DU impl must therefore be greater than that reported by 
the ReplicaCachingGetSpaceUsed impl, and the ReplicaCachingGetSpaceUsed value is more 
accurate. So is it necessary to add a comparison for the DU impl class?

Please correct me if I am wrong. Thanks [~linyiqun] again.

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch
>
>
> There are two existing ways of getting used space, DU and DF, and both are insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple datanodes or 
> other servers.
>  Getting HDFS used space from the FsDatasetImpl#volumeMap#ReplicaInfos in memory 
> is very cheap and accurate.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-07-28 Thread Lisheng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14313:
---
Attachment: HDFS-14313.009.patch

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch, HDFS-14313.009.patch
>
>
> There are two existing ways of getting used space, DU and DF, and both are insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple datanodes or 
> other servers.
>  Getting HDFS used space from the FsDatasetImpl#volumeMap#ReplicaInfos in memory 
> is very cheap and accurate.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-07-28 Thread Lisheng Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16894898#comment-16894898
 ] 

Lisheng Sun commented on HDFS-14313:


Updated the patch according to the review comments and uploaded the v9 patch.

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch, HDFS-14313.009.patch
>
>
> There are two existing ways of getting used space, DU and DF, and both are insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple datanodes or 
> other servers.
>  Getting HDFS used space from the FsDatasetImpl#volumeMap#ReplicaInfos in memory 
> is very cheap and accurate.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-14644) That replication of block failed leads to decommission is blocked when the number of replicas of block is greater than the number of datanode

2019-07-30 Thread Lisheng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun reassigned HDFS-14644:
--

Assignee: Lisheng Sun

> That replication of block failed leads to decommission is blocked when the 
> number of replicas of block is greater than the number of datanode
> -
>
> Key: HDFS-14644
> URL: https://issues.apache.org/jira/browse/HDFS-14644
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.1.1, 2.9.2, 3.0.3, 2.8.5, 2.7.7
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
>
> 2019-07-10,15:37:18,028 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 5 to reach 10 
> (unavailableStorages=[DISK, ARCHIVE], 
> storagePolicy=BlockStoragePolicy\{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) All 
> required storage types are unavailable: unavailableStorages=[DISK, ARCHIVE], 
> storagePolicy=BlockStoragePolicy\{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
> 2019-07-10,15:37:18,028 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 5 to reach 10 
> (unavailableStorages=[], storagePolicy=BlockStoragePolicy\{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, 
> newBlock=false) For more information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14644) That replication of block failed leads to decommission is blocked when the number of replicas of block is greater than the number of datanode

2019-07-30 Thread Lisheng Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896099#comment-16896099
 ] 

Lisheng Sun commented on HDFS-14644:


Thanx [~sodonnell] [~jojochuang] for your suggestions.

I intend to implement it with a command, as in HDFS-12946.

If you see no problem with that, I will do this work. Thank you.

> That replication of block failed leads to decommission is blocked when the 
> number of replicas of block is greater than the number of datanode
> -
>
> Key: HDFS-14644
> URL: https://issues.apache.org/jira/browse/HDFS-14644
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.1.1, 2.9.2, 3.0.3, 2.8.5, 2.7.7
>Reporter: Lisheng Sun
>Priority: Major
>
> 2019-07-10,15:37:18,028 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 5 to reach 10 
> (unavailableStorages=[DISK, ARCHIVE], 
> storagePolicy=BlockStoragePolicy\{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) All 
> required storage types are unavailable: unavailableStorages=[DISK, ARCHIVE], 
> storagePolicy=BlockStoragePolicy\{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
> 2019-07-10,15:37:18,028 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 5 to reach 10 
> (unavailableStorages=[], storagePolicy=BlockStoragePolicy\{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, 
> newBlock=false) For more information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-07-30 Thread Lisheng Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896696#comment-16896696
 ] 

Lisheng Sun commented on HDFS-14313:


hi [~linyiqun], do you have time to continue the review? Thank you.

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch, HDFS-14313.009.patch
>
>
> There are two existing ways of getting used space, DU and DF, and both are insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple datanodes or 
> other servers.
>  Getting HDFS used space from the FsDatasetImpl#volumeMap#ReplicaInfos in memory 
> is very cheap and accurate.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14290) Unexpected message type: PooledUnsafeDirectByteBuf when get datanode info by DatanodeWebHdfsMethods

2019-07-30 Thread Lisheng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14290:
---
Resolution: Not A Problem
Status: Resolved  (was: Patch Available)

> Unexpected message type: PooledUnsafeDirectByteBuf when get datanode info by 
> DatanodeWebHdfsMethods
> ---
>
> Key: HDFS-14290
> URL: https://issues.apache.org/jira/browse/HDFS-14290
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, webhdfs
>Affects Versions: 2.7.0, 2.7.1
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14290.000.patch, webhdfs show.png
>
>
> The issue is that there is no HttpRequestDecoder in the netty InboundHandler, so 
> an unexpected message type appears when reading the message.
>   
> !webhdfs show.png!   
> DEBUG org.apache.hadoop.hdfs.server.datanode.web.DatanodeHttpServer: Proxy 
> failed. Cause: 
>  com.xiaomi.infra.thirdparty.io.netty.handler.codec.EncoderException: 
> java.lang.IllegalStateException: unexpected message type: 
> PooledUnsafeDirectByteBuf
>  at 
> com.xiaomi.infra.thirdparty.io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:106)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.CombinedChannelDuplexHandler.write(CombinedChannelDuplexHandler.java:348)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:738)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:730)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:816)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:723)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.handler.stream.ChunkedWriteHandler.doFlush(ChunkedWriteHandler.java:304)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.handler.stream.ChunkedWriteHandler.flush(ChunkedWriteHandler.java:137)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:802)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:814)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:794)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:831)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1051)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:300)
>  at 
> org.apache.hadoop.hdfs.server.datanode.web.SimpleHttpProxyHandler$Forwarder.channelRead(SimpleHttpProxyHandler.java:80)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1414)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:945)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:146)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.util.concurrent.SingleTh
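
For context on the pipeline wiring the report above refers to, the following is a minimal, hypothetical sketch (not the actual DatanodeHttpServer code; class and handler names are illustrative) of a Netty child-channel initializer that installs an HTTP codec, so that handlers added after it receive HttpRequest/HttpContent objects instead of raw ByteBuf instances. Without such a decoder in the inbound path, a handler that expects HTTP messages fails in the way shown in the trace.
{code:java}
// Illustration only: add an HTTP codec before any handler that expects
// HttpRequest/HttpResponse objects; otherwise downstream handlers only
// ever see raw ByteBuf instances.
import io.netty.channel.ChannelInitializer;
import io.netty.channel.socket.SocketChannel;
import io.netty.handler.codec.http.HttpRequestDecoder;
import io.netty.handler.codec.http.HttpResponseEncoder;

public class HttpPipelineInitializer extends ChannelInitializer<SocketChannel> {
  @Override
  protected void initChannel(SocketChannel ch) {
    ch.pipeline()
        .addLast("httpRequestDecoder", new HttpRequestDecoder())    // bytes -> HttpRequest/HttpContent
        .addLast("httpResponseEncoder", new HttpResponseEncoder()); // HttpResponse/HttpContent -> bytes
    // application handlers that operate on HTTP messages are added after the codec
  }
}
{code}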

[jira] [Commented] (HDFS-14290) Unexpected message type: PooledUnsafeDirectByteBuf when get datanode info by DatanodeWebHdfsMethods

2019-07-30 Thread Lisheng Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896754#comment-16896754
 ] 

Lisheng Sun commented on HDFS-14290:


Yeah. Sorry. It's not a problem. I have closed it. 

> Unexpected message type: PooledUnsafeDirectByteBuf when get datanode info by 
> DatanodeWebHdfsMethods
> ---
>
> Key: HDFS-14290
> URL: https://issues.apache.org/jira/browse/HDFS-14290
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, webhdfs
>Affects Versions: 2.7.0, 2.7.1
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14290.000.patch, webhdfs show.png
>
>
> The issue is that there is no HttpRequestDecoder in the Netty inbound handler 
> pipeline, so an "unexpected message type" error appears when the message is read.
>   
> !webhdfs show.png!   
> DEBUG org.apache.hadoop.hdfs.server.datanode.web.DatanodeHttpServer: Proxy 
> failed. Cause: 
>  com.xiaomi.infra.thirdparty.io.netty.handler.codec.EncoderException: 
> java.lang.IllegalStateException: unexpected message type: 
> PooledUnsafeDirectByteBuf
>  at 
> com.xiaomi.infra.thirdparty.io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:106)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.CombinedChannelDuplexHandler.write(CombinedChannelDuplexHandler.java:348)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:738)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:730)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:816)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:723)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.handler.stream.ChunkedWriteHandler.doFlush(ChunkedWriteHandler.java:304)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.handler.stream.ChunkedWriteHandler.flush(ChunkedWriteHandler.java:137)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:802)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:814)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:794)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:831)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1051)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:300)
>  at 
> org.apache.hadoop.hdfs.server.datanode.web.SimpleHttpProxyHandler$Forwarder.channelRead(SimpleHttpProxyHandler.java:80)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1414)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:945)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:146)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
>  at 
> com.xiaomi.infra.thirdparty.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
>  at 
> com.xiaomi.infra.thirdparty.io

[jira] [Updated] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-08-01 Thread Lisheng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14313:
---
Attachment: HDFS-14313.010.patch

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch, HDFS-14313.009.patch, HDFS-14313.010.patch
>
>
> The two existing ways of getting used space, DU and DF, are insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple DataNodes or 
> other services.
>  Getting the HDFS used space from the FsDatasetImpl#volumeMap ReplicaInfos in 
> memory is cheap and accurate. 
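
As a rough illustration of the in-memory approach described above, here is a self-contained sketch (this is not the HDFS-14313 patch; the Replica class below is only a stand-in for HDFS's ReplicaInfo): the used space is obtained by summing the block-file and meta-file lengths of replicas the DataNode already tracks in memory, so no du/df process has to be forked.
{code:java}
// Hedged, self-contained sketch of summing replica lengths held in memory.
import java.util.Arrays;
import java.util.List;

public class InMemoryUsedSpace {
  static class Replica {
    final long blockBytes;  // length of the block file on disk
    final long metaBytes;   // length of the .meta file on disk
    Replica(long blockBytes, long metaBytes) {
      this.blockBytes = blockBytes;
      this.metaBytes = metaBytes;
    }
  }

  static long usedBytes(List<Replica> replicas) {
    long used = 0;
    for (Replica r : replicas) {
      used += r.blockBytes + r.metaBytes;
    }
    return used;
  }

  public static void main(String[] args) {
    List<Replica> replicas = Arrays.asList(
        new Replica(134217728L, 1048583L),
        new Replica(67108864L, 524295L));
    System.out.println("used bytes: " + usedBytes(replicas));
  }
}
{code}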



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-08-01 Thread Lisheng Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897988#comment-16897988
 ] 

Lisheng Sun commented on HDFS-14313:


Thanks to [~linyiqun] for the deep review. I have updated this patch per your 
comments and uploaded the v10 patch.

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch, HDFS-14313.009.patch, HDFS-14313.010.patch
>
>
> The two existing ways of getting used space, DU and DF, are insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple DataNodes or 
> other services.
>  Getting the HDFS used space from the FsDatasetImpl#volumeMap ReplicaInfos in 
> memory is cheap and accurate. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-08-03 Thread Lisheng Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899565#comment-16899565
 ] 

Lisheng Sun commented on HDFS-14313:


Ping [~linyiqun]. Could you take a look at this patch? Thank you.

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch, HDFS-14313.009.patch, HDFS-14313.010.patch
>
>
> The two existing ways of getting used space, DU and DF, are insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple DataNodes or 
> other services.
>  Getting the HDFS used space from the FsDatasetImpl#volumeMap ReplicaInfos in 
> memory is cheap and accurate. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13700) The process of loading image can be done in a pipeline model

2019-08-04 Thread Lisheng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-13700:
---
Description: 
The process of loading a file system image involves reading the inodes section, 
deserializing inodes, initializing inodes, adding inodes to the global map, 
reading the directories section, adding inodes to their parents' maps, caching 
names, etc. These steps can be done in a pipeline model to reduce the total duration. 

Testing the patch against an fsimage of a 70PB 2.4 cluster (200 million files and 
300 million blocks; the fsimage is around 22GB), the image loading time was 
reduced from 1210 seconds to 739 seconds.

  was:The process of loading a file system image involves reading inodes 
section, deserializing inodes, initializing inodes, adding inodes to the global 
map, reading directories section, adding inodes to their parents' map, cache 
name etc. These steps can be done in a pipeline model to reduce the total 
duration. 


> The process of loading image can be done in a pipeline model
> 
>
> Key: HDFS-13700
> URL: https://issues.apache.org/jira/browse/HDFS-13700
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: zhouyingchao
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-13700-001.patch
>
>
> The process of loading a file system image involves reading the inodes section, 
> deserializing inodes, initializing inodes, adding inodes to the global map, 
> reading the directories section, adding inodes to their parents' maps, caching 
> names, etc. These steps can be done in a pipeline model to reduce the total 
> duration. 
> Testing the patch against an fsimage of a 70PB 2.4 cluster (200 million files and 
> 300 million blocks; the fsimage is around 22GB), the image loading time was 
> reduced from 1210 seconds to 739 seconds.
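
As a rough sketch of what a "pipeline model" means here (purely illustrative; this does not reflect the structure of the actual FSImage loader patch), one thread can read and deserialize records while a second thread consumes them and inserts them into the map, so the two stages overlap in time:
{code:java}
// Illustrative two-stage pipeline: a producer deserializes records while a
// consumer processes them concurrently via a bounded queue.
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PipelineDemo {
  private static final int POISON = -1;

  public static void main(String[] args) throws InterruptedException {
    BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(1024);

    Thread producer = new Thread(() -> {
      try {
        for (int i = 0; i < 1_000_000; i++) {
          queue.put(i);            // stage 1: read + deserialize a record
        }
        queue.put(POISON);         // signal end of stream
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });

    Thread consumer = new Thread(() -> {
      try {
        long count = 0;
        for (int v = queue.take(); v != POISON; v = queue.take()) {
          count++;                 // stage 2: insert the record into the map
        }
        System.out.println("processed " + count + " records");
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });

    producer.start();
    consumer.start();
    producer.join();
    consumer.join();
  }
}
{code}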



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13700) The process of loading image can be done in a pipeline model

2019-08-04 Thread Lisheng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-13700:
---
Attachment: HDFS-13700.002.patch

> The process of loading image can be done in a pipeline model
> 
>
> Key: HDFS-13700
> URL: https://issues.apache.org/jira/browse/HDFS-13700
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: zhouyingchao
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-13700-001.patch, HDFS-13700.002.patch
>
>
> The process of loading a file system image involves reading the inodes section, 
> deserializing inodes, initializing inodes, adding inodes to the global map, 
> reading the directories section, adding inodes to their parents' maps, caching 
> names, etc. These steps can be done in a pipeline model to reduce the total 
> duration. 
> Testing the patch against an fsimage of a 70PB 2.4 cluster (200 million files and 
> 300 million blocks; the fsimage is around 22GB), the image loading time was 
> reduced from 1210 seconds to 739 seconds.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13700) The process of loading image can be done in a pipeline model

2019-08-04 Thread Lisheng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-13700:
---
Description: 
The process of loading a file system image involves reading the inodes section, 
deserializing inodes, initializing inodes, adding inodes to the global map, 
reading the directories section, adding inodes to their parents' maps, caching 
names, etc. These steps can be done in a pipeline model to reduce the total duration. 

Testing the patch against an fsimage of a 70PB cluster (200 million files and 
300 million blocks; the fsimage is around 22GB), the image loading time was 
reduced from 1210 seconds to 739 seconds.

  was:
The process of loading a file system image involves reading inodes section, 
deserializing inodes, initializing inodes, adding inodes to the global map, 
reading directories section, adding inodes to their parents' map, cache name 
etc. These steps can be done in a pipeline model to reduce the total duration. 

Test the patch against a fsimage of a 70PB 2.4 cluster (200million files and 
300million blocks, the fsimage is around 22GB), the image loading time be 
reduced from 1210 seconds to 739 seconds.


> The process of loading image can be done in a pipeline model
> 
>
> Key: HDFS-13700
> URL: https://issues.apache.org/jira/browse/HDFS-13700
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: zhouyingchao
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-13700-001.patch, HDFS-13700.002.patch
>
>
> The process of loading a file system image involves reading the inodes section, 
> deserializing inodes, initializing inodes, adding inodes to the global map, 
> reading the directories section, adding inodes to their parents' maps, caching 
> names, etc. These steps can be done in a pipeline model to reduce the total 
> duration. 
> Testing the patch against an fsimage of a 70PB cluster (200 million files and 
> 300 million blocks; the fsimage is around 22GB), the image loading time was 
> reduced from 1210 seconds to 739 seconds.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14701) Change Log Level to warn in SlotReleaser

2019-08-05 Thread Lisheng Sun (JIRA)
Lisheng Sun created HDFS-14701:
--

 Summary: Change Log Level to warn in SlotReleaser
 Key: HDFS-14701
 URL: https://issues.apache.org/jira/browse/HDFS-14701
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Lisheng Sun


{code:java}
// @Override
public void run() {
  LOG.trace("{}: about to release {}", ShortCircuitCache.this, slot);
  final DfsClientShm shm = (DfsClientShm)slot.getShm();
  final DomainSocket shmSock = shm.getPeer().getDomainSocket();
  final String path = shmSock.getPath();
  boolean success = false;
  try (DomainSocket sock = DomainSocket.connect(path);
   DataOutputStream out = new DataOutputStream(
   new BufferedOutputStream(sock.getOutputStream()))) {
new Sender(out).releaseShortCircuitFds(slot.getSlotId());
DataInputStream in = new DataInputStream(sock.getInputStream());
ReleaseShortCircuitAccessResponseProto resp =
ReleaseShortCircuitAccessResponseProto.parseFrom(
PBHelperClient.vintPrefixed(in));
if (resp.getStatus() != Status.SUCCESS) {
  String error = resp.hasError() ? resp.getError() : "(unknown)";
  throw new IOException(resp.getStatus().toString() + ": " + error);
}
LOG.trace("{}: released {}", this, slot);
success = true;
  } catch (IOException e) {
LOG.error(ShortCircuitCache.this + ": failed to release " +
"short-circuit shared memory slot " + slot + " by sending " +
"ReleaseShortCircuitAccessRequestProto to " + path +
".  Closing shared memory segment.", e);
  } finally {
if (success) {
  shmManager.freeSlot(slot);
} else {
  shm.getEndpointShmManager().shutdown(shm);
}
  }
}
{code}
2019-08-05,15:28:03,838 ERROR [ShortCircuitCache_SlotReleaser] 
org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache: 
ShortCircuitCache(0x65849546): failed to release short-circuit shared memory 
slot Slot(slotIdx=62, shm=DfsClientShm(70593ef8b3d84cba3c2f0a1e81377eb1)) by 
sending ReleaseShortCircuitAccessRequestProto to 
/home/work/app/hdfs/c3micloudsrv-hdd/datanode/dn_socket.  Closing shared memory 
segment.

java.io.IOException: ERROR_INVALID: there is no shared memory segment 
registered with shmId 70593ef8b3d84cba3c2f0a1e81377eb1
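
A minimal, self-contained sketch of the change this issue proposes (this is not the committed patch; the class, slot string, and socket path below are illustrative): the SlotReleaser catch block would log the failure at WARN instead of ERROR, since a stopped or restarted DataNode makes this an expected, recoverable condition for the client.
{code:java}
import java.io.IOException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SlotReleaseLogging {
  private static final Logger LOG =
      LoggerFactory.getLogger(SlotReleaseLogging.class);

  static void releaseSlot(String slot, String path) {
    try {
      // in the real code this is where the release request is sent to the DataNode
      throw new IOException("ERROR_INVALID: there is no shared memory segment registered");
    } catch (IOException e) {
      // WARN rather than ERROR: a stopped/restarted DataNode makes this expected
      LOG.warn("failed to release short-circuit shared memory slot {} by sending "
          + "ReleaseShortCircuitAccessRequestProto to {}. Closing shared memory segment.",
          slot, path, e);
    }
  }

  public static void main(String[] args) {
    releaseSlot("Slot(slotIdx=62)", "/path/to/dn_socket");
  }
}
{code}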



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14701) Change Log Level to warn in SlotReleaser

2019-08-05 Thread Lisheng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14701:
---
Description: 
{code:java}
// @Override
public void run() {
  LOG.trace("{}: about to release {}", ShortCircuitCache.this, slot);
  final DfsClientShm shm = (DfsClientShm)slot.getShm();
  final DomainSocket shmSock = shm.getPeer().getDomainSocket();
  final String path = shmSock.getPath();
  boolean success = false;
  try (DomainSocket sock = DomainSocket.connect(path);
   DataOutputStream out = new DataOutputStream(
   new BufferedOutputStream(sock.getOutputStream()))) {
new Sender(out).releaseShortCircuitFds(slot.getSlotId());
DataInputStream in = new DataInputStream(sock.getInputStream());
ReleaseShortCircuitAccessResponseProto resp =
ReleaseShortCircuitAccessResponseProto.parseFrom(
PBHelperClient.vintPrefixed(in));
if (resp.getStatus() != Status.SUCCESS) {
  String error = resp.hasError() ? resp.getError() : "(unknown)";
  throw new IOException(resp.getStatus().toString() + ": " + error);
}
LOG.trace("{}: released {}", this, slot);
success = true;
  } catch (IOException e) {
LOG.error(ShortCircuitCache.this + ": failed to release " +
"short-circuit shared memory slot " + slot + " by sending " +
"ReleaseShortCircuitAccessRequestProto to " + path +
".  Closing shared memory segment.", e);
  } finally {
if (success) {
  shmManager.freeSlot(slot);
} else {
  shm.getEndpointShmManager().shutdown(shm);
}
  }
}
{code}
 *exception stack:*
{code:java}
2019-08-05,15:28:03,838 ERROR [ShortCircuitCache_SlotReleaser] 
org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache: 
ShortCircuitCache(0x65849546): failed to release short-circuit shared memory 
slot Slot(slotIdx=62, shm=DfsClientShm(70593ef8b3d84cba3c2f0a1e81377eb1)) by 
sending ReleaseShortCircuitAccessRequestProto to 
/home/work/app/hdfs/c3micloudsrv-hdd/datanode/dn_socket.  Closing shared memory 
segment.

java.io.IOException: ERROR_INVALID: there is no shared memory segment 
registered with shmId 70593ef8b3d84cba3c2f0a1e81377eb1
{code}

  was:
{code:java}
// @Override
public void run() {
  LOG.trace("{}: about to release {}", ShortCircuitCache.this, slot);
  final DfsClientShm shm = (DfsClientShm)slot.getShm();
  final DomainSocket shmSock = shm.getPeer().getDomainSocket();
  final String path = shmSock.getPath();
  boolean success = false;
  try (DomainSocket sock = DomainSocket.connect(path);
   DataOutputStream out = new DataOutputStream(
   new BufferedOutputStream(sock.getOutputStream( {
new Sender(out).releaseShortCircuitFds(slot.getSlotId());
DataInputStream in = new DataInputStream(sock.getInputStream());
ReleaseShortCircuitAccessResponseProto resp =
ReleaseShortCircuitAccessResponseProto.parseFrom(
PBHelperClient.vintPrefixed(in));
if (resp.getStatus() != Status.SUCCESS) {
  String error = resp.hasError() ? resp.getError() : "(unknown)";
  throw new IOException(resp.getStatus().toString() + ": " + error);
}
LOG.trace("{}: released {}", this, slot);
success = true;
  } catch (IOException e) {
LOG.error(ShortCircuitCache.this + ": failed to release " +
"short-circuit shared memory slot " + slot + " by sending " +
"ReleaseShortCircuitAccessRequestProto to " + path +
".  Closing shared memory segment.", e);
  } finally {
if (success) {
  shmManager.freeSlot(slot);
} else {
  shm.getEndpointShmManager().shutdown(shm);
}
  }
}
{code}
2019-08-05,15:28:03,838 ERROR [ShortCircuitCache_SlotReleaser] 
org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache: 
ShortCircuitCache(0x65849546): failed to release short-circuit shared memory 
slot Slot(slotIdx=62, shm=DfsClientShm(70593ef8b3d84cba3c2f0a1e81377eb1)) by 
sending ReleaseShortCircuitAccessRequestProto to 
/home/work/app/hdfs/c3micloudsrv-hdd/datanode/dn_socket.  Closing shared memory 
segment.

java.io.IOException: ERROR_INVALID: there is no shared memory segment 
registered with shmId 70593ef8b3d84cba3c2f0a1e81377eb1


> Change Log Level to warn in SlotReleaser
> 
>
> Key: HDFS-14701
> URL: https://issues.apache.org/jira/browse/HDFS-14701
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Priority: Minor
>
> {code:java}
> // @Override
> public void run() {
>   LOG.trace("{}: about to release {}", ShortCircuitCache.this, slot);
>   final DfsClientShm shm = (DfsClientShm)slot.getShm();
>   final DomainSocket shmSock = shm.getPeer().getDomainSocket();
>   final String path = shmSock.getPath();
>   boolean success = false;
>   try (DomainSocket sock = DomainSocket.connect(path);
>DataOutputStream out = new DataOutputStr

[jira] [Updated] (HDFS-14701) Change Log Level to warn in SlotReleaser

2019-08-05 Thread Lisheng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14701:
---
Description: 
{code:java}
// @Override
public void run() {
  LOG.trace("{}: about to release {}", ShortCircuitCache.this, slot);
  final DfsClientShm shm = (DfsClientShm)slot.getShm();
  final DomainSocket shmSock = shm.getPeer().getDomainSocket();
  final String path = shmSock.getPath();
  boolean success = false;
  try (DomainSocket sock = DomainSocket.connect(path);
   DataOutputStream out = new DataOutputStream(
   new BufferedOutputStream(sock.getOutputStream()))) {
new Sender(out).releaseShortCircuitFds(slot.getSlotId());
DataInputStream in = new DataInputStream(sock.getInputStream());
ReleaseShortCircuitAccessResponseProto resp =
ReleaseShortCircuitAccessResponseProto.parseFrom(
PBHelperClient.vintPrefixed(in));
if (resp.getStatus() != Status.SUCCESS) {
  String error = resp.hasError() ? resp.getError() : "(unknown)";
  throw new IOException(resp.getStatus().toString() + ": " + error);
}
LOG.trace("{}: released {}", this, slot);
success = true;
  } catch (IOException e) {
LOG.error(ShortCircuitCache.this + ": failed to release " +
"short-circuit shared memory slot " + slot + " by sending " +
"ReleaseShortCircuitAccessRequestProto to " + path +
".  Closing shared memory segment.", e);
  } finally {
if (success) {
  shmManager.freeSlot(slot);
} else {
  shm.getEndpointShmManager().shutdown(shm);
}
  }
}
{code}
 *exception stack:*
{code:java}
2019-08-05,15:28:03,838 ERROR [ShortCircuitCache_SlotReleaser] 
org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache: 
ShortCircuitCache(0x65849546): failed to release short-circuit shared memory 
slot Slot(slotIdx=62, shm=DfsClientShm(70593ef8b3d84cba3c2f0a1e81377eb1)) by 
sending ReleaseShortCircuitAccessRequestProto to 
/home/work/app/hdfs/c3micloudsrv-hdd/datanode/dn_socket.  Closing shared memory 
segment. java.io.IOException: ERROR_INVALID: there is no shared memory segment 
registered with shmId 70593ef8b3d84cba3c2f0a1e81377eb1
{code}
 

  was:
{code:java}
// @Override
public void run() {
  LOG.trace("{}: about to release {}", ShortCircuitCache.this, slot);
  final DfsClientShm shm = (DfsClientShm)slot.getShm();
  final DomainSocket shmSock = shm.getPeer().getDomainSocket();
  final String path = shmSock.getPath();
  boolean success = false;
  try (DomainSocket sock = DomainSocket.connect(path);
   DataOutputStream out = new DataOutputStream(
   new BufferedOutputStream(sock.getOutputStream( {
new Sender(out).releaseShortCircuitFds(slot.getSlotId());
DataInputStream in = new DataInputStream(sock.getInputStream());
ReleaseShortCircuitAccessResponseProto resp =
ReleaseShortCircuitAccessResponseProto.parseFrom(
PBHelperClient.vintPrefixed(in));
if (resp.getStatus() != Status.SUCCESS) {
  String error = resp.hasError() ? resp.getError() : "(unknown)";
  throw new IOException(resp.getStatus().toString() + ": " + error);
}
LOG.trace("{}: released {}", this, slot);
success = true;
  } catch (IOException e) {
LOG.error(ShortCircuitCache.this + ": failed to release " +
"short-circuit shared memory slot " + slot + " by sending " +
"ReleaseShortCircuitAccessRequestProto to " + path +
".  Closing shared memory segment.", e);
  } finally {
if (success) {
  shmManager.freeSlot(slot);
} else {
  shm.getEndpointShmManager().shutdown(shm);
}
  }
}
{code}
 *exception stack:*
{code:java}
// code placeholder
{code}
2019-08-05,15:28:03,838 ERROR [ShortCircuitCache_SlotReleaser] 
org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache: 
ShortCircuitCache(0x65849546): failed to release short-circuit shared memory 
slot Slot(slotIdx=62, shm=DfsClientShm(70593ef8b3d84cba3c2f0a1e81377eb1)) by 
sending ReleaseShortCircuitAccessRequestProto to 
/home/work/app/hdfs/c3micloudsrv-hdd/datanode/dn_socket.  Closing shared memory 
segment.

java.io.IOException: ERROR_INVALID: there is no shared memory segment 
registered with shmId 70593ef8b3d84cba3c2f0a1e81377eb1


> Change Log Level to warn in SlotReleaser
> 
>
> Key: HDFS-14701
> URL: https://issues.apache.org/jira/browse/HDFS-14701
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Priority: Minor
>
> {code:java}
> // @Override
> public void run() {
>   LOG.trace("{}: about to release {}", ShortCircuitCache.this, slot);
>   final DfsClientShm shm = (DfsClientShm)slot.getShm();
>   final DomainSocket shmSock = shm.getPeer().getDomainSocket();
>   final String path = shmSock.getPath();
>   boolean success = false;
>   try (DomainSocket sock = DomainSocket.connect(path);
> 

[jira] [Updated] (HDFS-14701) Change Log Level to warn in SlotReleaser

2019-08-05 Thread Lisheng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14701:
---
Description: 
 If the corresponding DataNode has been stopped or restarted and the DFSClient 
closes the shared memory segment, the releaseShortCircuitFds API throws an 
exception and logs an ERROR message. I think it should not be an ERROR log; a 
WARN log is more reasonable.
{code:java}
// @Override
public void run() {
  LOG.trace("{}: about to release {}", ShortCircuitCache.this, slot);
  final DfsClientShm shm = (DfsClientShm)slot.getShm();
  final DomainSocket shmSock = shm.getPeer().getDomainSocket();
  final String path = shmSock.getPath();
  boolean success = false;
  try (DomainSocket sock = DomainSocket.connect(path);
   DataOutputStream out = new DataOutputStream(
   new BufferedOutputStream(sock.getOutputStream()))) {
new Sender(out).releaseShortCircuitFds(slot.getSlotId());
DataInputStream in = new DataInputStream(sock.getInputStream());
ReleaseShortCircuitAccessResponseProto resp =
ReleaseShortCircuitAccessResponseProto.parseFrom(
PBHelperClient.vintPrefixed(in));
if (resp.getStatus() != Status.SUCCESS) {
  String error = resp.hasError() ? resp.getError() : "(unknown)";
  throw new IOException(resp.getStatus().toString() + ": " + error);
}
LOG.trace("{}: released {}", this, slot);
success = true;
  } catch (IOException e) {
LOG.error(ShortCircuitCache.this + ": failed to release " +
"short-circuit shared memory slot " + slot + " by sending " +
"ReleaseShortCircuitAccessRequestProto to " + path +
".  Closing shared memory segment.", e);
  } finally {
if (success) {
  shmManager.freeSlot(slot);
} else {
  shm.getEndpointShmManager().shutdown(shm);
}
  }
}
{code}
 *exception stack:*
{code:java}
2019-08-05,15:28:03,838 ERROR [ShortCircuitCache_SlotReleaser] 
org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache: 
ShortCircuitCache(0x65849546): failed to release short-circuit shared memory 
slot Slot(slotIdx=62, shm=DfsClientShm(70593ef8b3d84cba3c2f0a1e81377eb1)) by 
sending ReleaseShortCircuitAccessRequestProto to 
/home/work/app/hdfs/c3micloudsrv-hdd/datanode/dn_socket.  Closing shared memory 
segment.

java.io.IOException: ERROR_INVALID: there is no shared memory segment 
registered with shmId 70593ef8b3d84cba3c2f0a1e81377eb1
{code}
 

  was:
{code:java}
// @Override
public void run() {
  LOG.trace("{}: about to release {}", ShortCircuitCache.this, slot);
  final DfsClientShm shm = (DfsClientShm)slot.getShm();
  final DomainSocket shmSock = shm.getPeer().getDomainSocket();
  final String path = shmSock.getPath();
  boolean success = false;
  try (DomainSocket sock = DomainSocket.connect(path);
   DataOutputStream out = new DataOutputStream(
   new BufferedOutputStream(sock.getOutputStream( {
new Sender(out).releaseShortCircuitFds(slot.getSlotId());
DataInputStream in = new DataInputStream(sock.getInputStream());
ReleaseShortCircuitAccessResponseProto resp =
ReleaseShortCircuitAccessResponseProto.parseFrom(
PBHelperClient.vintPrefixed(in));
if (resp.getStatus() != Status.SUCCESS) {
  String error = resp.hasError() ? resp.getError() : "(unknown)";
  throw new IOException(resp.getStatus().toString() + ": " + error);
}
LOG.trace("{}: released {}", this, slot);
success = true;
  } catch (IOException e) {
LOG.error(ShortCircuitCache.this + ": failed to release " +
"short-circuit shared memory slot " + slot + " by sending " +
"ReleaseShortCircuitAccessRequestProto to " + path +
".  Closing shared memory segment.", e);
  } finally {
if (success) {
  shmManager.freeSlot(slot);
} else {
  shm.getEndpointShmManager().shutdown(shm);
}
  }
}
{code}
 *exception stack:*
{code:java}
// 2019-08-05,15:28:03,838 ERROR [ShortCircuitCache_SlotReleaser] 
org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache: 
ShortCircuitCache(0x65849546): failed to release short-circuit shared memory 
slot Slot(slotIdx=62, shm=DfsClientShm(70593ef8b3d84cba3c2f0a1e81377eb1)) by 
sending ReleaseShortCircuitAccessRequestProto to 
/home/work/app/hdfs/c3micloudsrv-hdd/datanode/dn_socket.  Closing shared memory 
segment. java.io.IOException: ERROR_INVALID: there is no shared memory segment 
registered with shmId 70593ef8b3d84cba3c2f0a1e81377eb1
{code}
 


> Change Log Level to warn in SlotReleaser
> 
>
> Key: HDFS-14701
> URL: https://issues.apache.org/jira/browse/HDFS-14701
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Priority: Minor
>
>  if the corresponding DataNode has been stopped or restarted and DFSClient 
> close shared memory segment,releaseShortCircuitFds API throw ex

[jira] [Updated] (HDFS-14701) Change Log Level to warn in SlotReleaser

2019-08-05 Thread Lisheng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14701:
---
  Assignee: Lisheng Sun
Attachment: HDFS-14701.001.patch
Status: Patch Available  (was: Open)

> Change Log Level to warn in SlotReleaser
> 
>
> Key: HDFS-14701
> URL: https://issues.apache.org/jira/browse/HDFS-14701
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-14701.001.patch
>
>
>  If the corresponding DataNode has been stopped or restarted and the DFSClient 
> closes the shared memory segment, the releaseShortCircuitFds API throws an 
> exception and logs an ERROR message. I think it should not be an ERROR log; a 
> WARN log is more reasonable.
> {code:java}
> // @Override
> public void run() {
>   LOG.trace("{}: about to release {}", ShortCircuitCache.this, slot);
>   final DfsClientShm shm = (DfsClientShm)slot.getShm();
>   final DomainSocket shmSock = shm.getPeer().getDomainSocket();
>   final String path = shmSock.getPath();
>   boolean success = false;
>   try (DomainSocket sock = DomainSocket.connect(path);
>DataOutputStream out = new DataOutputStream(
>new BufferedOutputStream(sock.getOutputStream()))) {
> new Sender(out).releaseShortCircuitFds(slot.getSlotId());
> DataInputStream in = new DataInputStream(sock.getInputStream());
> ReleaseShortCircuitAccessResponseProto resp =
> ReleaseShortCircuitAccessResponseProto.parseFrom(
> PBHelperClient.vintPrefixed(in));
> if (resp.getStatus() != Status.SUCCESS) {
>   String error = resp.hasError() ? resp.getError() : "(unknown)";
>   throw new IOException(resp.getStatus().toString() + ": " + error);
> }
> LOG.trace("{}: released {}", this, slot);
> success = true;
>   } catch (IOException e) {
> LOG.error(ShortCircuitCache.this + ": failed to release " +
> "short-circuit shared memory slot " + slot + " by sending " +
> "ReleaseShortCircuitAccessRequestProto to " + path +
> ".  Closing shared memory segment.", e);
>   } finally {
> if (success) {
>   shmManager.freeSlot(slot);
> } else {
>   shm.getEndpointShmManager().shutdown(shm);
> }
>   }
> }
> {code}
>  *exception stack:*
> {code:java}
> 2019-08-05,15:28:03,838 ERROR [ShortCircuitCache_SlotReleaser] 
> org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache: 
> ShortCircuitCache(0x65849546): failed to release short-circuit shared memory 
> slot Slot(slotIdx=62, shm=DfsClientShm(70593ef8b3d84cba3c2f0a1e81377eb1)) by 
> sending ReleaseShortCircuitAccessRequestProto to 
> /home/work/app/hdfs/c3micloudsrv-hdd/datanode/dn_socket.  Closing shared 
> memory segment.
> java.io.IOException: ERROR_INVALID: there is no shared memory segment 
> registered with shmId 70593ef8b3d84cba3c2f0a1e81377eb1
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-08-05 Thread Lisheng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14313:
---
Attachment: HDFS-14313.011.patch

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch, HDFS-14313.009.patch, HDFS-14313.010.patch, 
> HDFS-14313.011.patch
>
>
> The two existing ways of getting used space, DU and DF, are insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple DataNodes or 
> other services.
>  Getting the HDFS used space from the FsDatasetImpl#volumeMap ReplicaInfos in 
> memory is cheap and accurate. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14701) Change Log Level to warn in SlotReleaser

2019-08-05 Thread Lisheng Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900119#comment-16900119
 ] 

Lisheng Sun commented on HDFS-14701:


Hi [~ayushtkn], could you help review this patch? Thank you.

> Change Log Level to warn in SlotReleaser
> 
>
> Key: HDFS-14701
> URL: https://issues.apache.org/jira/browse/HDFS-14701
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-14701.001.patch
>
>
>  If the corresponding DataNode has been stopped or restarted and the DFSClient 
> closes the shared memory segment, the releaseShortCircuitFds API throws an 
> exception and logs an ERROR message. I think it should not be an ERROR log; a 
> WARN log is more reasonable.
> {code:java}
> // @Override
> public void run() {
>   LOG.trace("{}: about to release {}", ShortCircuitCache.this, slot);
>   final DfsClientShm shm = (DfsClientShm)slot.getShm();
>   final DomainSocket shmSock = shm.getPeer().getDomainSocket();
>   final String path = shmSock.getPath();
>   boolean success = false;
>   try (DomainSocket sock = DomainSocket.connect(path);
>DataOutputStream out = new DataOutputStream(
>new BufferedOutputStream(sock.getOutputStream()))) {
> new Sender(out).releaseShortCircuitFds(slot.getSlotId());
> DataInputStream in = new DataInputStream(sock.getInputStream());
> ReleaseShortCircuitAccessResponseProto resp =
> ReleaseShortCircuitAccessResponseProto.parseFrom(
> PBHelperClient.vintPrefixed(in));
> if (resp.getStatus() != Status.SUCCESS) {
>   String error = resp.hasError() ? resp.getError() : "(unknown)";
>   throw new IOException(resp.getStatus().toString() + ": " + error);
> }
> LOG.trace("{}: released {}", this, slot);
> success = true;
>   } catch (IOException e) {
> LOG.error(ShortCircuitCache.this + ": failed to release " +
> "short-circuit shared memory slot " + slot + " by sending " +
> "ReleaseShortCircuitAccessRequestProto to " + path +
> ".  Closing shared memory segment.", e);
>   } finally {
> if (success) {
>   shmManager.freeSlot(slot);
> } else {
>   shm.getEndpointShmManager().shutdown(shm);
> }
>   }
> }
> {code}
>  *exception stack:*
> {code:java}
> 2019-08-05,15:28:03,838 ERROR [ShortCircuitCache_SlotReleaser] 
> org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache: 
> ShortCircuitCache(0x65849546): failed to release short-circuit shared memory 
> slot Slot(slotIdx=62, shm=DfsClientShm(70593ef8b3d84cba3c2f0a1e81377eb1)) by 
> sending ReleaseShortCircuitAccessRequestProto to 
> /home/work/app/hdfs/c3micloudsrv-hdd/datanode/dn_socket.  Closing shared 
> memory segment.
> java.io.IOException: ERROR_INVALID: there is no shared memory segment 
> registered with shmId 70593ef8b3d84cba3c2f0a1e81377eb1
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14701) Change Log Level to warn in SlotReleaser

2019-08-05 Thread Lisheng Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900119#comment-16900119
 ] 

Lisheng Sun edited comment on HDFS-14701 at 8/5/19 2:32 PM:


Hi [~ayushtkn], [~xkrogen], [~brahmareddy], could you help review this 
patch? Thank you.


was (Author: leosun08):
hi [~ayushtkn] Could you mind help review this patch? Thank you.

> Change Log Level to warn in SlotReleaser
> 
>
> Key: HDFS-14701
> URL: https://issues.apache.org/jira/browse/HDFS-14701
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-14701.001.patch
>
>
>  If the corresponding DataNode has been stopped or restarted and the DFSClient 
> closes the shared memory segment, the releaseShortCircuitFds API throws an 
> exception and logs an ERROR message. I think it should not be an ERROR log; a 
> WARN log is more reasonable.
> {code:java}
> // @Override
> public void run() {
>   LOG.trace("{}: about to release {}", ShortCircuitCache.this, slot);
>   final DfsClientShm shm = (DfsClientShm)slot.getShm();
>   final DomainSocket shmSock = shm.getPeer().getDomainSocket();
>   final String path = shmSock.getPath();
>   boolean success = false;
>   try (DomainSocket sock = DomainSocket.connect(path);
>DataOutputStream out = new DataOutputStream(
>new BufferedOutputStream(sock.getOutputStream()))) {
> new Sender(out).releaseShortCircuitFds(slot.getSlotId());
> DataInputStream in = new DataInputStream(sock.getInputStream());
> ReleaseShortCircuitAccessResponseProto resp =
> ReleaseShortCircuitAccessResponseProto.parseFrom(
> PBHelperClient.vintPrefixed(in));
> if (resp.getStatus() != Status.SUCCESS) {
>   String error = resp.hasError() ? resp.getError() : "(unknown)";
>   throw new IOException(resp.getStatus().toString() + ": " + error);
> }
> LOG.trace("{}: released {}", this, slot);
> success = true;
>   } catch (IOException e) {
> LOG.error(ShortCircuitCache.this + ": failed to release " +
> "short-circuit shared memory slot " + slot + " by sending " +
> "ReleaseShortCircuitAccessRequestProto to " + path +
> ".  Closing shared memory segment.", e);
>   } finally {
> if (success) {
>   shmManager.freeSlot(slot);
> } else {
>   shm.getEndpointShmManager().shutdown(shm);
> }
>   }
> }
> {code}
>  *exception stack:*
> {code:java}
> 2019-08-05,15:28:03,838 ERROR [ShortCircuitCache_SlotReleaser] 
> org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache: 
> ShortCircuitCache(0x65849546): failed to release short-circuit shared memory 
> slot Slot(slotIdx=62, shm=DfsClientShm(70593ef8b3d84cba3c2f0a1e81377eb1)) by 
> sending ReleaseShortCircuitAccessRequestProto to 
> /home/work/app/hdfs/c3micloudsrv-hdd/datanode/dn_socket.  Closing shared 
> memory segment.
> java.io.IOException: ERROR_INVALID: there is no shared memory segment 
> registered with shmId 70593ef8b3d84cba3c2f0a1e81377eb1
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-08-05 Thread Lisheng Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900515#comment-16900515
 ] 

Lisheng Sun commented on HDFS-14313:


Thanks [~linyiqun] for your deep review. I updated this patch per your 
comments. 

checkstyle issue:
{code:java}
./hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSCachingGetSpaceUsed.java:64:
public Builder setBpid(String bpid) {:35: 'bpid' hides a field. 
[HiddenField]
{code}
I think it is not a problem. I have uploaded the v11 patch. Could you help 
review it? Thank you. 
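
For readers unfamiliar with that checkstyle rule, the shape it complains about is just the usual builder setter, sketched hypothetically below: the setter parameter deliberately shadows the field of the same name, which is why the warning is treated as harmless here.
{code:java}
// Hypothetical minimal builder illustrating the HiddenField warning above.
public class Builder {
  private String bpid;

  public Builder setBpid(String bpid) {
    this.bpid = bpid;   // 'bpid' (parameter) hides 'bpid' (field) by design
    return this;
  }
}
{code}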

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch, HDFS-14313.009.patch, HDFS-14313.010.patch, 
> HDFS-14313.011.patch
>
>
> The two existing ways of getting used space, DU and DF, are insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple DataNodes or 
> other services.
>  Getting the HDFS used space from the FsDatasetImpl#volumeMap ReplicaInfos in 
> memory is cheap and accurate. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-08-05 Thread Lisheng Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900515#comment-16900515
 ] 

Lisheng Sun edited comment on HDFS-14313 at 8/6/19 1:37 AM:


Thanks [~linyiqun] for your deep review. I updated this patch per your 
comments. 

checkstyle issue:
{code:java}
./hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSCachingGetSpaceUsed.java:64:
public Builder setBpid(String bpid) {:35: 'bpid' hides a field. 
[HiddenField]
{code}
I think it is not a problem.

I have uploaded the v11 patch. Could you help review it? Thank you. 


was (Author: leosun08):
Thanx  [~linyiqun] for your deep review. I updated this patch as your 
comments. 

checkstyle issue:
{code:java}
./hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSCachingGetSpaceUsed.java:64:
public Builder setBpid(String bpid) {:35: 'bpid' hides a field. 
[HiddenField]
{code}
I think it is not a problem. I haved uploaded the v11 patch. Could you help 
review it?  Thank you. 

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch, HDFS-14313.009.patch, HDFS-14313.010.patch, 
> HDFS-14313.011.patch
>
>
> The two existing ways of getting used space, DU and DF, are insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple DataNodes or 
> other services.
>  Getting the HDFS used space from the FsDatasetImpl#volumeMap ReplicaInfos in 
> memory is cheap and accurate. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-08-05 Thread Lisheng Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900563#comment-16900563
 ] 

Lisheng Sun commented on HDFS-14313:


Thanks [~jojochuang] for your suggestion.
  
{quote}One thing that's missing out in the v11 is the update to 
hdfs-default.xml which was missing since v8.
{quote}

 According to [~linyiqun]'s suggestion, I defined a hard-coded threshold time value 
of 1000ms in ReplicaCachingGetSpaceUsed, so the config was removed from hdfs-default.xml.
{code:java}
private static final long DEEP_COPY_REPLICA_THRESHOLD_MS = 50;
private static final long REPLICA_CACHING_GET_SPACE_USED_THRESHOLD_MS = 1000;
{code}
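
To make the role of such a threshold concrete, here is a small self-contained sketch of typical usage (an assumption on my part, not the actual ReplicaCachingGetSpaceUsed code): the operation is timed, and a WARN is logged only when it runs longer than the threshold.
{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ThresholdTiming {
  private static final Logger LOG = LoggerFactory.getLogger(ThresholdTiming.class);
  private static final long THRESHOLD_MS = 1000;

  static void timed(Runnable work, String what) {
    long begin = System.currentTimeMillis();
    work.run();
    long elapsed = System.currentTimeMillis() - begin;
    if (elapsed > THRESHOLD_MS) {
      LOG.warn("{} took {} ms, longer than the {} ms threshold",
          what, elapsed, THRESHOLD_MS);
    }
  }

  public static void main(String[] args) {
    // the work here stands in for, e.g., recomputing used space from the replica map
    timed(() -> { }, "used-space refresh");
  }
}
{code}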

{quote}
Additionally there should be an additional config key 
"fs.getspaceused.classname" in core-default.xml, and state that possible 
options are

org.apache.hadoop.fs.DU (default)
org.apache.hadoop.fs.WindowsGetSpaceUsed
org.apache.hadoop.fs.DFCachingGetSpaceUsed
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed
{quote}

Adding ReplicaCachingGetSpaceUsed, which lives in the hdfs module, to core-default.xml of 
the common module does not seem very good to me. Please correct me if I am wrong. Thank 
you again.

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch, HDFS-14313.009.patch, HDFS-14313.010.patch, 
> HDFS-14313.011.patch
>
>
> The two existing ways of getting used space, DU and DF, are insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple DataNodes or 
> other services.
>  Getting the HDFS used space from the FsDatasetImpl#volumeMap ReplicaInfos in 
> memory is cheap and accurate. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-08-06 Thread Lisheng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14313:
---
Attachment: HDFS-14313.012.patch

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch, HDFS-14313.009.patch, HDFS-14313.010.patch, 
> HDFS-14313.011.patch, HDFS-14313.012.patch
>
>
> The two existing ways of getting used space, DU and DF, are insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple DataNodes or 
> other services.
>  Getting the HDFS used space from the FsDatasetImpl#volumeMap ReplicaInfos in 
> memory is cheap and accurate. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-08-06 Thread Lisheng Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900724#comment-16900724
 ] 

Lisheng Sun commented on HDFS-14313:


Thanks [~linyiqun] [~jojochuang] for your good suggestions. I added the key 
fs.getspaceused.classname and the usage of these four impl classes in 
core-default.xml. 

I have uploaded the v12 patch. Thank you again.
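
As a hedged illustration of how that key is consumed (example values only; the implementation class is the one proposed in this issue), the same selection can also be made programmatically through the Hadoop Configuration API:
{code:java}
// Illustrative only: selecting the GetSpaceUsed implementation via the
// fs.getspaceused.classname key discussed above.
import org.apache.hadoop.conf.Configuration;

public class SpaceUsedConfigExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("fs.getspaceused.classname",
        "org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed");
    System.out.println(conf.get("fs.getspaceused.classname"));
  }
}
{code}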

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch, HDFS-14313.009.patch, HDFS-14313.010.patch, 
> HDFS-14313.011.patch, HDFS-14313.012.patch
>
>
> The two existing ways of getting used space, DU and DF, are insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple DataNodes or 
> other services.
>  Getting the HDFS used space from the FsDatasetImpl#volumeMap ReplicaInfos in 
> memory is cheap and accurate. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-08-06 Thread Lisheng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14313:
---
Status: Patch Available  (was: Open)

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 3.1.0, 3.0.0, 2.9.0, 2.8.0, 2.7.0, 2.6.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch, HDFS-14313.009.patch, HDFS-14313.010.patch, 
> HDFS-14313.011.patch, HDFS-14313.012.patch
>
>
> The two existing ways of getting used space, DU and DF, are insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple DataNodes or 
> other services.
>  Getting the HDFS used space from the FsDatasetImpl#volumeMap ReplicaInfos in 
> memory is cheap and accurate. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-08-06 Thread Lisheng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14313:
---
Status: Open  (was: Patch Available)

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 3.1.0, 3.0.0, 2.9.0, 2.8.0, 2.7.0, 2.6.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch, HDFS-14313.009.patch, HDFS-14313.010.patch, 
> HDFS-14313.011.patch, HDFS-14313.012.patch
>
>
> There are two ways (DU/DF) of getting the used space, and both are insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple datanodes or 
> other servers.
>  Getting the hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfos in memory 
> has very little overhead and is accurate.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-08-06 Thread Lisheng Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900753#comment-16900753
 ] 

Lisheng Sun commented on HDFS-14313:


{quote}
HDFS-14313 does not apply to trunk. Rebase required? Wrong Branch? See 
https://wiki.apache.org/hadoop/HowToContribute for help.
{quote}
This problem is caused by the fact that the patch only updates core-default.xml.

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch, HDFS-14313.009.patch, HDFS-14313.010.patch, 
> HDFS-14313.011.patch, HDFS-14313.012.patch
>
>
> There are two ways (DU/DF) of getting the used space, and both are insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple datanodes or 
> other servers.
>  Getting the hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfos in memory 
> has very little overhead and is accurate.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-08-06 Thread Lisheng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14313:
---
Attachment: HDFS-14313.013.patch

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch, HDFS-14313.009.patch, HDFS-14313.010.patch, 
> HDFS-14313.011.patch, HDFS-14313.012.patch, HDFS-14313.013.patch
>
>
> There are two ways (DU/DF) of getting the used space, and both are insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple datanodes or 
> other servers.
>  Getting the hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfos in memory 
> has very little overhead and is accurate.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-08-06 Thread Lisheng Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900863#comment-16900863
 ] 

Lisheng Sun commented on HDFS-14313:


Thanks [~linyiqun] for your reminder. I rebased the code, updated the description 
of the config key "fs.getspaceused.classname" in core-default.xml, and uploaded 
the v013 patch.

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch, HDFS-14313.009.patch, HDFS-14313.010.patch, 
> HDFS-14313.011.patch, HDFS-14313.012.patch, HDFS-14313.013.patch
>
>
> There are two ways (DU/DF) of getting the used space, and both are insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple datanodes or 
> other servers.
>  Getting the hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfos in memory 
> has very little overhead and is accurate.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-08-06 Thread Lisheng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14313:
---
Attachment: HDFS-14313.014.patch

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch, HDFS-14313.009.patch, HDFS-14313.010.patch, 
> HDFS-14313.011.patch, HDFS-14313.012.patch, HDFS-14313.013.patch, 
> HDFS-14313.014.patch
>
>
> There are two ways (DU/DF) of getting the used space, and both are insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple datanodes or 
> other servers.
>  Getting the hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfos in memory 
> has very little overhead and is accurate.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-08-06 Thread Lisheng Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901146#comment-16901146
 ] 

Lisheng Sun commented on HDFS-14313:


Fixed the UT failure in hadoop.conf.TestCommonConfigurationFields; the other UT 
failures are unrelated to this patch.

Uploaded the v014 patch. Please help review it. Thanks [~linyiqun].

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch, HDFS-14313.009.patch, HDFS-14313.010.patch, 
> HDFS-14313.011.patch, HDFS-14313.012.patch, HDFS-14313.013.patch, 
> HDFS-14313.014.patch
>
>
> There are two ways (DU/DF) of getting the used space, and both are insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple datanodes or 
> other servers.
>  Getting the hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfos in memory 
> has very little overhead and is accurate.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14708) TestLargeBlockReport#testBlockReportSucceedsWithLargerLengthLimit fails in trunk

2019-08-06 Thread Lisheng Sun (JIRA)
Lisheng Sun created HDFS-14708:
--

 Summary: 
TestLargeBlockReport#testBlockReportSucceedsWithLargerLengthLimit fails in trunk
 Key: HDFS-14708
 URL: https://issues.apache.org/jira/browse/HDFS-14708
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Lisheng Sun


{code:java}
2019-08-07 09:56:26,082 [IPC Server handler 7 on default port 49613] INFO 
ipc.Server (Server.java:logException(2982)) - IPC Server handler 7 on default 
port 49613, call Call#7 Retry#0 
org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.blockReport from 
127.0.0.1:49618
java.io.IOException: java.lang.IllegalStateException: 
com.google.protobuf.InvalidProtocolBufferException: Protocol message was too 
large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the 
size limit.
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.runBlockOp(BlockManager.java:5011)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1581)
at 
org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:181)
at 
org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:31664)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:529)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1001)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:929)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1891)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2921)
Caused by: java.lang.IllegalStateException: 
com.google.protobuf.InvalidProtocolBufferException: Protocol message was too 
large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the 
size limit.
at 
org.apache.hadoop.hdfs.protocol.BlockListAsLongs$BufferDecoder$1.next(BlockListAsLongs.java:424)
at 
org.apache.hadoop.hdfs.protocol.BlockListAsLongs$BufferDecoder$1.next(BlockListAsLongs.java:396)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiffSorted(BlockManager.java:2952)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:2787)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:2655)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.lambda$blockReport$0(NameNodeRpcServer.java:1582)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.processQueue(BlockManager.java:5089)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.run(BlockManager.java:5068)
Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol message 
was too large. May be malicious. Use CodedInputStream.setSizeLimit() to 
increase the size limit.
at 
com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110)
at com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:755)
at com.google.protobuf.CodedInputStream.readRawByte(CodedInputStream.java:769)
at 
com.google.protobuf.CodedInputStream.readRawVarint64(CodedInputStream.java:462)
at 
org.apache.hadoop.hdfs.protocol.BlockListAsLongs$BufferDecoder$1.next(BlockListAsLongs.java:420)
... 8 more
{code}
Ref :: 
[https://builds.apache.org/job/PreCommit-HDFS-Build/27416/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt]
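The failure above suggests the block report protobuf exceeds the default decode limit. As a minimal sketch (an assumption about the remedy, not the actual fix for this JIRA), a test or cluster exercising very large block reports typically has to raise the IPC payload ceiling before starting the cluster; ipc.maximum.data.length is the standard Hadoop key for that limit.

{code:java}
import org.apache.hadoop.conf.Configuration;

public class LargeBlockReportConfSketch {
  /** Build a configuration that allows RPC payloads up to 128 MB. */
  public static Configuration confForLargeReports() {
    Configuration conf = new Configuration();
    // Block report protobufs larger than the default limit would otherwise
    // trigger InvalidProtocolBufferException on the NameNode side.
    conf.setInt("ipc.maximum.data.length", 128 * 1024 * 1024);
    return conf;
  }
}
{code}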



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14708) TestLargeBlockReport#testBlockReportSucceedsWithLargerLengthLimit fails in trunk

2019-08-06 Thread Lisheng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14708:
---
Description: 
{code:java}
[ERROR] 
testBlockReportSucceedsWithLargerLengthLimit(org.apache.hadoop.hdfs.server.datanode.TestLargeBlockReport)
  Time elapsed: 47.956 s  <<< ERROR!
org.apache.hadoop.ipc.RemoteException(java.io.IOException): 
java.lang.IllegalStateException: 
com.google.protobuf.InvalidProtocolBufferException: Protocol message was too 
large.  May be malicious.  Use CodedInputStream.setSizeLimit() to increase the 
size limit.
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.runBlockOp(BlockManager.java:5011)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1581)
at 
org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:181)
at 
org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:31664)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:529)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1001)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:929)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1891)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2921)
Caused by: java.lang.IllegalStateException: 
com.google.protobuf.InvalidProtocolBufferException: Protocol message was too 
large.  May be malicious.  Use CodedInputStream.setSizeLimit() to increase the 
size limit.
at 
org.apache.hadoop.hdfs.protocol.BlockListAsLongs$BufferDecoder$1.next(BlockListAsLongs.java:424)
at 
org.apache.hadoop.hdfs.protocol.BlockListAsLongs$BufferDecoder$1.next(BlockListAsLongs.java:396)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiffSorted(BlockManager.java:2952)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:2787)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:2655)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.lambda$blockReport$0(NameNodeRpcServer.java:1582)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.processQueue(BlockManager.java:5089)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.run(BlockManager.java:5068)
Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol message 
was too large.  May be malicious.  Use CodedInputStream.setSizeLimit() to 
increase the size limit.
at 
com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110)
at 
com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:755)
at 
com.google.protobuf.CodedInputStream.readRawByte(CodedInputStream.java:769)
at 
com.google.protobuf.CodedInputStream.readRawVarint64(CodedInputStream.java:462)
at 
org.apache.hadoop.hdfs.protocol.BlockListAsLongs$BufferDecoder$1.next(BlockListAsLongs.java:420)
... 8 more

at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1553)
at org.apache.hadoop.ipc.Client.call(Client.java:1499)
at org.apache.hadoop.ipc.Client.call(Client.java:1396)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
at com.sun.proxy.$Proxy25.blockReport(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.blockReport(DatanodeProtocolClientSideTranslatorPB.java:218)
at 
org.apache.hadoop.hdfs.server.datanode.TestLargeBlockReport.testBlockReportSucceedsWithLargerLengthLimit(TestLargeBlockReport.java:97)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExp

[jira] [Assigned] (HDFS-14708) TestLargeBlockReport#testBlockReportSucceedsWithLargerLengthLimit fails in trunk

2019-08-06 Thread Lisheng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun reassigned HDFS-14708:
--

Assignee: Lisheng Sun

> TestLargeBlockReport#testBlockReportSucceedsWithLargerLengthLimit fails in 
> trunk
> 
>
> Key: HDFS-14708
> URL: https://issues.apache.org/jira/browse/HDFS-14708
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Minor
>
> {code:java}
> [ERROR] 
> testBlockReportSucceedsWithLargerLengthLimit(org.apache.hadoop.hdfs.server.datanode.TestLargeBlockReport)
>   Time elapsed: 47.956 s  <<< ERROR!
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): 
> java.lang.IllegalStateException: 
> com.google.protobuf.InvalidProtocolBufferException: Protocol message was too 
> large.  May be malicious.  Use CodedInputStream.setSizeLimit() to increase 
> the size limit.
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.runBlockOp(BlockManager.java:5011)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1581)
>   at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:181)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:31664)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:529)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1001)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:929)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1891)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2921)
> Caused by: java.lang.IllegalStateException: 
> com.google.protobuf.InvalidProtocolBufferException: Protocol message was too 
> large.  May be malicious.  Use CodedInputStream.setSizeLimit() to increase 
> the size limit.
>   at 
> org.apache.hadoop.hdfs.protocol.BlockListAsLongs$BufferDecoder$1.next(BlockListAsLongs.java:424)
>   at 
> org.apache.hadoop.hdfs.protocol.BlockListAsLongs$BufferDecoder$1.next(BlockListAsLongs.java:396)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiffSorted(BlockManager.java:2952)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:2787)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:2655)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.lambda$blockReport$0(NameNodeRpcServer.java:1582)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.processQueue(BlockManager.java:5089)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.run(BlockManager.java:5068)
> Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol 
> message was too large.  May be malicious.  Use 
> CodedInputStream.setSizeLimit() to increase the size limit.
>   at 
> com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110)
>   at 
> com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:755)
>   at 
> com.google.protobuf.CodedInputStream.readRawByte(CodedInputStream.java:769)
>   at 
> com.google.protobuf.CodedInputStream.readRawVarint64(CodedInputStream.java:462)
>   at 
> org.apache.hadoop.hdfs.protocol.BlockListAsLongs$BufferDecoder$1.next(BlockListAsLongs.java:420)
>   ... 8 more
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1553)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1499)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1396)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
>   at com.sun.proxy.$Proxy25.blockReport(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.blockReport(DatanodeProtocolClientSideTranslatorPB.java:218)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestLargeBlockReport.testBlockReportSucceedsWithLargerLengthLimit(TestLargeBlockReport.java:97)
>   at sun.reflect.NativeMethodAcces

[jira] [Commented] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-08-06 Thread Lisheng Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901736#comment-16901736
 ] 

Lisheng Sun commented on HDFS-14313:


Thanks [~linyiqun] for all your work on this patch. I will attach patches for 
the branch-3.x and branch-2.x branches later.

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch, HDFS-14313.009.patch, HDFS-14313.010.patch, 
> HDFS-14313.011.patch, HDFS-14313.012.patch, HDFS-14313.013.patch, 
> HDFS-14313.014.patch
>
>
> There are two ways (DU/DF) of getting the used space, and both are insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple datanodes or 
> other servers.
>  Getting the hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfos in memory 
> has very little overhead and is accurate.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-08-06 Thread Lisheng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14313:
---
Attachment: HDFS-14313.branch-3.v1.patch

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch, HDFS-14313.009.patch, HDFS-14313.010.patch, 
> HDFS-14313.011.patch, HDFS-14313.012.patch, HDFS-14313.013.patch, 
> HDFS-14313.014.patch, HDFS-14313.branch-3.v1.patch
>
>
> There are two ways (DU/DF) of getting the used space, and both are insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple datanodes or 
> other servers.
>  Getting the hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfos in memory 
> has very little overhead and is accurate.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-08-06 Thread Lisheng Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901778#comment-16901778
 ] 

Lisheng Sun commented on HDFS-14313:


I attached a patch for branch-3. Do you want me to attach patches for branch-3.0, 
branch-3.1, branch-3.2 and branch-2.9 instead of branch-3 and branch-2? I think 
I misunderstood.

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch, HDFS-14313.009.patch, HDFS-14313.010.patch, 
> HDFS-14313.011.patch, HDFS-14313.012.patch, HDFS-14313.013.patch, 
> HDFS-14313.014.patch, HDFS-14313.branch-3.v1.patch
>
>
> There are two ways (DU/DF) of getting the used space, and both are insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple datanodes or 
> other servers.
>  Getting the hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfos in memory 
> has very little overhead and is accurate.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-08-06 Thread Lisheng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14313:
---
Attachment: HDFS-14313.branch-3.0.v1.patch

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch, HDFS-14313.009.patch, HDFS-14313.010.patch, 
> HDFS-14313.011.patch, HDFS-14313.012.patch, HDFS-14313.013.patch, 
> HDFS-14313.014.patch, HDFS-14313.branch-3.0.v1.patch, 
> HDFS-14313.branch-3.v1.patch
>
>
> There are two ways (DU/DF) of getting the used space, and both are insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple datanodes or 
> other servers.
>  Getting the hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfos in memory 
> has very little overhead and is accurate.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-08-06 Thread Lisheng Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901783#comment-16901783
 ] 

Lisheng Sun commented on HDFS-14313:


Thanks [~linyiqun] for your suggestion. I renamed the patch for branch-3.x and 
uploaded the branch-3.0.v1 patch.

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch, HDFS-14313.009.patch, HDFS-14313.010.patch, 
> HDFS-14313.011.patch, HDFS-14313.012.patch, HDFS-14313.013.patch, 
> HDFS-14313.014.patch, HDFS-14313.branch-3.0.v1.patch, 
> HDFS-14313.branch-3.v1.patch
>
>
> There are two ways (DU/DF) of getting the used space, and both are insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple datanodes or 
> other servers.
>  Getting the hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfos in memory 
> has very little overhead and is accurate.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-08-07 Thread Lisheng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14313:
---
Attachment: HDFS-14313-branch-2.v1.patch

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14313-branch-2.v1.patch, HDFS-14313.000.patch, 
> HDFS-14313.001.patch, HDFS-14313.002.patch, HDFS-14313.003.patch, 
> HDFS-14313.004.patch, HDFS-14313.005.patch, HDFS-14313.006.patch, 
> HDFS-14313.007.patch, HDFS-14313.008.patch, HDFS-14313.009.patch, 
> HDFS-14313.010.patch, HDFS-14313.011.patch, HDFS-14313.012.patch, 
> HDFS-14313.013.patch, HDFS-14313.014.patch, HDFS-14313.branch-3.0.v1.patch, 
> HDFS-14313.branch-3.0.v2.patch, HDFS-14313.branch-3.v1.patch
>
>
> There are two ways (DU/DF) of getting the used space, and both are insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple datanodes or 
> other servers.
>  Getting the hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfos in memory 
> has very little overhead and is accurate.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-08-07 Thread Lisheng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14313:
---
Attachment: HDFS-14313.branch-3.0.v2.patch

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch, HDFS-14313.009.patch, HDFS-14313.010.patch, 
> HDFS-14313.011.patch, HDFS-14313.012.patch, HDFS-14313.013.patch, 
> HDFS-14313.014.patch, HDFS-14313.branch-3.0.v1.patch, 
> HDFS-14313.branch-3.0.v2.patch, HDFS-14313.branch-3.v1.patch
>
>
> There are two ways (DU/DF) of getting the used space, and both are insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple datanodes or 
> other servers.
>  Getting the hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfos in memory 
> has very little overhead and is accurate.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14701) Change Log Level to warn in SlotReleaser

2019-08-07 Thread Lisheng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14701:
---
Attachment: HDFS-14701.002.patch

> Change Log Level to warn in SlotReleaser
> 
>
> Key: HDFS-14701
> URL: https://issues.apache.org/jira/browse/HDFS-14701
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-14701.001.patch, HDFS-14701.002.patch
>
>
>  If the corresponding DataNode has been stopped or restarted and the DFSClient 
> closes the shared memory segment, the releaseShortCircuitFds API throws an 
> exception and logs an ERROR message. I think it should not be an ERROR log; a 
> WARN log is more reasonable.
> {code:java}
> // @Override
> public void run() {
>   LOG.trace("{}: about to release {}", ShortCircuitCache.this, slot);
>   final DfsClientShm shm = (DfsClientShm)slot.getShm();
>   final DomainSocket shmSock = shm.getPeer().getDomainSocket();
>   final String path = shmSock.getPath();
>   boolean success = false;
>   try (DomainSocket sock = DomainSocket.connect(path);
>DataOutputStream out = new DataOutputStream(
>new BufferedOutputStream(sock.getOutputStream()))) {
> new Sender(out).releaseShortCircuitFds(slot.getSlotId());
> DataInputStream in = new DataInputStream(sock.getInputStream());
> ReleaseShortCircuitAccessResponseProto resp =
> ReleaseShortCircuitAccessResponseProto.parseFrom(
> PBHelperClient.vintPrefixed(in));
> if (resp.getStatus() != Status.SUCCESS) {
>   String error = resp.hasError() ? resp.getError() : "(unknown)";
>   throw new IOException(resp.getStatus().toString() + ": " + error);
> }
> LOG.trace("{}: released {}", this, slot);
> success = true;
>   } catch (IOException e) {
> LOG.error(ShortCircuitCache.this + ": failed to release " +
> "short-circuit shared memory slot " + slot + " by sending " +
> "ReleaseShortCircuitAccessRequestProto to " + path +
> ".  Closing shared memory segment.", e);
>   } finally {
> if (success) {
>   shmManager.freeSlot(slot);
> } else {
>   shm.getEndpointShmManager().shutdown(shm);
> }
>   }
> }
> {code}
>  *exception stack:*
> {code:java}
> 2019-08-05,15:28:03,838 ERROR [ShortCircuitCache_SlotReleaser] 
> org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache: 
> ShortCircuitCache(0x65849546): failed to release short-circuit shared memory 
> slot Slot(slotIdx=62, shm=DfsClientShm(70593ef8b3d84cba3c2f0a1e81377eb1)) by 
> sending ReleaseShortCircuitAccessRequestProto to 
> /home/work/app/hdfs/c3micloudsrv-hdd/datanode/dn_socket.  Closing shared 
> memory segment.
> java.io.IOException: ERROR_INVALID: there is no shared memory segment 
> registered with shmId 70593ef8b3d84cba3c2f0a1e81377eb1
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14701) Change Log Level to warn in SlotReleaser

2019-08-07 Thread Lisheng Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901929#comment-16901929
 ] 

Lisheng Sun commented on HDFS-14701:


Thanks [~jojochuang] for your suggestion. I updated the patch per your comment and 
uploaded the v002 patch. Could you help review it? Thank you.
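For clarity, a minimal sketch of the intended logging change (an assumption about what the patch does, since I am only describing the direction here): the release failure is logged at WARN instead of ERROR, because a stopped or restarted DataNode is an expected situation. The helper names below are hypothetical.

{code:java}
import java.io.IOException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SlotReleaserLogLevelSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(SlotReleaserLogLevelSketch.class);

  /** Hypothetical release operation that may fail when the DataNode is gone. */
  static void release(String slot, String path) {
    try {
      sendReleaseRequest(path); // may fail if the DataNode was restarted
    } catch (IOException e) {
      // WARN rather than ERROR: the DataNode may simply have been restarted.
      LOG.warn("failed to release short-circuit shared memory slot {} by sending"
          + " ReleaseShortCircuitAccessRequestProto to {}. Closing shared memory"
          + " segment.", slot, path, e);
    }
  }

  /** Placeholder for the real RPC; always fails in this sketch. */
  static void sendReleaseRequest(String path) throws IOException {
    throw new IOException("ERROR_INVALID: there is no shared memory segment");
  }

  public static void main(String[] args) {
    release("Slot(slotIdx=62)", "/path/to/dn_socket");
  }
}
{code}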

> Change Log Level to warn in SlotReleaser
> 
>
> Key: HDFS-14701
> URL: https://issues.apache.org/jira/browse/HDFS-14701
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-14701.001.patch, HDFS-14701.002.patch
>
>
>  If the corresponding DataNode has been stopped or restarted and the DFSClient 
> closes the shared memory segment, the releaseShortCircuitFds API throws an 
> exception and logs an ERROR message. I think it should not be an ERROR log; a 
> WARN log is more reasonable.
> {code:java}
> // @Override
> public void run() {
>   LOG.trace("{}: about to release {}", ShortCircuitCache.this, slot);
>   final DfsClientShm shm = (DfsClientShm)slot.getShm();
>   final DomainSocket shmSock = shm.getPeer().getDomainSocket();
>   final String path = shmSock.getPath();
>   boolean success = false;
>   try (DomainSocket sock = DomainSocket.connect(path);
>DataOutputStream out = new DataOutputStream(
>new BufferedOutputStream(sock.getOutputStream()))) {
> new Sender(out).releaseShortCircuitFds(slot.getSlotId());
> DataInputStream in = new DataInputStream(sock.getInputStream());
> ReleaseShortCircuitAccessResponseProto resp =
> ReleaseShortCircuitAccessResponseProto.parseFrom(
> PBHelperClient.vintPrefixed(in));
> if (resp.getStatus() != Status.SUCCESS) {
>   String error = resp.hasError() ? resp.getError() : "(unknown)";
>   throw new IOException(resp.getStatus().toString() + ": " + error);
> }
> LOG.trace("{}: released {}", this, slot);
> success = true;
>   } catch (IOException e) {
> LOG.error(ShortCircuitCache.this + ": failed to release " +
> "short-circuit shared memory slot " + slot + " by sending " +
> "ReleaseShortCircuitAccessRequestProto to " + path +
> ".  Closing shared memory segment.", e);
>   } finally {
> if (success) {
>   shmManager.freeSlot(slot);
> } else {
>   shm.getEndpointShmManager().shutdown(shm);
> }
>   }
> }
> {code}
>  *exception stack:*
> {code:java}
> 2019-08-05,15:28:03,838 ERROR [ShortCircuitCache_SlotReleaser] 
> org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache: 
> ShortCircuitCache(0x65849546): failed to release short-circuit shared memory 
> slot Slot(slotIdx=62, shm=DfsClientShm(70593ef8b3d84cba3c2f0a1e81377eb1)) by 
> sending ReleaseShortCircuitAccessRequestProto to 
> /home/work/app/hdfs/c3micloudsrv-hdd/datanode/dn_socket.  Closing shared 
> memory segment.
> java.io.IOException: ERROR_INVALID: there is no shared memory segment 
> registered with shmId 70593ef8b3d84cba3c2f0a1e81377eb1
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-08-07 Thread Lisheng Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14313:
---
Attachment: HDFS-14313-branch-2.v2.patch

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14313-branch-2.v1.patch, 
> HDFS-14313-branch-2.v2.patch, HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch, HDFS-14313.009.patch, HDFS-14313.010.patch, 
> HDFS-14313.011.patch, HDFS-14313.012.patch, HDFS-14313.013.patch, 
> HDFS-14313.014.patch, HDFS-14313.branch-3.0.v1.patch, 
> HDFS-14313.branch-3.0.v2.patch, HDFS-14313.branch-3.v1.patch
>
>
> There are two ways (DU/DF) of getting the used space, and both are insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple datanodes or 
> other servers.
>  Getting the hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfos in memory 
> has very little overhead and is accurate.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-08-07 Thread Lisheng Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902154#comment-16902154
 ] 

Lisheng Sun commented on HDFS-14313:


The branch-2.v2 patch fixes the unchecked warning reported by javac; that warning can be ignored.

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14313-branch-2.v1.patch, 
> HDFS-14313-branch-2.v2.patch, HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch, HDFS-14313.009.patch, HDFS-14313.010.patch, 
> HDFS-14313.011.patch, HDFS-14313.012.patch, HDFS-14313.013.patch, 
> HDFS-14313.014.patch, HDFS-14313.branch-3.0.v1.patch, 
> HDFS-14313.branch-3.0.v2.patch, HDFS-14313.branch-3.v1.patch
>
>
> There are two ways (DU/DF) of getting the used space, and both are insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple datanodes or 
> other servers.
>  Getting the hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfos in memory 
> has very little overhead and is accurate.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13571) Dead DataNode Detector

2019-08-07 Thread Lisheng Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902631#comment-16902631
 ] 

Lisheng Sun commented on HDFS-13571:


Sorry [~linyiqun], I have been working on this JIRA. Recently there has been a 
lot going on at the company, so there is some delay. I will update this JIRA as 
soon as possible.

> Dead DataNode Detector
> --
>
> Key: HDFS-13571
> URL: https://issues.apache.org/jira/browse/HDFS-13571
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.4.0, 2.6.0, 3.0.2
>Reporter: Gang Xie
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-13571-2.6.diff, node status machine.png
>
>
> Currently, the information about a dead datanode in DFSInputStream is stored 
> locally, so it cannot be shared among the input streams of the same 
> DFSClient. In our production environment, some datanodes die every day for 
> different reasons. Today, after the first input stream is blocked and 
> detects this, it cannot share this information with the others in the same 
> DFSClient; thus, the other input streams are still blocked by the dead node 
> for some time, which can cause bad service latency.
> To eliminate this impact of dead datanodes, we designed a dead datanode 
> detector, which detects dead nodes in advance and shares this information 
> among all the input streams in the same client. This improvement has been 
> online for some months and works fine, so we decided to port it to 3.0 (the 
> versions used in our production environment are 2.4 and 2.6).
> I will do the porting work and upload the code later.
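A tiny sketch of the sharing idea described above (assumptions only; this is not the HDFS-13571 implementation): dead-node information detected by one stream is kept in a client-wide set so that every stream of the same client can skip those nodes.

{code:java}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class SharedDeadNodeRegistry {
  // One instance per client, shared by all of its input streams.
  private final Set<String> deadNodes = ConcurrentHashMap.newKeySet();

  public void markDead(String datanodeId) { deadNodes.add(datanodeId); }

  public void markAlive(String datanodeId) { deadNodes.remove(datanodeId); }

  public boolean isDead(String datanodeId) { return deadNodes.contains(datanodeId); }
}
{code}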



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14330) Consider StorageID to choose volume

2019-03-03 Thread Lisheng Sun (JIRA)
Lisheng Sun created HDFS-14330:
--

 Summary: Consider StorageID to choose volume 
 Key: HDFS-14330
 URL: https://issues.apache.org/jira/browse/HDFS-14330
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 3.0.0-alpha4
Reporter: Lisheng Sun


RoundRobinVolumeChoosingPolicy#chooseVolume does not consider the storageId 
parameter.

The {{BlockPlacementPolicy}} chooses specific storages and returns this 
information, including the storageId, to the client.
{code:java}
 @Override
  public V chooseVolume(final List<V> volumes, long blockSize, String storageId)
  throws IOException {

if (volumes.size() < 1) {
  throw new DiskOutOfSpaceException("No more available volumes");
}

// As all the items in volumes are with the same storage type,
// so only need to get the storage type index of the first item in volumes
StorageType storageType = volumes.get(0).getStorageType();
int index = storageType != null ?
storageType.ordinal() : StorageType.DEFAULT.ordinal();

synchronized (syncLocks[index]) {
  return chooseVolume(index, volumes, blockSize);
}
  }
{code}
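A minimal, self-contained sketch of the improvement this issue suggests (an assumption, not committed code): if the placement policy already chose a specific storage, pick the volume whose storage ID matches; otherwise fall back to round-robin. "Volume" below is a stand-in for FsVolumeSpi.

{code:java}
import java.io.IOException;
import java.util.List;

public class StorageIdAwareVolumeChooser {

  /** Stand-in for FsVolumeSpi in this sketch. */
  public interface Volume {
    String getStorageID();
  }

  private int nextIndex = 0;

  public synchronized Volume chooseVolume(List<? extends Volume> volumes,
      String storageId) throws IOException {
    if (volumes.isEmpty()) {
      throw new IOException("No more available volumes");
    }
    // First honour the storage the placement policy asked for, if any.
    if (storageId != null) {
      for (Volume v : volumes) {
        if (storageId.equals(v.getStorageID())) {
          return v;
        }
      }
    }
    // Fall back to plain round-robin when no storage ID matches.
    Volume chosen = volumes.get(nextIndex % volumes.size());
    nextIndex = (nextIndex + 1) % volumes.size();
    return chosen;
  }
}
{code}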



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-15046) Backport HDFS-7060 to branch-2.10

2019-12-11 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun reassigned HDFS-15046:
--

Assignee: Lisheng Sun

> Backport HDFS-7060 to branch-2.10
> -
>
> Key: HDFS-15046
> URL: https://issues.apache.org/jira/browse/HDFS-15046
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Wei-Chiu Chuang
>Assignee: Lisheng Sun
>Priority: Major
>
> Not sure why it didn't get backported in 2.x before, but looks like a good 
> improvement overall.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15046) Backport HDFS-7060 to branch-2.10

2019-12-11 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-15046:
---
Attachment: HDFS-15046.branch-2.9.001.patch
Status: Patch Available  (was: Open)

> Backport HDFS-7060 to branch-2.10
> -
>
> Key: HDFS-15046
> URL: https://issues.apache.org/jira/browse/HDFS-15046
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Wei-Chiu Chuang
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-15046.branch-2.9.001.patch
>
>
> Not sure why it didn't get backported in 2.x before, but looks like a good 
> improvement overall.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15046) Backport HDFS-7060 to branch-2.10

2019-12-11 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-15046:
---
Attachment: HDFS-15046.branch-2.9.002.patch

> Backport HDFS-7060 to branch-2.10
> -
>
> Key: HDFS-15046
> URL: https://issues.apache.org/jira/browse/HDFS-15046
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Wei-Chiu Chuang
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-15046.branch-2.9.001.patch, 
> HDFS-15046.branch-2.9.002.patch
>
>
> Not sure why it didn't get backported in 2.x before, but looks like a good 
> improvement overall.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15046) Backport HDFS-7060 to branch-2.10

2019-12-11 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-15046:
---
Attachment: HDFS-15046.branch-2.001.patch

> Backport HDFS-7060 to branch-2.10
> -
>
> Key: HDFS-15046
> URL: https://issues.apache.org/jira/browse/HDFS-15046
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Wei-Chiu Chuang
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-15046.branch-2.001.patch, 
> HDFS-15046.branch-2.9.001.patch, HDFS-15046.branch-2.9.002.patch
>
>
> Not sure why it didn't get backported in 2.x before, but looks like a good 
> improvement overall.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15046) Backport HDFS-7060 to branch-2.10

2020-01-14 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-15046:
---
Status: Open  (was: Patch Available)

> Backport HDFS-7060 to branch-2.10
> -
>
> Key: HDFS-15046
> URL: https://issues.apache.org/jira/browse/HDFS-15046
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Wei-Chiu Chuang
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-15046.branch-2.001.patch, 
> HDFS-15046.branch-2.9.001.patch, HDFS-15046.branch-2.9.002.patch
>
>
> Not sure why it didn't get backported in 2.x before, but looks like a good 
> improvement overall.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15046) Backport HDFS-7060 to branch-2.10

2020-01-14 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-15046:
---
Status: Patch Available  (was: Open)

> Backport HDFS-7060 to branch-2.10
> -
>
> Key: HDFS-15046
> URL: https://issues.apache.org/jira/browse/HDFS-15046
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Wei-Chiu Chuang
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-15046.branch-2.001.patch, 
> HDFS-15046.branch-2.9.001.patch, HDFS-15046.branch-2.9.002.patch
>
>
> Not sure why it didn't get backported in 2.x before, but looks like a good 
> improvement overall.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15046) Backport HDFS-7060 to branch-2.10

2020-01-15 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-15046:
---
Attachment: HDFS-15046.branch-2.9.002(2).patch

> Backport HDFS-7060 to branch-2.10
> -
>
> Key: HDFS-15046
> URL: https://issues.apache.org/jira/browse/HDFS-15046
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Wei-Chiu Chuang
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-15046.branch-2.001.patch, 
> HDFS-15046.branch-2.9.001.patch, HDFS-15046.branch-2.9.002(2).patch, 
> HDFS-15046.branch-2.9.002.patch
>
>
> Not sure why it didn't get backported in 2.x before, but looks like a good 
> improvement overall.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15046) Backport HDFS-7060 to branch-2.10

2020-01-15 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016568#comment-17016568
 ] 

Lisheng Sun commented on HDFS-15046:


[~weichiu] Could you help review this patch? Thank you.

> Backport HDFS-7060 to branch-2.10
> -
>
> Key: HDFS-15046
> URL: https://issues.apache.org/jira/browse/HDFS-15046
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Wei-Chiu Chuang
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-15046.branch-2.001.patch, 
> HDFS-15046.branch-2.9.001.patch, HDFS-15046.branch-2.9.002(2).patch, 
> HDFS-15046.branch-2.9.002.patch
>
>
> Not sure why it didn't get backported in 2.x before, but looks like a good 
> improvement overall.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14651) DeadNodeDetector checks dead node periodically

2020-02-01 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17028269#comment-17028269
 ] 

Lisheng Sun commented on HDFS-14651:


Thanks [~ahussein] for your questions.
{quote}1.what is the usage of deadNodeDetectInterval ? As far as I understand, 
every call to checkDeadNodes() will change the state to IDLE forcing the 
DeadNodeDetector to sleep for IDLE_SLEEP_MS. So, why do we need 
deadNodeDetectInterval if the actual time gap between every check is 
IDLE_SLEEP_MS?
{quote}
The deadNodeDetectInterval check in checkDeadNodes() is not really necessary, since idle() is called right after checkDeadNodes().
{quote}stopDeadNodeDetectorThread() is supposed to stop the deadNodeDetector thread; but it looks like the implementation of the runnable never terminates. DeadNodeDetector suppresses all interrupts and never checks for a termination flag. Therefore, the caller will just hang for 3 seconds waiting to join.
{quote}
{code:java}
/**
   * Close dead node detector thread.
   */
  public void stopDeadNodeDetectorThread() {
if (deadNodeDetectorThr != null) {
  deadNodeDetectorThr.interrupt();
  try {
deadNodeDetectorThr.join(3000);
  } catch (InterruptedException e) {
LOG.warn("Encountered exception while waiting to join on dead " +
"node detector thread.", e);
  }
}
  }
{code}
I will remove the 3s timeout in deadNodeDetectorThr.join() so that the caller waits until the deadNodeDetector thread actually stops.

I will create issues to fix these two problems. Thank you.
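A minimal sketch of what I have in mind for the stop path (the volatile running flag and the unbounded join are assumptions here, not the final patch):
{code:java}
// Let the caller wait until the detector thread really exits, instead of
// giving up after 3 seconds; the run() loop would also check the flag below.
private volatile boolean running = true;   // assumed termination flag

public void stopDeadNodeDetectorThread() {
  if (deadNodeDetectorThr != null) {
    running = false;                        // ask the run() loop to exit
    deadNodeDetectorThr.interrupt();
    try {
      deadNodeDetectorThr.join();           // no timeout
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      LOG.warn("Encountered exception while waiting to join on dead " +
          "node detector thread.", e);
    }
  }
}
{code}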

> DeadNodeDetector checks dead node periodically
> --
>
> Key: HDFS-14651
> URL: https://issues.apache.org/jira/browse/HDFS-14651
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14651.001.patch, HDFS-14651.002.patch, 
> HDFS-14651.003.patch, HDFS-14651.004.patch, HDFS-14651.005.patch, 
> HDFS-14651.006.patch, HDFS-14651.007.patch, HDFS-14651.008.patch
>
>
> DeadNodeDetector checks dead node periodically.
> DeadNodeDetector periodically detect the Node in DeadNodeDetector#deadnode, 
> If the access is successful, the Node will be moved from 
> DeadNodeDetector#deadnode. Continuous detection of the dead node is 
> necessary. The DataNode need rejoin the cluster due to a service 
> restart/machine repair. The DataNode may be permanently excluded if there is 
> no added probe mechanism.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15161) When evictableMmapped or evictable size is zero, do not throw NoSuchElementException in ShortCircuitCache#close()

2020-02-11 Thread Lisheng Sun (Jira)
Lisheng Sun created HDFS-15161:
--

 Summary: When evictableMmapped or evictable size is zero, do not 
throw NoSuchElementException in ShortCircuitCache#close() 
 Key: HDFS-15161
 URL: https://issues.apache.org/jira/browse/HDFS-15161
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Lisheng Sun
Assignee: Lisheng Sun


detail see HDFS-14541

{code:java}
/**
 * Close the cache and free all associated resources.
 */
@Override
public void close() {
  try {
lock.lock();
if (closed) return;
closed = true;
LOG.info(this + ": closing");
maxNonMmappedEvictableLifespanMs = 0;
maxEvictableMmapedSize = 0;
// Close and join cacheCleaner thread.
IOUtilsClient.cleanupWithLogger(LOG, cacheCleaner);
// Purge all replicas.
while (true) {
  Object eldestKey;
  try {
eldestKey = evictable.firstKey();
  } catch (NoSuchElementException e) {
break;
  }
  purge((ShortCircuitReplica)evictable.get(eldestKey));
}
while (true) {
  Object eldestKey;
  try {
eldestKey = evictableMmapped.firstKey();
  } catch (NoSuchElementException e) {
break;
  }
  purge((ShortCircuitReplica)evictableMmapped.get(eldestKey));
}
  } finally {
lock.unlock();
  }
{code}
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15161) When evictableMmapped or evictable size is zero, do not throw NoSuchElementException in ShortCircuitCache#close()

2020-02-11 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-15161:
---
Description: 
detail see HDFS-14541
{code:java}
/**
 * Close the cache and free all associated resources.
 */
@Override
public void close() {
  try {
lock.lock();
if (closed) return;
closed = true;
LOG.info(this + ": closing");
maxNonMmappedEvictableLifespanMs = 0;
maxEvictableMmapedSize = 0;
// Close and join cacheCleaner thread.
IOUtilsClient.cleanupWithLogger(LOG, cacheCleaner);
// Purge all replicas.
while (true) {
  Object eldestKey;
  try {
eldestKey = evictable.firstKey();
  } catch (NoSuchElementException e) {
break;
  }
  purge((ShortCircuitReplica)evictable.get(eldestKey));
}
while (true) {
  Object eldestKey;
  try {
eldestKey = evictableMmapped.firstKey();
  } catch (NoSuchElementException e) {
break;
  }
  purge((ShortCircuitReplica)evictableMmapped.get(eldestKey));
}
  } finally {
lock.unlock();
  }
{code}
 

  was:
detail see 
 # HDFS-14541

 # HDFS-14541

 # HDFS-14541

 

 
{code:java}
/**
 * Close the cache and free all associated resources.
 */
@Override
public void close() {
  try {
lock.lock();
if (closed) return;
closed = true;
LOG.info(this + ": closing");
maxNonMmappedEvictableLifespanMs = 0;
maxEvictableMmapedSize = 0;
// Close and join cacheCleaner thread.
IOUtilsClient.cleanupWithLogger(LOG, cacheCleaner);
// Purge all replicas.
while (true) {
  Object eldestKey;
  try {
eldestKey = evictable.firstKey();
  } catch (NoSuchElementException e) {
break;
  }
  purge((ShortCircuitReplica)evictable.get(eldestKey));
}
while (true) {
  Object eldestKey;
  try {
eldestKey = evictableMmapped.firstKey();
  } catch (NoSuchElementException e) {
break;
  }
  purge((ShortCircuitReplica)evictableMmapped.get(eldestKey));
}
  } finally {
lock.unlock();
  }
{code}
 


> When evictableMmapped or evictable size is zero, do not throw 
> NoSuchElementException in ShortCircuitCache#close() 
> --
>
> Key: HDFS-15161
> URL: https://issues.apache.org/jira/browse/HDFS-15161
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
>
> detail see HDFS-14541
> {code:java}
> /**
>  * Close the cache and free all associated resources.
>  */
> @Override
> public void close() {
>   try {
> lock.lock();
> if (closed) return;
> closed = true;
> LOG.info(this + ": closing");
> maxNonMmappedEvictableLifespanMs = 0;
> maxEvictableMmapedSize = 0;
> // Close and join cacheCleaner thread.
> IOUtilsClient.cleanupWithLogger(LOG, cacheCleaner);
> // Purge all replicas.
> while (true) {
>   Object eldestKey;
>   try {
> eldestKey = evictable.firstKey();
>   } catch (NoSuchElementException e) {
> break;
>   }
>   purge((ShortCircuitReplica)evictable.get(eldestKey));
> }
> while (true) {
>   Object eldestKey;
>   try {
> eldestKey = evictableMmapped.firstKey();
>   } catch (NoSuchElementException e) {
> break;
>   }
>   purge((ShortCircuitReplica)evictableMmapped.get(eldestKey));
> }
>   } finally {
> lock.unlock();
>   }
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15161) When evictableMmapped or evictable size is zero, do not throw NoSuchElementException in ShortCircuitCache#close()

2020-02-11 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-15161:
---
Attachment: HDFS-15161.001.patch
Status: Patch Available  (was: Open)

> When evictableMmapped or evictable size is zero, do not throw 
> NoSuchElementException in ShortCircuitCache#close() 
> --
>
> Key: HDFS-15161
> URL: https://issues.apache.org/jira/browse/HDFS-15161
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-15161.001.patch
>
>
> detail see HDFS-14541
> {code:java}
> /**
>  * Close the cache and free all associated resources.
>  */
> @Override
> public void close() {
>   try {
> lock.lock();
> if (closed) return;
> closed = true;
> LOG.info(this + ": closing");
> maxNonMmappedEvictableLifespanMs = 0;
> maxEvictableMmapedSize = 0;
> // Close and join cacheCleaner thread.
> IOUtilsClient.cleanupWithLogger(LOG, cacheCleaner);
> // Purge all replicas.
> while (true) {
>   Object eldestKey;
>   try {
> eldestKey = evictable.firstKey();
>   } catch (NoSuchElementException e) {
> break;
>   }
>   purge((ShortCircuitReplica)evictable.get(eldestKey));
> }
> while (true) {
>   Object eldestKey;
>   try {
> eldestKey = evictableMmapped.firstKey();
>   } catch (NoSuchElementException e) {
> break;
>   }
>   purge((ShortCircuitReplica)evictableMmapped.get(eldestKey));
> }
>   } finally {
> lock.unlock();
>   }
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15161) When evictableMmapped or evictable size is zero, do not throw NoSuchElementException in ShortCircuitCache#close()

2020-02-11 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17034469#comment-17034469
 ] 

Lisheng Sun commented on HDFS-15161:


[~ayushtkn]
HDFS-14541 left open the problem that ShortCircuitCache#close() still relies on catching NoSuchElementException when evictableMmapped or evictable is empty; the other methods in ShortCircuitCache have already been fixed for this.
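For reference, a minimal sketch of the close() change I have in mind, assuming the evictable maps keep an isEmpty()/firstKey() style API (not the final patch):
{code:java}
// Drain each evictable map without using NoSuchElementException as the
// loop terminator; purge() removes the entry, so the loops terminate.
while (!evictable.isEmpty()) {
  Object eldestKey = evictable.firstKey();
  purge((ShortCircuitReplica) evictable.get(eldestKey));
}
while (!evictableMmapped.isEmpty()) {
  Object eldestKey = evictableMmapped.firstKey();
  purge((ShortCircuitReplica) evictableMmapped.get(eldestKey));
}
{code}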

> When evictableMmapped or evictable size is zero, do not throw 
> NoSuchElementException in ShortCircuitCache#close() 
> --
>
> Key: HDFS-15161
> URL: https://issues.apache.org/jira/browse/HDFS-15161
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-15161.001.patch
>
>
> detail see HDFS-14541
> {code:java}
> /**
>  * Close the cache and free all associated resources.
>  */
> @Override
> public void close() {
>   try {
> lock.lock();
> if (closed) return;
> closed = true;
> LOG.info(this + ": closing");
> maxNonMmappedEvictableLifespanMs = 0;
> maxEvictableMmapedSize = 0;
> // Close and join cacheCleaner thread.
> IOUtilsClient.cleanupWithLogger(LOG, cacheCleaner);
> // Purge all replicas.
> while (true) {
>   Object eldestKey;
>   try {
> eldestKey = evictable.firstKey();
>   } catch (NoSuchElementException e) {
> break;
>   }
>   purge((ShortCircuitReplica)evictable.get(eldestKey));
> }
> while (true) {
>   Object eldestKey;
>   try {
> eldestKey = evictableMmapped.firstKey();
>   } catch (NoSuchElementException e) {
> break;
>   }
>   purge((ShortCircuitReplica)evictableMmapped.get(eldestKey));
> }
>   } finally {
> lock.unlock();
>   }
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15161) When evictableMmapped or evictable size is zero, do not throw NoSuchElementException in ShortCircuitCache#close()

2020-02-11 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-15161:
---
Attachment: HDFS-15161.002.patch

> When evictableMmapped or evictable size is zero, do not throw 
> NoSuchElementException in ShortCircuitCache#close() 
> --
>
> Key: HDFS-15161
> URL: https://issues.apache.org/jira/browse/HDFS-15161
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-15161.001.patch, HDFS-15161.002.patch
>
>
> detail see HDFS-14541
> {code:java}
> /**
>  * Close the cache and free all associated resources.
>  */
> @Override
> public void close() {
>   try {
> lock.lock();
> if (closed) return;
> closed = true;
> LOG.info(this + ": closing");
> maxNonMmappedEvictableLifespanMs = 0;
> maxEvictableMmapedSize = 0;
> // Close and join cacheCleaner thread.
> IOUtilsClient.cleanupWithLogger(LOG, cacheCleaner);
> // Purge all replicas.
> while (true) {
>   Object eldestKey;
>   try {
> eldestKey = evictable.firstKey();
>   } catch (NoSuchElementException e) {
> break;
>   }
>   purge((ShortCircuitReplica)evictable.get(eldestKey));
> }
> while (true) {
>   Object eldestKey;
>   try {
> eldestKey = evictableMmapped.firstKey();
>   } catch (NoSuchElementException e) {
> break;
>   }
>   purge((ShortCircuitReplica)evictableMmapped.get(eldestKey));
> }
>   } finally {
> lock.unlock();
>   }
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15172) Remove unnecessary deadNodeDetectInterval in DeadNodeDetector#checkDeadNodes()

2020-02-15 Thread Lisheng Sun (Jira)
Lisheng Sun created HDFS-15172:
--

 Summary: Remove unnecessary  deadNodeDetectInterval in 
DeadNodeDetector#checkDeadNodes()
 Key: HDFS-15172
 URL: https://issues.apache.org/jira/browse/HDFS-15172
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Lisheng Sun
Assignee: Lisheng Sun


Every call to checkDeadNodes() will change the state to IDLE forcing the 
DeadNodeDetector to sleep for IDLE_SLEEP_MS. So, we don't need 
deadNodeDetectInterval between every checkDeadNodes().
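A rough sketch of the detector's main loop to illustrate why the extra interval adds nothing (simplified; only IDLE and IDLE_SLEEP_MS come from the description above, the other names are assumptions):
{code:java}
// Simplified detector loop (illustration only, not the real class).
// After every checkDeadNodes() the state becomes IDLE, and idle() already
// sleeps for IDLE_SLEEP_MS, so an additional deadNodeDetectInterval gate
// between two checkDeadNodes() calls is redundant.
while (!Thread.currentThread().isInterrupted()) {
  switch (state) {
  case CHECK_DEAD:      // assumed state name
    checkDeadNodes();   // probes the dead nodes, then sets state = IDLE
    break;
  case IDLE:
    idle();             // sleeps IDLE_SLEEP_MS, then sets state = CHECK_DEAD
    break;
  default:
    break;
  }
}
{code}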



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15172) Remove unnecessary deadNodeDetectInterval in DeadNodeDetector#checkDeadNodes()

2020-02-15 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-15172:
---
Attachment: HDFS-15172-001.patch
Status: Patch Available  (was: Open)

> Remove unnecessary  deadNodeDetectInterval in 
> DeadNodeDetector#checkDeadNodes()
> ---
>
> Key: HDFS-15172
> URL: https://issues.apache.org/jira/browse/HDFS-15172
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-15172-001.patch
>
>
> Every call to checkDeadNodes() will change the state to IDLE forcing the 
> DeadNodeDetector to sleep for IDLE_SLEEP_MS. So, we don't need 
> deadNodeDetectInterval between every checkDeadNodes().



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15172) Remove unnecessary deadNodeDetectInterval in DeadNodeDetector#checkDeadNodes()

2020-02-16 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-15172:
---
Attachment: HDFS-15172-002.patch

> Remove unnecessary  deadNodeDetectInterval in 
> DeadNodeDetector#checkDeadNodes()
> ---
>
> Key: HDFS-15172
> URL: https://issues.apache.org/jira/browse/HDFS-15172
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-15172-001.patch, HDFS-15172-002.patch
>
>
> Every call to checkDeadNodes() will change the state to IDLE forcing the 
> DeadNodeDetector to sleep for IDLE_SLEEP_MS. So, we don't need 
> deadNodeDetectInterval between every checkDeadNodes().



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15172) Remove unnecessary deadNodeDetectInterval in DeadNodeDetector#checkDeadNodes()

2020-02-16 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-15172:
---
Description: Every call to checkDeadNodes() will change the state to IDLE 
forcing the DeadNodeDetector to sleep for IDLE_SLEEP_MS. So we don't need 
deadNodeDetectInterval between every checkDeadNodes().  (was: Every call to 
checkDeadNodes() will change the state to IDLE forcing the DeadNodeDetector to 
sleep for IDLE_SLEEP_MS. So, we don't need deadNodeDetectInterval between every 
checkDeadNodes().)

> Remove unnecessary  deadNodeDetectInterval in 
> DeadNodeDetector#checkDeadNodes()
> ---
>
> Key: HDFS-15172
> URL: https://issues.apache.org/jira/browse/HDFS-15172
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-15172-001.patch, HDFS-15172-002.patch
>
>
> Every call to checkDeadNodes() will change the state to IDLE forcing the 
> DeadNodeDetector to sleep for IDLE_SLEEP_MS. So we don't need 
> deadNodeDetectInterval between every checkDeadNodes().



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-15149) TestDeadNodeDetection test cases time-out

2020-02-16 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun reassigned HDFS-15149:
--

Assignee: Lisheng Sun  (was: Ahmed Hussein)

> TestDeadNodeDetection test cases time-out
> -
>
> Key: HDFS-15149
> URL: https://issues.apache.org/jira/browse/HDFS-15149
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Ahmed Hussein
>Assignee: Lisheng Sun
>Priority: Major
>
> TestDeadNodeDetection JUnit time out times out with the following stack 
> traces:
> * 1- testDeadNodeDetectionInBackground*
> {code:bash}
> [ERROR] Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 264.757 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestDeadNodeDetection
> [ERROR] 
> testDeadNodeDetectionInBackground(org.apache.hadoop.hdfs.TestDeadNodeDetection)
>   Time elapsed: 125.806 s  <<< ERROR!
> java.util.concurrent.TimeoutException: 
> Timed out waiting for condition. Thread diagnostics:
> Timestamp: 2020-01-24 08:31:07,023
> "client DomainSocketWatcher" daemon prio=5 tid=117 runnable
> java.lang.Thread.State: RUNNABLE
> at org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(Native 
> Method)
> at 
> org.apache.hadoop.net.unix.DomainSocketWatcher.access$900(DomainSocketWatcher.java:52)
> at 
> org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:503)
> at java.lang.Thread.run(Thread.java:748)
> "Session-HouseKeeper-48c3205a"  prio=5 tid=350 timed_waiting
> java.lang.Thread.State: TIMED_WAITING
> at sun.misc.Unsafe.park(Native Method)
> at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
> at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> "java.util.concurrent.ThreadPoolExecutor$Worker@3ae54156[State = -1, empty 
> queue]" daemon prio=5 tid=752 in Object.wait()
> java.lang.Thread.State: WAITING (on object monitor)
> at sun.misc.Unsafe.park(Native Method)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> "CacheReplicationMonitor(1960356187)"  prio=5 tid=386 timed_waiting
> java.lang.Thread.State: TIMED_WAITING
> at sun.misc.Unsafe.park(Native Method)
> at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor.run(CacheReplicationMonitor.java:181)
> "Timer for 'NameNode' metrics system" daemon prio=5 tid=339 timed_waiting
> java.lang.Thread.State: TIMED_WAITING
> at java.lang.Object.wait(Native Method)
> at java.util.TimerThread.mainLoop(Timer.java:552)
> at java.util.TimerThread.run(Timer.java:505)
> "org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber@6b760460"
>  daemon prio=5 tid=385 timed_waiting
> java.lang.Thread.State: TIMED_WAITING
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber.run(FSNamesystem.java:4420)
> at java.lang.Thread.run(Thread.java:748)
> "qtp164757726-349" daemon prio=5 tid=349 runnable
> java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
> at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
> at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
> at sun.nio.ch.SelectorImpl.lockAndDoSelect(Se

[jira] [Commented] (HDFS-15172) Remove unnecessary deadNodeDetectInterval in DeadNodeDetector#checkDeadNodes()

2020-02-16 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17038073#comment-17038073
 ] 

Lisheng Sun commented on HDFS-15172:


[~elgoiri]
I originally intended to add this check in DeadNodeDetector#checkDeadNodes to prevent overly frequent checks. But the waiting time has already been added in idle(), and every call to checkDeadNodes() changes the state to IDLE, so overly frequent checks cannot actually occur.
TestDeadNodeDetection#testDeadNodeDetectionDeadNodeRecovery() already covers this change. Thank you.

> Remove unnecessary  deadNodeDetectInterval in 
> DeadNodeDetector#checkDeadNodes()
> ---
>
> Key: HDFS-15172
> URL: https://issues.apache.org/jira/browse/HDFS-15172
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-15172-001.patch, HDFS-15172-002.patch
>
>
> Every call to checkDeadNodes() will change the state to IDLE forcing the 
> DeadNodeDetector to sleep for IDLE_SLEEP_MS. So we don't need 
> deadNodeDetectInterval between every checkDeadNodes().



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15174) Optimize ReplicaCachingGetSpaceUsed by reducing unnecessary io operations

2020-02-16 Thread Lisheng Sun (Jira)
Lisheng Sun created HDFS-15174:
--

 Summary: Optimize ReplicaCachingGetSpaceUsed by reducing 
unnecessary io operations
 Key: HDFS-15174
 URL: https://issues.apache.org/jira/browse/HDFS-15174
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Lisheng Sun
Assignee: Lisheng Sun


Calculating the size of each block and of its meta file requires an IO operation in ReplicaCachingGetSpaceUsed#refresh(), which puts pressure on disk performance when there are many blocks. HDFS-14313 was intended to reduce IO operations, so we can get the block size from ReplicaInfo and the meta size via DataChecksum#getChecksumSize().
{code:java}
@Override
protected void refresh() {
  if (CollectionUtils.isNotEmpty(replicaInfos)) {
    for (ReplicaInfo replicaInfo : replicaInfos) {
      if (Objects.equals(replicaInfo.getVolume().getStorageID(),
          volume.getStorageID())) {
        dfsUsed += replicaInfo.getBlockDataLength();
        dfsUsed += replicaInfo.getMetadataLength();
        count++;
      }
    }
  }
}
{code}
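The meta file size can be derived from the checksum parameters instead of a filesystem call; a sketch under the assumption that BlockMetadataHeader#getHeaderSize() and DataChecksum#getBytesPerChecksum() are usable here (the chunk arithmetic below is an illustration, not necessarily the final patch):
{code:java}
// Estimate the meta file length from the block data length and the checksum
// parameters, avoiding one stat per replica.
long estimateMetaLength(long blockDataLength, DataChecksum checksum) {
  int bytesPerChecksum = checksum.getBytesPerChecksum();
  int checksumSize = checksum.getChecksumSize();
  long numChunks = (blockDataLength + bytesPerChecksum - 1) / bytesPerChecksum;
  return BlockMetadataHeader.getHeaderSize() + numChunks * checksumSize;
}
{code}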




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15149) TestDeadNodeDetection test cases time-out

2020-02-17 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-15149:
---
Attachment: HDFS-15149-001.patch
Status: Patch Available  (was: Open)

> TestDeadNodeDetection test cases time-out
> -
>
> Key: HDFS-15149
> URL: https://issues.apache.org/jira/browse/HDFS-15149
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Ahmed Hussein
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-15149-001.patch
>
>
> TestDeadNodeDetection JUnit time out times out with the following stack 
> traces:
> * 1- testDeadNodeDetectionInBackground*
> {code:bash}
> [ERROR] Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 264.757 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestDeadNodeDetection
> [ERROR] 
> testDeadNodeDetectionInBackground(org.apache.hadoop.hdfs.TestDeadNodeDetection)
>   Time elapsed: 125.806 s  <<< ERROR!
> java.util.concurrent.TimeoutException: 
> Timed out waiting for condition. Thread diagnostics:
> Timestamp: 2020-01-24 08:31:07,023
> "client DomainSocketWatcher" daemon prio=5 tid=117 runnable
> java.lang.Thread.State: RUNNABLE
> at org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(Native 
> Method)
> at 
> org.apache.hadoop.net.unix.DomainSocketWatcher.access$900(DomainSocketWatcher.java:52)
> at 
> org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:503)
> at java.lang.Thread.run(Thread.java:748)
> "Session-HouseKeeper-48c3205a"  prio=5 tid=350 timed_waiting
> java.lang.Thread.State: TIMED_WAITING
> at sun.misc.Unsafe.park(Native Method)
> at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
> at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> "java.util.concurrent.ThreadPoolExecutor$Worker@3ae54156[State = -1, empty 
> queue]" daemon prio=5 tid=752 in Object.wait()
> java.lang.Thread.State: WAITING (on object monitor)
> at sun.misc.Unsafe.park(Native Method)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> "CacheReplicationMonitor(1960356187)"  prio=5 tid=386 timed_waiting
> java.lang.Thread.State: TIMED_WAITING
> at sun.misc.Unsafe.park(Native Method)
> at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor.run(CacheReplicationMonitor.java:181)
> "Timer for 'NameNode' metrics system" daemon prio=5 tid=339 timed_waiting
> java.lang.Thread.State: TIMED_WAITING
> at java.lang.Object.wait(Native Method)
> at java.util.TimerThread.mainLoop(Timer.java:552)
> at java.util.TimerThread.run(Timer.java:505)
> "org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber@6b760460"
>  daemon prio=5 tid=385 timed_waiting
> java.lang.Thread.State: TIMED_WAITING
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber.run(FSNamesystem.java:4420)
> at java.lang.Thread.run(Thread.java:748)
> "qtp164757726-349" daemon prio=5 tid=349 runnable
> java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
> at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
> at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSele

[jira] [Commented] (HDFS-15172) Remove unnecessary deadNodeDetectInterval in DeadNodeDetector#checkDeadNodes()

2020-02-17 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17038721#comment-17038721
 ] 

Lisheng Sun commented on HDFS-15172:


Thanks [~elgoiri] for the review.
This jira solves the problem of excessively frequent checks. HDFS-15149 should not include this part of the modification; it is supposed to solve the problem that DeadNodeDetector suppresses all interrupts and never checks for a termination flag. I think these two problems are better divided into two jiras.

> Remove unnecessary  deadNodeDetectInterval in 
> DeadNodeDetector#checkDeadNodes()
> ---
>
> Key: HDFS-15172
> URL: https://issues.apache.org/jira/browse/HDFS-15172
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-15172-001.patch, HDFS-15172-002.patch
>
>
> Every call to checkDeadNodes() will change the state to IDLE forcing the 
> DeadNodeDetector to sleep for IDLE_SLEEP_MS. So we don't need 
> deadNodeDetectInterval between every checkDeadNodes().



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15182) TestBlockManager#testOneOfTwoRacksDecommissioned() fail in trunk

2020-02-18 Thread Lisheng Sun (Jira)
Lisheng Sun created HDFS-15182:
--

 Summary: TestBlockManager#testOneOfTwoRacksDecommissioned() fail 
in trunk
 Key: HDFS-15182
 URL: https://issues.apache.org/jira/browse/HDFS-15182
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Lisheng Sun


When only the single UT TestBlockManager#testOneOfTwoRacksDecommissioned() is run, it fails with a NullPointerException.
Since NameNode#metrics is a static variable, running all UTs in TestBlockManager passes because another UT initializes the metrics first.
But running only testOneOfTwoRacksDecommissioned() without initializing the metrics throws a NullPointerException.
{code:java}
java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addBlock(BlockManager.java:4088)
at 
org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager.fulfillPipeline(TestBlockManager.java:518)
at 
org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager.doTestOneOfTwoRacksDecommissioned(TestBlockManager.java:388)
at 
org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager.testOneOfTwoRacksDecommissioned(TestBlockManager.java:353)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
at 
com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
at 
com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
at 
com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
at 
com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15182) TestBlockManager#testOneOfTwoRacksDecommissioned() fail in trunk

2020-02-18 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-15182:
---
Attachment: HDFS-15182-001.patch
Status: Patch Available  (was: Open)

> TestBlockManager#testOneOfTwoRacksDecommissioned() fail in trunk
> 
>
> Key: HDFS-15182
> URL: https://issues.apache.org/jira/browse/HDFS-15182
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-15182-001.patch
>
>
> when run only a UT of TestBlockManager#testOneOfTwoRacksDecommissioned(), it 
> will fail and throw NullPointerException.
> Since NameNode#metrics is static variable,run all uts in TestBlockManager and 
> other ut has init metrics.
> But  that it runs only testOneOfTwoRacksDecommissioned without initialing 
> metrics throws NullPointerException.
> {code:java}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addBlock(BlockManager.java:4088)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager.fulfillPipeline(TestBlockManager.java:518)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager.doTestOneOfTwoRacksDecommissioned(TestBlockManager.java:388)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager.testOneOfTwoRacksDecommissioned(TestBlockManager.java:353)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>   at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15182) TestBlockManager#testOneOfTwoRacksDecommissioned() fail in trunk

2020-02-19 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-15182:
---
Attachment: HDFS-15182-002.patch

> TestBlockManager#testOneOfTwoRacksDecommissioned() fail in trunk
> 
>
> Key: HDFS-15182
> URL: https://issues.apache.org/jira/browse/HDFS-15182
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-15182-001.patch, HDFS-15182-002.patch
>
>
> when run only a UT of TestBlockManager#testOneOfTwoRacksDecommissioned(), it 
> will fail and throw NullPointerException.
> Since NameNode#metrics is static variable,run all uts in TestBlockManager and 
> other ut has init metrics.
> But  that it runs only testOneOfTwoRacksDecommissioned without initialing 
> metrics throws NullPointerException.
> {code:java}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addBlock(BlockManager.java:4088)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager.fulfillPipeline(TestBlockManager.java:518)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager.doTestOneOfTwoRacksDecommissioned(TestBlockManager.java:388)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager.testOneOfTwoRacksDecommissioned(TestBlockManager.java:353)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>   at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15182) TestBlockManager#testOneOfTwoRacksDecommissioned() fail in trunk

2020-02-19 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17039993#comment-17039993
 ] 

Lisheng Sun commented on HDFS-15182:


The v002 patch fixes it in @Before; a rough sketch is below.
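Roughly what the @Before change looks like (a sketch only; whether the patch uses NameNode.initMetrics(...) or a mock, and the setup method name, are assumptions here):
{code:java}
@Before
public void setupMockCluster() throws IOException {
  Configuration conf = new HdfsConfiguration();
  // Make sure the static NameNode#metrics is initialized even when a single
  // test method runs in isolation, so BlockManager.addBlock() does not hit
  // a null metrics reference. (Assumption: initMetrics() is the mechanism;
  // the actual v002 patch may do this differently.)
  NameNode.initMetrics(conf, HdfsServerConstants.NamenodeRole.NAMENODE);
  // ... the rest of the existing setup continues unchanged ...
}
{code}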

> TestBlockManager#testOneOfTwoRacksDecommissioned() fail in trunk
> 
>
> Key: HDFS-15182
> URL: https://issues.apache.org/jira/browse/HDFS-15182
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-15182-001.patch, HDFS-15182-002.patch
>
>
> when run only a UT of TestBlockManager#testOneOfTwoRacksDecommissioned(), it 
> will fail and throw NullPointerException.
> Since NameNode#metrics is static variable,run all uts in TestBlockManager and 
> other ut has init metrics.
> But  that it runs only testOneOfTwoRacksDecommissioned without initialing 
> metrics throws NullPointerException.
> {code:java}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addBlock(BlockManager.java:4088)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager.fulfillPipeline(TestBlockManager.java:518)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager.doTestOneOfTwoRacksDecommissioned(TestBlockManager.java:388)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager.testOneOfTwoRacksDecommissioned(TestBlockManager.java:353)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>   at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15149) TestDeadNodeDetection test cases time-out

2020-02-19 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17040025#comment-17040025
 ] 

Lisheng Sun commented on HDFS-15149:


Thanks [~ahussein] for your suggestions.
{quote}
The poll period and waiting time (5000 and 10) in waitForDeadNode are very large. I assume you had to use large numbers to match the delays of the detector threads.
{quote}
The poll period and waiting time are indeed too long, and I can reduce them.
{quote}
I have a question about clearAndGetDetectedDeadNodes(): As far as I understand, calling the method in a loop means that a "deadnode" can be removed from the deadNodes map. In other words, the count may never reach 3, because the map does not account for the removed nodes from the list. Please feel free to correct my understanding of the code if I am wrong.
{quote}
The purpose of clearAndGetDetectedDeadNodes() is to return the current dead nodes, excluding any dead node that is no longer used by any DFSInputStream.
{quote}
IMHO, DeadNodeDetector.java needs to introduce more aggressive mechanisms to 
coordinate between the threads. Instead of just racing between each other, 
tasks can use conditional variables to communicate like synchronized queues, or 
object monitors. Another benefit from using conditional variables is that the 
runtime of the tests will be improved because there won't be need to wait for a 
full cycle.
The DefaultSpeculator.java has a synchronized queue just for the purpose of 
testing: "DefaultSpeculator.scanControl".
{quote}
Based on your suggestion I will optimize it. Thank you.
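For the poll period/timeout point above, the change is essentially tightening the GenericTestUtils.waitFor parameters in the test helper (the concrete numbers and the dfsClient/din/deadNodeCount names below are illustrative assumptions, not the final patch):
{code:java}
// Poll every 100 ms and cap the overall wait at 100 s, instead of polling
// every 5 s with a much longer timeout.
GenericTestUtils.waitFor(
    () -> dfsClient.getDeadNodes(din).size() == deadNodeCount,
    100, 100000);
{code}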




> TestDeadNodeDetection test cases time-out
> -
>
> Key: HDFS-15149
> URL: https://issues.apache.org/jira/browse/HDFS-15149
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Ahmed Hussein
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-15149-001.patch
>
>
> TestDeadNodeDetection JUnit time out times out with the following stack 
> traces:
> * 1- testDeadNodeDetectionInBackground*
> {code:bash}
> [ERROR] Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 264.757 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestDeadNodeDetection
> [ERROR] 
> testDeadNodeDetectionInBackground(org.apache.hadoop.hdfs.TestDeadNodeDetection)
>   Time elapsed: 125.806 s  <<< ERROR!
> java.util.concurrent.TimeoutException: 
> Timed out waiting for condition. Thread diagnostics:
> Timestamp: 2020-01-24 08:31:07,023
> "client DomainSocketWatcher" daemon prio=5 tid=117 runnable
> java.lang.Thread.State: RUNNABLE
> at org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(Native 
> Method)
> at 
> org.apache.hadoop.net.unix.DomainSocketWatcher.access$900(DomainSocketWatcher.java:52)
> at 
> org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:503)
> at java.lang.Thread.run(Thread.java:748)
> "Session-HouseKeeper-48c3205a"  prio=5 tid=350 timed_waiting
> java.lang.Thread.State: TIMED_WAITING
> at sun.misc.Unsafe.park(Native Method)
> at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
> at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> "java.util.concurrent.ThreadPoolExecutor$Worker@3ae54156[State = -1, empty 
> queue]" daemon prio=5 tid=752 in Object.wait()
> java.lang.Thread.State: WAITING (on object monitor)
> at sun.misc.Unsafe.park(Native Method)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> "CacheReplicationMonitor(1960356187)"  prio=5 tid=3

[jira] [Commented] (HDFS-15149) TestDeadNodeDetection test cases time-out

2020-02-19 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17040027#comment-17040027
 ] 

Lisheng Sun commented on HDFS-15149:


hi [~elgoiri]
{quote}
I like the rest of the solution though.
{quote}
Which part do you mean by "the rest of the solution"?

> TestDeadNodeDetection test cases time-out
> -
>
> Key: HDFS-15149
> URL: https://issues.apache.org/jira/browse/HDFS-15149
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Ahmed Hussein
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-15149-001.patch
>
>
> TestDeadNodeDetection JUnit time out times out with the following stack 
> traces:
> * 1- testDeadNodeDetectionInBackground*
> {code:bash}
> [ERROR] Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 264.757 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestDeadNodeDetection
> [ERROR] 
> testDeadNodeDetectionInBackground(org.apache.hadoop.hdfs.TestDeadNodeDetection)
>   Time elapsed: 125.806 s  <<< ERROR!
> java.util.concurrent.TimeoutException: 
> Timed out waiting for condition. Thread diagnostics:
> Timestamp: 2020-01-24 08:31:07,023
> "client DomainSocketWatcher" daemon prio=5 tid=117 runnable
> java.lang.Thread.State: RUNNABLE
> at org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(Native 
> Method)
> at 
> org.apache.hadoop.net.unix.DomainSocketWatcher.access$900(DomainSocketWatcher.java:52)
> at 
> org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:503)
> at java.lang.Thread.run(Thread.java:748)
> "Session-HouseKeeper-48c3205a"  prio=5 tid=350 timed_waiting
> java.lang.Thread.State: TIMED_WAITING
> at sun.misc.Unsafe.park(Native Method)
> at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
> at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> "java.util.concurrent.ThreadPoolExecutor$Worker@3ae54156[State = -1, empty 
> queue]" daemon prio=5 tid=752 in Object.wait()
> java.lang.Thread.State: WAITING (on object monitor)
> at sun.misc.Unsafe.park(Native Method)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> "CacheReplicationMonitor(1960356187)"  prio=5 tid=386 timed_waiting
> java.lang.Thread.State: TIMED_WAITING
> at sun.misc.Unsafe.park(Native Method)
> at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor.run(CacheReplicationMonitor.java:181)
> "Timer for 'NameNode' metrics system" daemon prio=5 tid=339 timed_waiting
> java.lang.Thread.State: TIMED_WAITING
> at java.lang.Object.wait(Native Method)
> at java.util.TimerThread.mainLoop(Timer.java:552)
> at java.util.TimerThread.run(Timer.java:505)
> "org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber@6b760460"
>  daemon prio=5 tid=385 timed_waiting
> java.lang.Thread.State: TIMED_WAITING
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber.run(FSNamesystem.java:4420)
> at java.lang.Thread.run(Thread.java:748)
> "qtp164757726-349" daemon prio=5 tid=349 runnable
> java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
> at sun.nio.ch.EPollArrayWrapper

[jira] [Updated] (HDFS-15149) TestDeadNodeDetection test cases time-out

2020-02-19 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-15149:
---
Attachment: HDFS-15149-002.patch

> TestDeadNodeDetection test cases time-out
> -
>
> Key: HDFS-15149
> URL: https://issues.apache.org/jira/browse/HDFS-15149
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Ahmed Hussein
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-15149-001.patch, HDFS-15149-002.patch
>
>
> TestDeadNodeDetection JUnit test cases time out with the following stack 
> traces:
> * 1- testDeadNodeDetectionInBackground*
> {code:bash}
> [ERROR] Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 264.757 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestDeadNodeDetection
> [ERROR] 
> testDeadNodeDetectionInBackground(org.apache.hadoop.hdfs.TestDeadNodeDetection)
>   Time elapsed: 125.806 s  <<< ERROR!
> java.util.concurrent.TimeoutException: 
> Timed out waiting for condition. Thread diagnostics:
> Timestamp: 2020-01-24 08:31:07,023
> "client DomainSocketWatcher" daemon prio=5 tid=117 runnable
> java.lang.Thread.State: RUNNABLE
> at org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(Native 
> Method)
> at 
> org.apache.hadoop.net.unix.DomainSocketWatcher.access$900(DomainSocketWatcher.java:52)
> at 
> org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:503)
> at java.lang.Thread.run(Thread.java:748)
> "Session-HouseKeeper-48c3205a"  prio=5 tid=350 timed_waiting
> java.lang.Thread.State: TIMED_WAITING
> at sun.misc.Unsafe.park(Native Method)
> at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
> at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> "java.util.concurrent.ThreadPoolExecutor$Worker@3ae54156[State = -1, empty 
> queue]" daemon prio=5 tid=752 in Object.wait()
> java.lang.Thread.State: WAITING (on object monitor)
> at sun.misc.Unsafe.park(Native Method)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> "CacheReplicationMonitor(1960356187)"  prio=5 tid=386 timed_waiting
> java.lang.Thread.State: TIMED_WAITING
> at sun.misc.Unsafe.park(Native Method)
> at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor.run(CacheReplicationMonitor.java:181)
> "Timer for 'NameNode' metrics system" daemon prio=5 tid=339 timed_waiting
> java.lang.Thread.State: TIMED_WAITING
> at java.lang.Object.wait(Native Method)
> at java.util.TimerThread.mainLoop(Timer.java:552)
> at java.util.TimerThread.run(Timer.java:505)
> "org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber@6b760460"
>  daemon prio=5 tid=385 timed_waiting
> java.lang.Thread.State: TIMED_WAITING
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber.run(FSNamesystem.java:4420)
> at java.lang.Thread.run(Thread.java:748)
> "qtp164757726-349" daemon prio=5 tid=349 runnable
> java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
> at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
> at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
>
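
The "Timed out waiting for condition. Thread diagnostics:" message above is the
one a polling test helper such as Hadoop's GenericTestUtils.waitFor produces
when its condition never becomes true before the deadline. As a rough
illustration only (this is not the actual TestDeadNodeDetection code; the
condition, intervals, and names below are assumptions), the failing pattern
boils down to:

{code:java}
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/**
 * Minimal sketch of the polling-wait pattern whose failure produces the
 * "Timed out waiting for condition" TimeoutException seen above. This is
 * not the real TestDeadNodeDetection code; names and values are assumptions.
 */
public class WaitForConditionSketch {

  /** Poll the condition every checkEveryMillis, giving up after waitForMillis. */
  static void waitFor(Supplier<Boolean> condition, int checkEveryMillis,
      int waitForMillis) throws TimeoutException, InterruptedException {
    long deadline = System.currentTimeMillis() + waitForMillis;
    while (!condition.get()) {
      if (System.currentTimeMillis() > deadline) {
        // Hadoop's GenericTestUtils.waitFor additionally appends a full
        // thread dump ("Thread diagnostics:"), which is what fills the
        // report quoted above.
        throw new TimeoutException("Timed out waiting for condition.");
      }
      Thread.sleep(checkEveryMillis);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    // Stand-in for the dead-node count the test is waiting on; it never
    // reaches the expected value, so the wait times out after ~5 seconds.
    AtomicInteger deadNodeCount = new AtomicInteger(0);
    int expected = 3;
    try {
      waitFor(() -> deadNodeCount.get() == expected, 100, 5000);
    } catch (TimeoutException e) {
      System.err.println(e.getMessage());
    }
  }
}
{code}

In the real test the condition would presumably check the DFSClient's
dead-node state; the sketch only shows why the failure report opens with a
TimeoutException followed by per-thread stack traces.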
