[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2021-05-07 Thread lei w (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17340787#comment-17340787
 ] 

lei w commented on HDFS-14986:
--

hi [~Aiphag0] , Could you explain how you solve the deadlock problem in briefly.

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Affects Versions: 2.10.0
>Reporter: Ryan Wu
>Assignee: Aiphago
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 2.10.1
>
> Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch, 
> HDFS-14986.003.patch, HDFS-14986.004.patch, HDFS-14986.005.patch, 
> HDFS-14986.006.patch
>
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2021-05-07 Thread lei w (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17340751#comment-17340751
 ] 

lei w commented on HDFS-14986:
--

hi [~Aiphag0] ,Will hadoop 3.1.2 also appear deadlock as you said above ? 
https://issues.apache.org/jira/browse/HDFS-14986?focusedCommentId=16973314=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16973314

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Affects Versions: 2.10.0
>Reporter: Ryan Wu
>Assignee: Aiphago
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 2.10.1
>
> Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch, 
> HDFS-14986.003.patch, HDFS-14986.004.patch, HDFS-14986.005.patch, 
> HDFS-14986.006.patch
>
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2021-05-07 Thread lei w (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17340750#comment-17340750
 ] 

lei w commented on HDFS-14986:
--

hi [~Aiphag0] , Will hadoop 3.1.2 also appear deadlock as you said above 
?[comment-16973314](https://issues.apache.org/jira/browse/HDFS-14986?focusedCommentId=16973314=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16973314)

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Affects Versions: 2.10.0
>Reporter: Ryan Wu
>Assignee: Aiphago
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 2.10.1
>
> Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch, 
> HDFS-14986.003.patch, HDFS-14986.004.patch, HDFS-14986.005.patch, 
> HDFS-14986.006.patch
>
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2021-05-07 Thread lei w (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17340749#comment-17340749
 ] 

lei w commented on HDFS-14986:
--

hi [~Aiphag0] , hadoop 3.1.2 also appear deadlock as you said above 
[comment-16973314](https://issues.apache.org/jira/browse/HDFS-14986?focusedCommentId=16973314=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16973314)

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Affects Versions: 2.10.0
>Reporter: Ryan Wu
>Assignee: Aiphago
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 2.10.1
>
> Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch, 
> HDFS-14986.003.patch, HDFS-14986.004.patch, HDFS-14986.005.patch, 
> HDFS-14986.006.patch
>
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2021-05-07 Thread lei w (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17340727#comment-17340727
 ] 

lei w commented on HDFS-14986:
--

hi Aiphag0 , deadlock will appear in hadoop3.x ?

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Affects Versions: 2.10.0
>Reporter: Ryan Wu
>Assignee: Aiphago
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 2.10.1
>
> Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch, 
> HDFS-14986.003.patch, HDFS-14986.004.patch, HDFS-14986.005.patch, 
> HDFS-14986.006.patch
>
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-12-09 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991911#comment-16991911
 ] 

Jonathan Hung commented on HDFS-14986:
--

Removing 2.11.0 fix version after branch-2 -> branch-2.10 rename

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Affects Versions: 2.10.0
>Reporter: Ryan Wu
>Assignee: Aiphago
>Priority: Major
> Fix For: 3.3.0, 2.10.1
>
> Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch, 
> HDFS-14986.003.patch, HDFS-14986.004.patch, HDFS-14986.005.patch, 
> HDFS-14986.006.patch
>
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-28 Thread fanghanyun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16984766#comment-16984766
 ] 

fanghanyun commented on HDFS-14986:
---

hadoop version 2.6.0-cdh5.13.1

!image-2019-11-29-14-59-11-179.png!

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Affects Versions: 2.10.0
>Reporter: Ryan Wu
>Assignee: Aiphago
>Priority: Major
> Fix For: 3.3.0, 2.10.1, 2.11.0
>
> Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch, 
> HDFS-14986.003.patch, HDFS-14986.004.patch, HDFS-14986.005.patch, 
> HDFS-14986.006.patch
>
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-27 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16984170#comment-16984170
 ] 

Aiphago commented on HDFS-14986:


Thanks a lot for the review [~linyiqun].

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Affects Versions: 2.10.0
>Reporter: Ryan Wu
>Assignee: Aiphago
>Priority: Major
> Fix For: 3.3.0, 2.10.1, 2.11.0
>
> Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch, 
> HDFS-14986.003.patch, HDFS-14986.004.patch, HDFS-14986.005.patch, 
> HDFS-14986.006.patch
>
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-27 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16984102#comment-16984102
 ] 

Hudson commented on HDFS-14986:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17705 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17705/])
HDFS-14986. ReplicaCachingGetSpaceUsed throws (yqlin: rev 
2b452b4e6063072b2bec491edd3f412eb7ac21f3)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsDatasetSpi.java
* (edit) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CachingGetSpaceUsed.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestReplicaCachingGetSpaceUsed.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/ReplicaCachingGetSpaceUsed.java


> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Affects Versions: 2.10.0
>Reporter: Ryan Wu
>Assignee: Aiphago
>Priority: Major
> Fix For: 3.3.0, 2.10.1, 2.11.0
>
> Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch, 
> HDFS-14986.003.patch, HDFS-14986.004.patch, HDFS-14986.005.patch, 
> HDFS-14986.006.patch
>
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-27 Thread Yiqun Lin (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16984096#comment-16984096
 ] 

Yiqun Lin commented on HDFS-14986:
--

LGTM, +1. Committing this.

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Aiphago
>Priority: Major
> Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch, 
> HDFS-14986.003.patch, HDFS-14986.004.patch, HDFS-14986.005.patch, 
> HDFS-14986.006.patch
>
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-27 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16983808#comment-16983808
 ] 

Hadoop QA commented on HDFS-14986:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  
4s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
 9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
18m 37s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
36s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
22s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m  
6s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
2m 33s{color} | {color:orange} root: The patch generated 1 new + 117 unchanged 
- 0 fixed = 118 total (was 117) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 45s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
40s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  8m 
49s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 84m 30s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
45s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}200m 20s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.ha.TestHASafeMode |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14986 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12986950/HDFS-14986.006.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 7e4938caed7c 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 7f2ea2a |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 

[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-27 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16983513#comment-16983513
 ] 

Aiphago commented on HDFS-14986:


Good advice,I change the retrytimes to 10 and close the stream in while 
loop.[^HDFS-14986.006.patch]

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Aiphago
>Priority: Major
> Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch, 
> HDFS-14986.003.patch, HDFS-14986.004.patch, HDFS-14986.005.patch, 
> HDFS-14986.006.patch
>
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-27 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16983466#comment-16983466
 ] 

Hadoop QA commented on HDFS-14986:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 32m  
7s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
17s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 17m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
19m 16s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m  
4s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
25s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 20m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 20m  
9s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
3m 17s{color} | {color:orange} root: The patch generated 1 new + 117 unchanged 
- 0 fixed = 118 total (was 117) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 30s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m  
7s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  9m 48s{color} 
| {color:red} hadoop-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}110m  0s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  1m 
 5s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}275m 52s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.security.TestFixKerberosTicketOrder |
|   | hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14986 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12986884/HDFS-14986.005.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 1b4d8f61cf6c 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / c8bef4d |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 

[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-27 Thread Yiqun Lin (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16983455#comment-16983455
 ] 

Yiqun Lin commented on HDFS-14986:
--

[~Aiphag0], the change looks good to me. I rerun the unit test in my local, I 
find it's hard to reproduce CME exception based on 3 retry attempts. Can you 
increase the times from 3 to 10? I can see the CME error based on 10 times 
without lock change.
{code}
int retryTimes = 10;
{code}

In addition, there is a resource leak that I missed before. Can you move close 
operation into while loop?
{code:java}
public void run() {
  FSDataOutputStream os = null;
  while (shouldRun) {
try {
  int id = RandomUtils.nextInt();
  os = fs.create(new Path("/testFsDatasetImplDeepCopyReplica/" + id));
  byte[] bytes = new byte[2048];
  InputStream is = new ByteArrayInputStream(bytes);
  IOUtils.copyBytes(is, os, bytes.length);
  os.hsync();
  os.close();   <=== move here
} catch (IOException e) {}
  }

  try {
fs.delete(new Path("/testFsDatasetImplDeepCopyReplica"), true);
  } catch (IOException e) {}
}
{code}
 
Others looks good to me, :).

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Aiphago
>Priority: Major
> Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch, 
> HDFS-14986.003.patch, HDFS-14986.004.patch, HDFS-14986.005.patch
>
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-26 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16983213#comment-16983213
 ] 

Aiphago commented on HDFS-14986:


Hi [~linyiqun],Thank you for your valuable advice.I improve the patch as your 
comment.Addition I rename shouldInitRefresh to shouldFirstRefresh order to easy 
distinguish.[^HDFS-14986.005.patch]

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Aiphago
>Priority: Major
> Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch, 
> HDFS-14986.003.patch, HDFS-14986.004.patch, HDFS-14986.005.patch
>
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-26 Thread Yiqun Lin (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16983153#comment-16983153
 ] 

Yiqun Lin commented on HDFS-14986:
--

Comment makes sense to me. I have minor suggestion for {{shouldInitRefresh}}. 

The meaning of shouldInitRefresh should be that 'should do init refresh 
operation', so the default value should be true rather than false. So I prefer 
to change this and add comment:

{code}
   void init() {
 if (used.get() < 0) {
   used.set(0);
// Skip initial refresh operation, so we need to do first refresh 
// operation immediately in refresh thread.
+  if (!shouldInitRefresh) {
+initRefeshThread(true);
+return;
+  }
   refresh();
 }
+initRefeshThread(false);
+  }
{code}

Can we move this method into CachingGetSpaceUsed? The variable is under 
CachingGetSpaceUsed, it will be better also put the set operation in this class 
and let sub-classes to use this.
{code}
+  /**
+   * Reset that if we need to do the initial refresh.
+   * @param shouldInitRefresh The flag value to set.
+   */
+  protected void setShouldInitRefresh(boolean shouldInitRefresh) {
+this.shouldInitRefresh = shouldInitRefresh;
+  }
{code}

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Aiphago
>Priority: Major
> Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch, 
> HDFS-14986.003.patch, HDFS-14986.004.patch
>
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-22 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16980042#comment-16980042
 ] 

Aiphago commented on HDFS-14986:


Hi 
!https://issues.apache.org/jira/secure/useravatar?size=xsmall=linyiqun=25258!
 [~linyiqun],I get your idea.But this may have a small problem.Because when use 
FSCachingGetSpaceUsed invoke init() and when 

used < 0,so pass the frist refresh().This will make used = 0 until default 10 
min until next refresh().So I add runImmediately to slove this problem.And to 
prevent other subClass not refresh() twice at first du.So this is my idea in 
patch 004.

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Aiphago
>Priority: Major
> Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch, 
> HDFS-14986.003.patch, HDFS-14986.004.patch
>
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-22 Thread Yiqun Lin (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16980017#comment-16980017
 ] 

Yiqun Lin commented on HDFS-14986:
--

Hi [~Aiphag0], the latest patch can still be simplified. Actually we don't need 
to add {{runImmediately}} for RefreshThread. Only one thing we need to do is 
that we skip init refresh in FSCachingGetSpaceUsed.  Hope you have got my idea.
 So the change can simplified to this
{noformat}
--- 
a/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CachingGetSpaceUsed.java
+++ 
b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CachingGetSpaceUsed.java
@@ -47,6 +47,7 @@
   private final long jitter;
   private final String dirPath;
   private Thread refreshUsed;
+  private boolean shouldInitRefresh;

   /**
* This is the constructor used by the builder.
@@ -79,12 +80,15 @@ public CachingGetSpaceUsed(CachingGetSpaceUsed.Builder 
builder)
 this.refreshInterval = interval;
 this.jitter = jitter;
 this.used.set(initialUsed);
+this.shouldInitRefresh = true;
   }

   void init() {
 if (used.get() < 0) {
   used.set(0);
-  refresh();
+  if (shouldInitRefresh) {
+   refresh();
+  }
 }

 if (refreshInterval > 0) {
@@ -145,6 +149,14 @@ protected void setUsed(long usedValue) {
 this.used.set(usedValue);
   }

+  /**
+   * Reset that if we need to do the initial refresh.
+   * @param shouldInitRefresh The flag value to set.
+   */
+  protected void setShouldInitRefresh(boolean shouldInitRefresh) {
+this.shouldInitRefresh = shouldInitRefresh;
+  }
+
   @Override
   public void close() throws IOException {
 running.set(false);
diff --git 
a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSCachingGetSpaceUsed.java
 
b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSCachingGetSpaceUsed.java
index a5f350860f3..0a922b2d22f 100644
--- 
a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSCachingGetSpaceUsed.java
+++ 
b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSCachingGetSpaceUsed.java
@@ -39,6 +39,7 @@

   public FSCachingGetSpaceUsed(Builder builder) throws IOException {
 super(builder);
+this.setShouldInitRefresh(false);
   }
{noformat}
 

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Aiphago
>Priority: Major
> Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch, 
> HDFS-14986.003.patch, HDFS-14986.004.patch
>
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-22 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979994#comment-16979994
 ] 

Hadoop QA commented on HDFS-14986:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
11s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
25s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 22m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
20m 36s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
42s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
23s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m 
38s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
2m 37s{color} | {color:orange} root: The patch generated 2 new + 118 unchanged 
- 0 fixed = 120 total (was 118) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 53s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
36s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  8m 31s{color} 
| {color:red} hadoop-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}104m 14s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
44s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}233m 40s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.security.TestFixKerberosTicketOrder |
|   | hadoop.security.TestRaceWhenRelogin |
|   | hadoop.hdfs.tools.TestDFSZKFailoverController |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14986 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12986493/HDFS-14986.004.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux ae856fa84a00 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / b25e94c |
| maven | version: Apache 

[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-21 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979853#comment-16979853
 ] 

Aiphago commented on HDFS-14986:


Hi [~linyiqun],Thank you for your valuable advice.And now 
setShouldInitRefresh() is only invoke in 
ReplicaCachingGetSpaceUsed.[^HDFS-14986.004.patch]

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Aiphago
>Priority: Major
> Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch, 
> HDFS-14986.003.patch, HDFS-14986.004.patch
>
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-20 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979036#comment-16979036
 ] 

Hadoop QA commented on HDFS-14986:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
44s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
59s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m  
8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
18m 36s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
36s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
20s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m 
27s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
2m 34s{color} | {color:orange} root: The patch generated 3 new + 117 unchanged 
- 0 fixed = 120 total (was 117) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 49s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
38s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  8m 30s{color} 
| {color:red} hadoop-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}106m 55s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
56s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}224m  0s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.security.TestFixKerberosTicketOrder |
|   | hadoop.security.TestRaceWhenRelogin |
|   | hadoop.hdfs.TestMultipleNNPortQOP |
|   | hadoop.hdfs.server.namenode.TestPersistentStoragePolicySatisfier |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14986 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12986378/HDFS-14986.003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 7a09c1a83624 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| 

[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-20 Thread Yiqun Lin (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979006#comment-16979006
 ] 

Yiqun Lin commented on HDFS-14986:
--

Hi [~Aiphag0],
{code:java}
+  // Prevent ReplicaCachingGetSpaceUsed dead lock at 
FsDatasetImpl#deepCopyReplica
+  if 
(this.getClass().getSimpleName().equals("ReplicaCachingGetSpaceUsed")) {
+initRefeshThread(true);
+return;
+  }
{code}
Judgement for the subclass in parent class is a little tricky, we can add a new 
flag and reset that in subclass. I find a way to remove runImmediately for 
RefreshThreaad and add {{shouldInitRefresh = true}} in CachingGetSpaceUsed. By 
default, we will do init refresh.
{code:java}
if (used.get() < 0) {
  used.set(0);
  if(shouldInitRefresh) {
  refresh();
  }
}
{code}
And we need to define a protected method to override this only in 
FSCachingGetSpaceUsed.
{code:java}
  /**
   * Reset that if we need to do the initial refresh.
   * @param shouldInitRefresh The flag value to set.
   */
  protected void setShouldInitRefresh(boolean shouldInitRefresh) {
  this.shouldInitRefresh = shouldInitRefresh;
  }

  public FSCachingGetSpaceUsed(Builder builder) throws IOException {
super(builder);
this.setShouldInitRefresh(false);
  }
{code}
 

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Aiphago
>Priority: Major
> Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch, 
> HDFS-14986.003.patch
>
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-20 Thread Ryan Wu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978976#comment-16978976
 ] 

Ryan Wu commented on HDFS-14986:


Hi Aiphago, thanks for providing the patch that fix me too. I will backport it. 

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Aiphago
>Priority: Major
> Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch, 
> HDFS-14986.003.patch
>
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-20 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978951#comment-16978951
 ] 

Aiphago commented on HDFS-14986:


hi [~jianliang.wu], I just assign this Jira to myself, please feel free to 
assign back if you would also like to work on this.

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Aiphago
>Priority: Major
> Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch, 
> HDFS-14986.003.patch
>
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-20 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978950#comment-16978950
 ] 

Aiphago commented on HDFS-14986:


Hi [~linyiqun],Thank you for your valuable advice.In previous patch I modify 
CachingGetSpaceUsed#init(),but this will influences the subclass of 
CachingGetSpaceUsed like DU.So I add a filter,and now the related unit tests 
can pass.[^HDFS-14986.003.patch]

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Ryan Wu
>Priority: Major
> Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch, 
> HDFS-14986.003.patch
>
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-20 Thread Yiqun Lin (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978509#comment-16978509
 ] 

Yiqun Lin commented on HDFS-14986:
--

Hi [~Aiphag0], the patch almost looks good to me. Only some minor comments for 
the unit test, I think it's enough that we just verify deepCopyReplica call 
won't throw CME.

So while loop in unit test can simplified to this:
{code:java}
int retryTimes = 3;
while (retryTimes > 0) {
try {
Set replicas = spyDataset.deepCopyReplica(bpid);
if (replicas.size() > 0) {
retryTimes--;
  // DeepCopyReplica should not throw 
ConcurrentModificationException
  modifyThread.setShouldRun(false);
}
  } catch (IOException e) {
modifyThread.setShouldRun(false);
Assert.fail("Encounter IOException when deep copy replica.");
  }
}
modifyThread.setShouldRun(false);
{code}
 
 In additional, we need to close the fs output stream and delete created test 
files in ModifyThread. Can you update as following?
{code:java}
public void run() {
  FSDataOutputStream os = null;
  while (shouldRun) {
try {
  int id = RandomUtils.nextInt();
  os = fs.create(new Path("/testFsDatasetImplDeepCopyReplica/" + id));
  byte[] bytes = new byte[2048];
  InputStream is = new ByteArrayInputStream(bytes);
  IOUtils.copyBytes(is, os, bytes.length);
  os.hsync();
} catch (IOException e) {
  // ignored exception
}
  }

  try {
if (os != null) {
   os.close();
}
fs.delete(new Path("/testFsDatasetImplDeepCopyReplica"), true);
} catch (IOException e) {
}
}{code}

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Ryan Wu
>Priority: Major
> Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch
>
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-19 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16977804#comment-16977804
 ] 

Hadoop QA commented on HDFS-14986:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
54s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
26s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 20m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
 3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
21m 50s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
11s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
28s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 20m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 20m 
31s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
3m  7s{color} | {color:orange} root: The patch generated 1 new + 117 unchanged 
- 0 fixed = 118 total (was 117) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 28s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m  
9s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  8m 44s{color} 
| {color:red} hadoop-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}106m 30s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
45s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}244m 52s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.fs.TestDFCachingGetSpaceUsed |
|   | hadoop.security.TestFixKerberosTicketOrder |
|   | hadoop.fs.TestDU |
|   | hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer |
|   | hadoop.hdfs.server.balancer.TestBalancer |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14986 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12986219/HDFS-14986.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 70d35e564673 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 

[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-19 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16977328#comment-16977328
 ] 

Aiphago commented on HDFS-14986:


Hi [~linyiqun],I updata the patch.Can you review this 
again?[^HDFS-14986.002.patch]

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Ryan Wu
>Priority: Major
> Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch
>
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-17 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976301#comment-16976301
 ] 

Aiphago commented on HDFS-14986:


Thanks for your advice, I'll fix later.

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Ryan Wu
>Priority: Major
> Attachments: HDFS-14986.001.patch
>
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-16 Thread Yiqun Lin (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975680#comment-16975680
 ] 

Yiqun Lin commented on HDFS-14986:
--

Thanks for providing the patch, [~Aiphag0]. I like the fix way. Only minor 
comments:
{quote}RefreshThread(CachingGetSpaceUsed spaceUsed,boolean runImmediately)
{quote}
Can you add one space between {{spaceUsed,boolean}}?

Can you add the comment for the method deepCopyReplica in FsDatasetSpi?
{code:java}
  /**
   * Deep copy the replica info belonging to given block pool.
   * @param bpid Specified block pool id.
   * @return A set of replica info.
   * @throws IOException
   */
  Set deepCopyReplica(String bpid) throws IOException;
{code}
And then remove following outdated comment in FsDatasetImpl?
{noformat}
  /**
   * The deepCopyReplica call doesn't use the datasetock since it will lead the
   * potential deadlock with the {@link FsVolumeList#addBlockPool} call.
   */
{noformat}
Additionally, can you add a unit test for this? It will be a good habit when we 
fix a bug then give a corresponding unit test case.

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Ryan Wu
>Priority: Major
> Attachments: HDFS-14986.001.patch
>
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-14 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974888#comment-16974888
 ] 

Aiphago commented on HDFS-14986:


Here is the patch for the trunk can [~leosun08] [~hexiaoqiao] [~linyiqun] 
review this one? Thanks very much.[^HDFS-14986.001.patch]

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Ryan Wu
>Priority: Major
> Attachments: HDFS-14986.001.patch
>
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-13 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973916#comment-16973916
 ] 

Aiphago commented on HDFS-14986:


I means I can fix the dead lock problem with add dataset lock,my first comment 
have say the solution.

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Ryan Wu
>Priority: Major
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-13 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973910#comment-16973910
 ] 

Lisheng Sun commented on HDFS-14986:


[~Aiphag0]

if add dataset lock at FsDatasetImpl#deepCopyReplica in trunk,  it will happen 
dead lock.

I don't use FsDatasetImpl#datasetLock , Because FsDatasetImpl#addBlockPool with 
datasetLock call FsDatasetImpl#deepCopyReplica in another Thread when dn start.

https://issues.apache.org/jira/browse/HDFS-14313?focusedCommentId=16887859=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16887859

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Ryan Wu
>Priority: Major
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-13 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973906#comment-16973906
 ] 

Aiphago commented on HDFS-14986:


We use the branch before 2.8 with the synchronized,and we fix the deadlock 
problem.And I think it's better to and add dataset lock at 
FsDatasetImpl#deepCopyReplica in trunk,and the method to slove deadlock problem 
is the same.

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Ryan Wu
>Priority: Major
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-13 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973449#comment-16973449
 ] 

Lisheng Sun commented on HDFS-14986:


[~Aiphag0]  which branch could you use?

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Ryan Wu
>Priority: Major
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-13 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973446#comment-16973446
 ] 

Lisheng Sun commented on HDFS-14986:


FsDatasetImpl#deepCopyReplica should be synchronized before branch-2.8.  
HDFS-10682  Replace FsDatasetImpl object lock with a separate lock objectin 
branch-2.8.
{code:java}
public Set deepCopyReplica(String bpid){
...
}
{code}
 

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Ryan Wu
>Priority: Major
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-13 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973339#comment-16973339
 ] 

Xiaoqiao He commented on HDFS-14986:


[~linyiqun], we did some modify when backport HDFS-14313 since there are very 
diffs between branch-2.7 and trunk.
As [~Aiphag0] said above, if no synchronous control, it will throw CME, but 
meet deadlock if add sync when restart DataNode. The root cause is 
`BlockPoolSlice#loadDfsUsed() may return -1. And the 
ReplicaCachingGetSpaceUsed#init() will blocks` when start as mentioned above.

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Ryan Wu
>Priority: Major
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-13 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973321#comment-16973321
 ] 

Aiphago commented on HDFS-14986:


"DataNode: [... #283 daemon prio=5 os_prio=0 tid=0x7fd9826a7800 
nid=0x7463 in Object.wait() [0x7fd949616000]
 java.lang.Thread.State: WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 - waiting on <0x0006b375a1d0> (a 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList$2)
 at java.lang.Thread.join(Thread.java:1249)
 - locked <0x0006b375a1d0> (a 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList$2)
 at java.lang.Thread.join(Thread.java:1323)
 at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.addBlockPool(FsVolumeList.java:423)
 at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.addBlockPool(FsDatasetImpl.java:2509)
 - locked <0x0006b33a3b10> (a 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
 at 
org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1388)
 at 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:311)
 at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:232)
 at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:720)

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Ryan Wu
>Priority: Major
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-13 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973317#comment-16973317
 ] 

Aiphago commented on HDFS-14986:


I would like to submit a patch to try to fix this bug later.

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Ryan Wu
>Priority: Major
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-13 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973314#comment-16973314
 ] 

Aiphago commented on HDFS-14986:


In -HDFS-14313- we found same concurrent problem before base our hadoop 
version.So we solve it by add dataset lock at FsDatasetImpl#deepCopyReplica().  
But this may cause deadlock problem when restart datanode.When BlockPoolSlice 
init first time,the BlockPoolSlice#loadDfsUsed() may return -1.And the 
ReplicaCachingGetSpaceUsed#init() will blocking because of 
FsDatasetImpl#deepCopyReplica() can't get the dataset lock,and the thread can't 
release dataset lock the same time.

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Ryan Wu
>Priority: Major
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-13 Thread Yiqun Lin (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973312#comment-16973312
 ] 

Yiqun Lin commented on HDFS-14986:
--

+[~leosun08] to have a look for this as well.
{quote}we meet very similar case (ConcurrentModificationException or 
DeadLock)...
{quote}
[~hexiaoqiao], I wonder how you find the case of deadlock, I remember 
[~leosun08] and me talked about the deadlock problem in HDFS-14313 and should 
not happen. It will be great if you can provide the stack info for the deadlock.

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Ryan Wu
>Priority: Major
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-13 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973305#comment-16973305
 ] 

Xiaoqiao He commented on HDFS-14986:


we meet very similar case (ConcurrentModificationException or DeadLock) when 
backport HDFS-14313 to branch-2.7, and I believe there is still some space to 
improve current implementation.
[~Aiphag0] may have some more information. Would you like to offer our patch 
here?

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Ryan Wu
>Priority: Major
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh 
> errorjava.util.ConcurrentModificationException: Tree has been modified 
> outside of iteratorat 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)at 
> java.util.HashSet.(HashSet.java:120)at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-13 Thread Ryan Wu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973288#comment-16973288
 ] 

Ryan Wu commented on HDFS-14986:


The DeepCopyReplica method gets FoldedTreeSet from ReplicaMap,at 
the same time the FoldedTreeSet could be changed due to addblock 
or removeblock. This ConcurrentModificationException appears .

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Ryan Wu
>Priority: Major
>
> Running DU across lots of disks is very expensive . We applied the patch 
> HDFS-14313 to get  used space from ReplicaInfo in memory.However, new du 
> threads throw the exception
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992-10.208.50.21-1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh 
> errorjava.util.ConcurrentModificationException: Tree has been modified 
> outside of iteratorat 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)at 
> java.util.HashSet.(HashSet.java:120)at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org