[jira] [Updated] (HDFS-11573) Support rename between different NameNodes in federated HDFS
[ https://issues.apache.org/jira/browse/HDFS-11573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-11573: Flags: Patch > Support rename between different NameNodes in federated HDFS > > > Key: HDFS-11573 > URL: https://issues.apache.org/jira/browse/HDFS-11573 > Project: Hadoop HDFS > Issue Type: Improvement > Components: federation >Reporter: zhouyingchao >Priority: Major > Attachments: HDFS-11573-2.6.5-001.patch, HDFS_federation_rename.pdf > > >Federated file system can improve overall scalability by dividing a > single namespace into multiple sub-namespaces. Since the divided > sub-namespace are held by different namenodes, rename operation between those > sub-namespace is forbidden. Due to this restriction, many applications have > to be rewritten to work around the issue after migrated to federated file > system. Supporting rename between different namenodes would make it much > easier to migrate to federated file systems. >We have finished a preliminary implementation of this feature in a 2.6 > branch. I'll upload a write-up regarding the design in a few days. And then > I'll re-org the code against the trunk and upload the patch. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
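For context on the restriction described above, here is a minimal sketch of how an application hits it today. The paths, the viewfs cluster name, and the mount layout are hypothetical; the sketch only assumes a client-side mount table (ViewFs-style) mapping the two paths to different NameNodes, in which case the rename is rejected and applications typically fall back to copy-and-delete.
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.IOException;

public class CrossNamespaceRenameExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    // Two paths under different mount points, i.e. different sub-namespaces
    // served by different NameNodes (paths and cluster name are hypothetical).
    Path src = new Path("viewfs://cluster/user/alice/staging/part-00000");
    Path dst = new Path("viewfs://cluster/warehouse/alice/part-00000");

    FileSystem fs = FileSystem.get(src.toUri(), conf);
    try {
      // Without cross-NameNode rename support, this call is rejected, so
      // applications fall back to copy + delete, which is neither atomic
      // nor cheap.
      boolean renamed = fs.rename(src, dst);
      System.out.println("rename succeeded: " + renamed);
    } catch (IOException e) {
      System.err.println("cross-namespace rename rejected: " + e.getMessage());
    }
  }
}
{code}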
[jira] [Commented] (HDFS-10240) Race between close/recoverLease leads to missing block
[ https://issues.apache.org/jira/browse/HDFS-10240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552368#comment-16552368 ] zhouyingchao commented on HDFS-10240: - Hi Wei-Chiu, [~LiJinglun] who figured out the issue together with me has free time these days, could you please help to re-assign the issue to him? Looks like I cannot re-assign the issue ... > Race between close/recoverLease leads to missing block > -- > > Key: HDFS-10240 > URL: https://issues.apache.org/jira/browse/HDFS-10240 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: zhouyingchao >Assignee: zhouyingchao >Priority: Major > Attachments: HDFS-10240 scenarios.jpg, HDFS-10240-001.patch, > HDFS-10240.test.patch > > > We got a missing block in our cluster, and logs related to the missing block > are as follows: > 2016-03-28,10:00:06,188 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > allocateBlock: XX. BP-219149063-10.108.84.25-1446859315800 > blk_1226490256_153006345{blockUCState=UNDER_CONSTRUCTION, > primaryNodeIndex=-1, > replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]]} > 2016-03-28,10:00:06,205 INFO BlockStateChange: BLOCK* > blk_1226490256_153006345{blockUCState=UNDER_RECOVERY, primaryNodeIndex=2, > replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]]} > recovery started, > primary=ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW] > 2016-03-28,10:00:06,205 WARN org.apache.hadoop.hdfs.StateChange: DIR* > NameSystem.internalReleaseLease: File XX has not been closed. Lease > recovery is in progress. 
RecoveryId = 153006357 for block > blk_1226490256_153006345{blockUCState=UNDER_RECOVERY, primaryNodeIndex=2, > replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]]} > 2016-03-28,10:00:06,248 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: BLOCK* > checkFileProgress: blk_1226490256_153006345{blockUCState=COMMITTED, > primaryNodeIndex=2, > replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-85819f0d-bdbb-4a9b-b90c-eba078547c23:NORMAL|RBW]]} > has not reached minimal replication 1 > 2016-03-28,10:00:06,358 INFO BlockStateChange: BLOCK* addStoredBlock: > blockMap updated: 10.114.5.53:11402 is added to > blk_1226490256_153006345{blockUCState=COMMITTED, primaryNodeIndex=2, > replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-85819f0d-bdbb-4a9b-b90c-eba078547c23:NORMAL|RBW]]} > size 139 > 2016-03-28,10:00:06,441 INFO BlockStateChange: BLOCK* addStoredBlock: > blockMap updated: 10.114.5.44:11402 is added to blk_1226490256_153006345 size > 139 > 2016-03-28,10:00:06,660 INFO BlockStateChange: BLOCK* addStoredBlock: > blockMap updated: 10.114.6.14:11402 is added to blk_1226490256_153006345 size > 139 > 2016-03-28,10:00:08,808 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > commitBlockSynchronization(lastblock=BP-219149063-10.108.84.25-1446859315800:blk_1226490256_153006345, > newgenerationstamp=153006357, newlength=139, newtargets=[10.114.6.14:11402, > 10.114.5.53:11402, 10.114.5.44:11402], closeFile=true, deleteBlock=false) > 2016-03-28,10:00:08,836 INFO BlockStateChange: BLOCK > NameSystem.addToCorruptReplicasMap: blk_1226490256 added as corrupt on > 10.114.6.14:11402 by /10.114.6.14 because block is COMPLETE and reported > genstamp 153006357 does not match genstamp in block map 153006345 > 2016-03-28,10:00:08,836 INFO BlockStateChange: BLOCK > NameSystem.addToCorruptReplicasMap: blk_1226490256 added as corrupt on > 10.114.5.53:11402 by /10.114.5.53 because block is COMPLETE and reported > genstamp 153006357 does not match genstamp in block map 153006345 > 2016-03-28,10:00:08,837 INFO BlockStateChange: BLOCK > NameSystem.addToCorruptReplicasMap: blk_1226490256 added as corrupt on > 10.114.5.44:11402 by /10.114.5.44 because
[jira] [Commented] (HDFS-10240) Race between close/recoverLease leads to missing block
[ https://issues.apache.org/jira/browse/HDFS-10240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552301#comment-16552301 ] zhouyingchao commented on HDFS-10240: - Thank you, Wei-Chiu. I'd like to work on this issue. I'll add more tests in a few days. > Race between close/recoverLease leads to missing block > -- > > Key: HDFS-10240 > URL: https://issues.apache.org/jira/browse/HDFS-10240 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: zhouyingchao >Assignee: zhouyingchao >Priority: Major > Attachments: HDFS-10240 scenarios.jpg, HDFS-10240-001.patch, > HDFS-10240.test.patch > > > We got a missing block in our cluster, and logs related to the missing block > are as follows: > 2016-03-28,10:00:06,188 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > allocateBlock: XX. BP-219149063-10.108.84.25-1446859315800 > blk_1226490256_153006345{blockUCState=UNDER_CONSTRUCTION, > primaryNodeIndex=-1, > replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]]} > 2016-03-28,10:00:06,205 INFO BlockStateChange: BLOCK* > blk_1226490256_153006345{blockUCState=UNDER_RECOVERY, primaryNodeIndex=2, > replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]]} > recovery started, > primary=ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW] > 2016-03-28,10:00:06,205 WARN org.apache.hadoop.hdfs.StateChange: DIR* > NameSystem.internalReleaseLease: File XX has not been closed. Lease > recovery is in progress. 
RecoveryId = 153006357 for block > blk_1226490256_153006345{blockUCState=UNDER_RECOVERY, primaryNodeIndex=2, > replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]]} > 2016-03-28,10:00:06,248 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: BLOCK* > checkFileProgress: blk_1226490256_153006345{blockUCState=COMMITTED, > primaryNodeIndex=2, > replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-85819f0d-bdbb-4a9b-b90c-eba078547c23:NORMAL|RBW]]} > has not reached minimal replication 1 > 2016-03-28,10:00:06,358 INFO BlockStateChange: BLOCK* addStoredBlock: > blockMap updated: 10.114.5.53:11402 is added to > blk_1226490256_153006345{blockUCState=COMMITTED, primaryNodeIndex=2, > replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-85819f0d-bdbb-4a9b-b90c-eba078547c23:NORMAL|RBW]]} > size 139 > 2016-03-28,10:00:06,441 INFO BlockStateChange: BLOCK* addStoredBlock: > blockMap updated: 10.114.5.44:11402 is added to blk_1226490256_153006345 size > 139 > 2016-03-28,10:00:06,660 INFO BlockStateChange: BLOCK* addStoredBlock: > blockMap updated: 10.114.6.14:11402 is added to blk_1226490256_153006345 size > 139 > 2016-03-28,10:00:08,808 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > commitBlockSynchronization(lastblock=BP-219149063-10.108.84.25-1446859315800:blk_1226490256_153006345, > newgenerationstamp=153006357, newlength=139, newtargets=[10.114.6.14:11402, > 10.114.5.53:11402, 10.114.5.44:11402], closeFile=true, deleteBlock=false) > 2016-03-28,10:00:08,836 INFO BlockStateChange: BLOCK > NameSystem.addToCorruptReplicasMap: blk_1226490256 added as corrupt on > 10.114.6.14:11402 by /10.114.6.14 because block is COMPLETE and reported > genstamp 153006357 does not match genstamp in block map 153006345 > 2016-03-28,10:00:08,836 INFO BlockStateChange: BLOCK > NameSystem.addToCorruptReplicasMap: blk_1226490256 added as corrupt on > 10.114.5.53:11402 by /10.114.5.53 because block is COMPLETE and reported > genstamp 153006357 does not match genstamp in block map 153006345 > 2016-03-28,10:00:08,837 INFO BlockStateChange: BLOCK > NameSystem.addToCorruptReplicasMap: blk_1226490256 added as corrupt on > 10.114.5.44:11402 by /10.114.5.44 because block is COMPLETE and reported > genstamp 153006357 does not match genstamp in block map 153006345 > From
[jira] [Commented] (HDFS-13700) The process of loading image can be done in a pipeline model
[ https://issues.apache.org/jira/browse/HDFS-13700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16523522#comment-16523522 ] zhouyingchao commented on HDFS-13700: - Tested the patch against an fsimage of a 70PB 2.4 cluster (200 million files and 300 million blocks; the fsimage is around 22GB). The image loading time was reduced from 1210 seconds to 739 seconds. > The process of loading image can be done in a pipeline model > > > Key: HDFS-13700 > URL: https://issues.apache.org/jira/browse/HDFS-13700 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: zhouyingchao >Priority: Major > Attachments: HDFS-13700-001.patch > > > The process of loading a file system image involves reading inodes section, > deserializing inodes, initializing inodes, adding inodes to the global map, > reading directories section, adding inodes to their parents' map, cache name > etc. These steps can be done in a pipeline model to reduce the total > duration. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
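To illustrate the pipeline idea described in this issue, below is a minimal, self-contained sketch (not the attached patch) of a two-stage producer/consumer pipeline: one thread reads and deserializes batches of inode records while another thread inserts them into the in-memory maps. The InodeRecord type and addToGlobalMap helper are placeholders for the real FSImage/FSDirectory structures.
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PipelinedLoader {
  // Placeholder for a deserialized inode; the real loader works on
  // INodeSection protobuf messages.
  static class InodeRecord { long id; String name; }

  private static final List<InodeRecord> POISON = new ArrayList<>();

  public static void load(Iterable<List<InodeRecord>> batches) throws InterruptedException {
    BlockingQueue<List<InodeRecord>> queue = new ArrayBlockingQueue<>(16);

    // Stage 2: add inodes to the global map / parent maps while stage 1
    // keeps reading and deserializing the next batch from the image file.
    Thread inserter = new Thread(() -> {
      try {
        while (true) {
          List<InodeRecord> batch = queue.take();
          if (batch == POISON) {
            return;
          }
          for (InodeRecord inode : batch) {
            addToGlobalMap(inode); // stand-in for FSDirectory bookkeeping
          }
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, "inode-inserter");
    inserter.start();

    // Stage 1: read and deserialize batches, then hand them to stage 2.
    for (List<InodeRecord> batch : batches) {
      queue.put(batch);
    }
    queue.put(POISON);
    inserter.join();
  }

  private static void addToGlobalMap(InodeRecord inode) {
    // In the NameNode this would update the inode map and parent directories.
  }
}
{code}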
[jira] [Updated] (HDFS-13700) The process of loading image can be done in a pipeline model
[ https://issues.apache.org/jira/browse/HDFS-13700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-13700: Attachment: HDFS-13700-001.patch Status: Patch Available (was: Open) > The process of loading image can be done in a pipeline model > > > Key: HDFS-13700 > URL: https://issues.apache.org/jira/browse/HDFS-13700 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: zhouyingchao >Priority: Major > Attachments: HDFS-13700-001.patch > > > The process of loading a file system image involves reading inodes section, > deserializing inodes, initializing inodes, adding inodes to the global map, > reading directories section, adding inodes to their parents' map, cache name > etc. These steps can be done in a pipeline model to reduce the total > duration. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13700) The process of loading image can be done in a pipeline model
[ https://issues.apache.org/jira/browse/HDFS-13700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16523520#comment-16523520 ] zhouyingchao commented on HDFS-13700: - [~brahmareddy], thank you for telling me about HDFS-7784. I guess it's kind of the same optimization as I have thought. Since I have finished a patch which implemented a pipeline model for this kind of stuff, I'd like to post the patch here for reference. > The process of loading image can be done in a pipeline model > > > Key: HDFS-13700 > URL: https://issues.apache.org/jira/browse/HDFS-13700 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: zhouyingchao >Priority: Major > > The process of loading a file system image involves reading inodes section, > deserializing inodes, initializing inodes, adding inodes to the global map, > reading directories section, adding inodes to their parents' map, cache name > etc. These steps can be done in a pipeline model to reduce the total > duration. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-13700) The process of loading image can be done in a pipeline model
zhouyingchao created HDFS-13700: --- Summary: The process of loading image can be done in a pipeline model Key: HDFS-13700 URL: https://issues.apache.org/jira/browse/HDFS-13700 Project: Hadoop HDFS Issue Type: Improvement Reporter: zhouyingchao The process of loading a file system image involves reading inodes section, deserializing inodes, initializing inodes, adding inodes to the global map, reading directories section, adding inodes to their parents' map, caching names, etc. These steps can be done in a pipeline model to reduce the total duration. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13694) Making md5 computing being in parallel with image loading
[ https://issues.apache.org/jira/browse/HDFS-13694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-13694: Attachment: HDFS-13694-001.patch Status: Patch Available (was: Open) Tested the patch against an fsimage of a 70PB 2.4 cluster (200 million files and 300 million blocks). The image loading time was reduced from 1210 seconds to 1105 seconds. > Making md5 computing being in parallel with image loading > - > > Key: HDFS-13694 > URL: https://issues.apache.org/jira/browse/HDFS-13694 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: zhouyingchao >Priority: Major > Attachments: HDFS-13694-001.patch > > > During namenode image loading, it firstly compute the md5 and then load the > image. Actually these two steps can be in parallel. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
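As a rough illustration of the idea (not the attached patch), the sketch below overlaps the MD5 computation with image loading by running the digest on a background thread and joining on it after the load finishes. The loadImage helper is a stand-in for the real FSImage loading code.
{code:java}
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.security.MessageDigest;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelMd5Example {
  // Compute the MD5 of the image file in a background thread so it overlaps
  // with image loading instead of running strictly before it.
  public static byte[] loadWithParallelMd5(File imageFile) throws Exception {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    Future<byte[]> md5Future = pool.submit(() -> md5Of(imageFile));
    try {
      loadImage(imageFile);    // stand-in for the actual image loading
      return md5Future.get();  // digest is usually ready by the time loading ends
    } finally {
      pool.shutdown();
    }
  }

  private static byte[] md5Of(File f) throws Exception {
    MessageDigest md = MessageDigest.getInstance("MD5");
    byte[] buf = new byte[1 << 20];
    try (InputStream in = new FileInputStream(f)) {
      int n;
      while ((n = in.read(buf)) > 0) {
        md.update(buf, 0, n);
      }
    }
    return md.digest();
  }

  private static void loadImage(File f) {
    // Deserialize sections, build inode maps, etc.
  }
}
{code}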
[jira] [Updated] (HDFS-13694) Making md5 computing being in parallel with image loading
[ https://issues.apache.org/jira/browse/HDFS-13694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-13694: Attachment: (was: HDFS-13694-001.patch) > Making md5 computing being in parallel with image loading > - > > Key: HDFS-13694 > URL: https://issues.apache.org/jira/browse/HDFS-13694 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: zhouyingchao >Priority: Major > > During namenode image loading, it firstly compute the md5 and then load the > image. Actually these two steps can be in parallel. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13694) Making md5 computing being in parallel with image loading
[ https://issues.apache.org/jira/browse/HDFS-13694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-13694: Attachment: HDFS-13694-001.patch > Making md5 computing being in parallel with image loading > - > > Key: HDFS-13694 > URL: https://issues.apache.org/jira/browse/HDFS-13694 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: zhouyingchao >Priority: Major > Attachments: HDFS-13694-001.patch > > > During namenode image loading, it firstly compute the md5 and then load the > image. Actually these two steps can be in parallel. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-13694) Making md5 computing being in parallel with image loading
zhouyingchao created HDFS-13694: --- Summary: Making md5 computing being in parallel with image loading Key: HDFS-13694 URL: https://issues.apache.org/jira/browse/HDFS-13694 Project: Hadoop HDFS Issue Type: Improvement Reporter: zhouyingchao During namenode image loading, the namenode first computes the md5 of the image file and then loads the image. These two steps can actually run in parallel. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13693) Remove unnecessary search in INodeDirectory.addChild during image loading
[ https://issues.apache.org/jira/browse/HDFS-13693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16519897#comment-16519897 ] zhouyingchao commented on HDFS-13693: - Tested the patch against an fsimage of a 70PB 2.4 cluster (200 million files and 300 million blocks). The image loading time was reduced from 1210 seconds to 1138 seconds. > Remove unnecessary search in INodeDirectory.addChild during image loading > - > > Key: HDFS-13693 > URL: https://issues.apache.org/jira/browse/HDFS-13693 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: zhouyingchao >Priority: Major > Attachments: HDFS-13693-001.patch > > > In FSImageFormatPBINode.loadINodeDirectorySection, all child INodes are added > to their parent INode's map one by one. The adding procedure will search a > position in the parent's map and then insert the child to the position. > However, during image loading, the search is unnecessary since the insert > position should always be at the end of the map given the sequence they are > serialized on disk. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
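The sketch below illustrates the optimization on a plain sorted list; it is not the attached patch, and addChildAtLoading is a hypothetical name. The normal insert path binary-searches for the position, while the loading path can simply append because the image serializes children in sorted order.
{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SortedChildren {
  private final List<String> children = new ArrayList<>();

  // Normal path: binary-search for the insertion point (what addChild does).
  public boolean addChild(String name) {
    int pos = Collections.binarySearch(children, name);
    if (pos >= 0) {
      return false;                 // duplicate name
    }
    children.add(-pos - 1, name);
    return true;
  }

  // Loading path: the image writes children in sorted order, so the new
  // child always belongs at the end and the search can be skipped.
  public boolean addChildAtLoading(String name) {
    int size = children.size();
    assert size == 0 || children.get(size - 1).compareTo(name) < 0
        : "image children must arrive in sorted order";
    children.add(name);
    return true;
  }
}
{code}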
[jira] [Updated] (HDFS-13693) Remove unnecessary search in INodeDirectory.addChild during image loading
[ https://issues.apache.org/jira/browse/HDFS-13693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-13693: Attachment: HDFS-13693-001.patch Status: Patch Available (was: Open) Ran all HDFS-related unit tests; the patch does not introduce new failures. > Remove unnecessary search in INodeDirectory.addChild during image loading > - > > Key: HDFS-13693 > URL: https://issues.apache.org/jira/browse/HDFS-13693 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: zhouyingchao >Priority: Major > Attachments: HDFS-13693-001.patch > > > In FSImageFormatPBINode.loadINodeDirectorySection, all child INodes are added > to their parent INode's map one by one. The adding procedure will search a > position in the parent's map and then insert the child to the position. > However, during image loading, the search is unnecessary since the insert > position should always be at the end of the map given the sequence they are > serialized on disk. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-13693) Remove unnecessary search in INodeDirectory.addChild during image loading
zhouyingchao created HDFS-13693: --- Summary: Remove unnecessary search in INodeDirectory.addChild during image loading Key: HDFS-13693 URL: https://issues.apache.org/jira/browse/HDFS-13693 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: zhouyingchao In FSImageFormatPBINode.loadINodeDirectorySection, all child INodes are added to their parent INode's map one by one. The adding procedure searches for a position in the parent's map and then inserts the child at that position. However, during image loading, the search is unnecessary since the insert position should always be at the end of the map, given the order in which the inodes are serialized on disk. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11573) Support rename between different NameNodes in federated HDFS
[ https://issues.apache.org/jira/browse/HDFS-11573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-11573: Attachment: HDFS-11573-2.6.5-001.patch Since we implemented the feature on a 2.6 branch, it is easy to merge the code with 2.6 branch. I upload the patch against 2.6.5 branch first. If necessary, I can merge the code with the 3.0 branch in future. > Support rename between different NameNodes in federated HDFS > > > Key: HDFS-11573 > URL: https://issues.apache.org/jira/browse/HDFS-11573 > Project: Hadoop HDFS > Issue Type: Improvement > Components: federation >Reporter: zhouyingchao > Attachments: HDFS-11573-2.6.5-001.patch, HDFS_federation_rename.pdf > > >Federated file system can improve overall scalability by dividing a > single namespace into multiple sub-namespaces. Since the divided > sub-namespace are held by different namenodes, rename operation between those > sub-namespace is forbidden. Due to this restriction, many applications have > to be rewritten to work around the issue after migrated to federated file > system. Supporting rename between different namenodes would make it much > easier to migrate to federated file systems. >We have finished a preliminary implementation of this feature in a 2.6 > branch. I'll upload a write-up regarding the design in a few days. And then > I'll re-org the code against the trunk and upload the patch. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11573) Support rename between different NameNodes in federated HDFS
[ https://issues.apache.org/jira/browse/HDFS-11573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-11573: Attachment: HDFS_federation_rename.pdf Upload a write-up regarding the feature. > Support rename between different NameNodes in federated HDFS > > > Key: HDFS-11573 > URL: https://issues.apache.org/jira/browse/HDFS-11573 > Project: Hadoop HDFS > Issue Type: Improvement > Components: federation >Reporter: zhouyingchao > Attachments: HDFS_federation_rename.pdf > > >Federated file system can improve overall scalability by dividing a > single namespace into multiple sub-namespaces. Since the divided > sub-namespace are held by different namenodes, rename operation between those > sub-namespace is forbidden. Due to this restriction, many applications have > to be rewritten to work around the issue after migrated to federated file > system. Supporting rename between different namenodes would make it much > easier to migrate to federated file systems. >We have finished a preliminary implementation of this feature in a 2.6 > branch. I'll upload a write-up regarding the design in a few days. And then > I'll re-org the code against the trunk and upload the patch. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11895) Committed block should be completed during block report if usable replicas are enough
[ https://issues.apache.org/jira/browse/HDFS-11895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-11895: Status: Patch Available (was: Open) > Committed block should be completed during block report if usable replicas > are enough > - > > Key: HDFS-11895 > URL: https://issues.apache.org/jira/browse/HDFS-11895 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: zhouyingchao >Priority: Minor > Attachments: HDFS-11895-001.patch > > > In a 2.4 HDFS cluster, we found an issue that a completeFile call failed > since the file's last block's three replica were in decommissioning state. > And finally the decommissioning was stuck. We figured out the issue had been > fixed by HDFS-11499. The fix of HDFS-11499 completes a committed block if > usable replicas are enough in close/recovery code path. Besides that, we > think we'd better complete a committed block in block report path if usable > replicas are engouth. It helps the condition where a client calls > completeFile then exit abnormally (without retry) and all its last block's > replica DNs are in decommissioning state. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11895) Committed block should be completed during block report if usable replicas are enough
[ https://issues.apache.org/jira/browse/HDFS-11895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-11895: Attachment: HDFS-11895-001.patch A patch for the issue. > Committed block should be completed during block report if usable replicas > are enough > - > > Key: HDFS-11895 > URL: https://issues.apache.org/jira/browse/HDFS-11895 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: zhouyingchao >Priority: Minor > Attachments: HDFS-11895-001.patch > > > In a 2.4 HDFS cluster, we found an issue that a completeFile call failed > since the file's last block's three replica were in decommissioning state. > And finally the decommissioning was stuck. We figured out the issue had been > fixed by HDFS-11499. The fix of HDFS-11499 completes a committed block if > usable replicas are enough in close/recovery code path. Besides that, we > think we'd better complete a committed block in block report path if usable > replicas are engouth. It helps the condition where a client calls > completeFile then exit abnormally (without retry) and all its last block's > replica DNs are in decommissioning state. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-11895) Committed block should be completed during block report if usable replicas are enough
zhouyingchao created HDFS-11895: --- Summary: Committed block should be completed during block report if usable replicas are enough Key: HDFS-11895 URL: https://issues.apache.org/jira/browse/HDFS-11895 Project: Hadoop HDFS Issue Type: Bug Reporter: zhouyingchao Priority: Minor In a 2.4 HDFS cluster, we found an issue where a completeFile call failed since the file's last block's three replicas were in decommissioning state, and the decommissioning eventually got stuck. We figured out the issue had been fixed by HDFS-11499. The fix of HDFS-11499 completes a committed block if usable replicas are enough in the close/recovery code path. Besides that, we think we'd better complete a committed block in the block report path as well if usable replicas are enough. It helps the case where a client calls completeFile and then exits abnormally (without retry) while all of its last block's replica DNs are in decommissioning state. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
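Purely as an illustration of the proposed decision (using local placeholder types rather than the real BlockManager API), the sketch below shows the kind of check the block-report path would perform: a COMMITTED last block is promoted to COMPLETE once enough usable replicas have been reported, with "usable" counted the same way as in the close/recovery path of HDFS-11499.
{code:java}
public class CommittedBlockCheck {
  enum BlockState { UNDER_CONSTRUCTION, COMMITTED, COMPLETE }

  /**
   * Decide whether a COMMITTED last block can be promoted to COMPLETE when a
   * block report adds a replica. "Usable" replicas are assumed to be counted
   * the same way as in the close/recovery path (HDFS-11499), e.g. replicas on
   * decommissioning nodes still count as usable.
   */
  static boolean canCompleteOnBlockReport(BlockState state,
                                          int usableReplicas,
                                          int minReplication) {
    return state == BlockState.COMMITTED && usableReplicas >= minReplication;
  }
}
{code}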
[jira] [Updated] (HDFS-11573) Support rename between different NameNodes in federated HDFS
[ https://issues.apache.org/jira/browse/HDFS-11573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-11573: Description: Federated file system can improve overall scalability by dividing a single namespace into multiple sub-namespaces. Since the divided sub-namespace are held by different namenodes, rename operation between those sub-namespace is forbidden. Due to this restriction, many applications have to be rewritten to work around the issue after migrated to federated file system. Supporting rename between different namenodes would make it much easier to migrate to federated file systems. We have finished a preliminary implementation of this feature in a 2.6 branch. I'll upload a write-up regarding the design in a few days. And then I'll re-org the code against the trunk and upload the patch. was: Federated file system can improve overall scalability by dividing a single namespace into multiple sub-namespaces. Since the divided sub-namespace is held by different namenodes, rename operation between those sub-namespace is forbidden. Due to this restriction, many applications have to be rewritten to work around the issue after migrated to federated file system. Supporting rename between different namenodes would make it much easier to migrate to federated file systems. We have finished a preliminary implementation of this feature in a 2.6 branch. I'll upload a write-up regarding the design in a few days. And then I'll re-org the code against the trunk and upload the patch. > Support rename between different NameNodes in federated HDFS > > > Key: HDFS-11573 > URL: https://issues.apache.org/jira/browse/HDFS-11573 > Project: Hadoop HDFS > Issue Type: Improvement > Components: federation >Reporter: zhouyingchao > >Federated file system can improve overall scalability by dividing a > single namespace into multiple sub-namespaces. Since the divided > sub-namespace are held by different namenodes, rename operation between those > sub-namespace is forbidden. Due to this restriction, many applications have > to be rewritten to work around the issue after migrated to federated file > system. Supporting rename between different namenodes would make it much > easier to migrate to federated file systems. >We have finished a preliminary implementation of this feature in a 2.6 > branch. I'll upload a write-up regarding the design in a few days. And then > I'll re-org the code against the trunk and upload the patch. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-11573) Support rename between different NameNodes in federated HDFS
zhouyingchao created HDFS-11573: --- Summary: Support rename between different NameNodes in federated HDFS Key: HDFS-11573 URL: https://issues.apache.org/jira/browse/HDFS-11573 Project: Hadoop HDFS Issue Type: Improvement Components: federation Reporter: zhouyingchao A federated file system can improve overall scalability by dividing a single namespace into multiple sub-namespaces. Since the divided sub-namespaces are held by different namenodes, rename operations between those sub-namespaces are forbidden. Due to this restriction, many applications have to be rewritten to work around the issue after migrating to a federated file system. Supporting rename between different namenodes would make it much easier to migrate to federated file systems. We have finished a preliminary implementation of this feature in a 2.6 branch. I'll upload a write-up regarding the design in a few days. Then I'll re-org the code against the trunk and upload the patch. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-10989) Cannot get last block length after namenode failover
zhouyingchao created HDFS-10989: --- Summary: Cannot get last block length after namenode failover Key: HDFS-10989 URL: https://issues.apache.org/jira/browse/HDFS-10989 Project: Hadoop HDFS Issue Type: Bug Reporter: zhouyingchao On a 2.4 cluster, access to a file failed since the last block length could not be obtained. The fsck output of the file at the moment of failure was like this: /user/X 483600487 bytes, 2 block(s), OPENFORWRITE: MISSING 1 blocks of total size 215165031 B 0. BP-219149063-10.108.84.25-1446859315800:blk_2102504098_1035525341 len=268435456 repl=3 [10.112.17.43:11402, 10.118.22.46:11402, 10.118.22.49:11402] 1. BP-219149063-10.108.84.25-1446859315800:blk_2103114087_1036219054{blockUCState=UNDER_RECOVERY, primaryNodeIndex=2, replicas=[ReplicaUnderConstruction[[DISK]DS-60be75ad-e4a7-4b1e-b3aa-327c85331d42:NORMAL|RBW], ReplicaUnderConstruction[[DISK]DS-184a1ce9-655a-4e67-b0cc-29ab9984bd0a:NORMAL|RBW], ReplicaUnderConstruction[[DISK]DS-6d037ac8-4bcc-4cdc-a803-55b1817e0200:NORMAL|RBW]]} len=215165031 MISSING! Recorded locations [10.114.10.14:11402, 10.118.29.3:11402, 10.118.22.42:11402] From those three data nodes, we found that there were IOExceptions related to the block and there were pipeline re-creation events. We figured out that there was a namenode failover event before the issue happened, and there were some updatePipeline calls to the earlier active namenode: 2016-09-27,15:04:36,437 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: updatePipeline(block=BP-219149063-10.108.84.25-1446859315800:blk_2103114087_1036137092, newGenerationStamp=1036170430, newLength=2624000, newNodes=[10.118.22.42:11402, 10.118.22.49:11402, 10.118.24.3:11402], clientName=DFSClient_NONMAPREDUCE_-442153643_1) 2016-09-27,15:04:36,438 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: updatePipeline(BP-219149063-10.108.84.25-1446859315800:blk_2103114087_1036137092) successfully to BP-219149063-10.108.84.25-1446859315800:blk_2103114087_1036170430 2016-09-27,15:10:10,596 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: updatePipeline(block=BP-219149063-10.108.84.25-1446859315800:blk_2103114087_1036170430, newGenerationStamp=1036219054, newLength=17138265, newNodes=[10.118.22.49:11402, 10.118.24.3:11402, 10.114.6.45:11402], clientName=DFSClient_NONMAPREDUCE_-442153643_1) 2016-09-27,15:10:10,601 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: updatePipeline(BP-219149063-10.108.84.25-1446859315800:blk_2103114087_1036170430) successfully to BP-219149063-10.108.84.25-1446859315800:blk_2103114087_1036219054 However, these new data nodes did not show up in the fsck output. It looks like when a data node recovers the pipeline (PIPELINE_SETUP_STREAMING_RECOVERY), the new data nodes do not call notifyNamenodeReceivingBlock for the transferred block. From code review, the issue also exists in more recent branches. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10240) Race between close/recoverLease leads to missing block
[ https://issues.apache.org/jira/browse/HDFS-10240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-10240: Attachment: HDFS-10240-001.patch If the patch is ok, I would add some unit tests. > Race between close/recoverLease leads to missing block > -- > > Key: HDFS-10240 > URL: https://issues.apache.org/jira/browse/HDFS-10240 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: zhouyingchao >Assignee: zhouyingchao > Attachments: HDFS-10240-001.patch > > > We got a missing block in our cluster, and logs related to the missing block > are as follows: > 2016-03-28,10:00:06,188 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > allocateBlock: XX. BP-219149063-10.108.84.25-1446859315800 > blk_1226490256_153006345{blockUCState=UNDER_CONSTRUCTION, > primaryNodeIndex=-1, > replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]]} > 2016-03-28,10:00:06,205 INFO BlockStateChange: BLOCK* > blk_1226490256_153006345{blockUCState=UNDER_RECOVERY, primaryNodeIndex=2, > replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]]} > recovery started, > primary=ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW] > 2016-03-28,10:00:06,205 WARN org.apache.hadoop.hdfs.StateChange: DIR* > NameSystem.internalReleaseLease: File XX has not been closed. Lease > recovery is in progress. RecoveryId = 153006357 for block > blk_1226490256_153006345{blockUCState=UNDER_RECOVERY, primaryNodeIndex=2, > replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]]} > 2016-03-28,10:00:06,248 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: BLOCK* > checkFileProgress: blk_1226490256_153006345{blockUCState=COMMITTED, > primaryNodeIndex=2, > replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-85819f0d-bdbb-4a9b-b90c-eba078547c23:NORMAL|RBW]]} > has not reached minimal replication 1 > 2016-03-28,10:00:06,358 INFO BlockStateChange: BLOCK* addStoredBlock: > blockMap updated: 10.114.5.53:11402 is added to > blk_1226490256_153006345{blockUCState=COMMITTED, primaryNodeIndex=2, > replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-85819f0d-bdbb-4a9b-b90c-eba078547c23:NORMAL|RBW]]} > size 139 > 2016-03-28,10:00:06,441 INFO BlockStateChange: BLOCK* addStoredBlock: > blockMap updated: 10.114.5.44:11402 is added to blk_1226490256_153006345 size > 139 > 2016-03-28,10:00:06,660 INFO BlockStateChange: BLOCK* addStoredBlock: > blockMap updated: 10.114.6.14:11402 is added to blk_1226490256_153006345 size > 139 > 2016-03-28,10:00:08,808 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > 
commitBlockSynchronization(lastblock=BP-219149063-10.108.84.25-1446859315800:blk_1226490256_153006345, > newgenerationstamp=153006357, newlength=139, newtargets=[10.114.6.14:11402, > 10.114.5.53:11402, 10.114.5.44:11402], closeFile=true, deleteBlock=false) > 2016-03-28,10:00:08,836 INFO BlockStateChange: BLOCK > NameSystem.addToCorruptReplicasMap: blk_1226490256 added as corrupt on > 10.114.6.14:11402 by /10.114.6.14 because block is COMPLETE and reported > genstamp 153006357 does not match genstamp in block map 153006345 > 2016-03-28,10:00:08,836 INFO BlockStateChange: BLOCK > NameSystem.addToCorruptReplicasMap: blk_1226490256 added as corrupt on > 10.114.5.53:11402 by /10.114.5.53 because block is COMPLETE and reported > genstamp 153006357 does not match genstamp in block map 153006345 > 2016-03-28,10:00:08,837 INFO BlockStateChange: BLOCK > NameSystem.addToCorruptReplicasMap: blk_1226490256 added as corrupt on > 10.114.5.44:11402 by /10.114.5.44 because block is COMPLETE and reported > genstamp 153006357 does not match genstamp in block map 153006345 > From the log, I guess this is what has happened in order: > 1 Process A open a file F for write. > 2 Somebody else called
[jira] [Updated] (HDFS-10240) Race between close/recoverLease leads to missing block
[ https://issues.apache.org/jira/browse/HDFS-10240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-10240: Status: Patch Available (was: Open) > Race between close/recoverLease leads to missing block > -- > > Key: HDFS-10240 > URL: https://issues.apache.org/jira/browse/HDFS-10240 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: zhouyingchao >Assignee: zhouyingchao > > We got a missing block in our cluster, and logs related to the missing block > are as follows: > 2016-03-28,10:00:06,188 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > allocateBlock: XX. BP-219149063-10.108.84.25-1446859315800 > blk_1226490256_153006345{blockUCState=UNDER_CONSTRUCTION, > primaryNodeIndex=-1, > replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]]} > 2016-03-28,10:00:06,205 INFO BlockStateChange: BLOCK* > blk_1226490256_153006345{blockUCState=UNDER_RECOVERY, primaryNodeIndex=2, > replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]]} > recovery started, > primary=ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW] > 2016-03-28,10:00:06,205 WARN org.apache.hadoop.hdfs.StateChange: DIR* > NameSystem.internalReleaseLease: File XX has not been closed. Lease > recovery is in progress. RecoveryId = 153006357 for block > blk_1226490256_153006345{blockUCState=UNDER_RECOVERY, primaryNodeIndex=2, > replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]]} > 2016-03-28,10:00:06,248 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: BLOCK* > checkFileProgress: blk_1226490256_153006345{blockUCState=COMMITTED, > primaryNodeIndex=2, > replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-85819f0d-bdbb-4a9b-b90c-eba078547c23:NORMAL|RBW]]} > has not reached minimal replication 1 > 2016-03-28,10:00:06,358 INFO BlockStateChange: BLOCK* addStoredBlock: > blockMap updated: 10.114.5.53:11402 is added to > blk_1226490256_153006345{blockUCState=COMMITTED, primaryNodeIndex=2, > replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW], > > ReplicaUnderConstruction[[DISK]DS-85819f0d-bdbb-4a9b-b90c-eba078547c23:NORMAL|RBW]]} > size 139 > 2016-03-28,10:00:06,441 INFO BlockStateChange: BLOCK* addStoredBlock: > blockMap updated: 10.114.5.44:11402 is added to blk_1226490256_153006345 size > 139 > 2016-03-28,10:00:06,660 INFO BlockStateChange: BLOCK* addStoredBlock: > blockMap updated: 10.114.6.14:11402 is added to blk_1226490256_153006345 size > 139 > 2016-03-28,10:00:08,808 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > 
commitBlockSynchronization(lastblock=BP-219149063-10.108.84.25-1446859315800:blk_1226490256_153006345, > newgenerationstamp=153006357, newlength=139, newtargets=[10.114.6.14:11402, > 10.114.5.53:11402, 10.114.5.44:11402], closeFile=true, deleteBlock=false) > 2016-03-28,10:00:08,836 INFO BlockStateChange: BLOCK > NameSystem.addToCorruptReplicasMap: blk_1226490256 added as corrupt on > 10.114.6.14:11402 by /10.114.6.14 because block is COMPLETE and reported > genstamp 153006357 does not match genstamp in block map 153006345 > 2016-03-28,10:00:08,836 INFO BlockStateChange: BLOCK > NameSystem.addToCorruptReplicasMap: blk_1226490256 added as corrupt on > 10.114.5.53:11402 by /10.114.5.53 because block is COMPLETE and reported > genstamp 153006357 does not match genstamp in block map 153006345 > 2016-03-28,10:00:08,837 INFO BlockStateChange: BLOCK > NameSystem.addToCorruptReplicasMap: blk_1226490256 added as corrupt on > 10.114.5.44:11402 by /10.114.5.44 because block is COMPLETE and reported > genstamp 153006357 does not match genstamp in block map 153006345 > From the log, I guess this is what has happened in order: > 1 Process A open a file F for write. > 2 Somebody else called recoverLease against F. > 3 A closed F. > The root cause of the missing block is that
[jira] [Created] (HDFS-10240) Race between close/recoverLease leads to missing block
zhouyingchao created HDFS-10240: --- Summary: Race between close/recoverLease leads to missing block Key: HDFS-10240 URL: https://issues.apache.org/jira/browse/HDFS-10240 Project: Hadoop HDFS Issue Type: Bug Reporter: zhouyingchao Assignee: zhouyingchao We got a missing block in our cluster, and logs related to the missing block are as follows: 2016-03-28,10:00:06,188 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: XX. BP-219149063-10.108.84.25-1446859315800 blk_1226490256_153006345{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW], ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW], ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]]} 2016-03-28,10:00:06,205 INFO BlockStateChange: BLOCK* blk_1226490256_153006345{blockUCState=UNDER_RECOVERY, primaryNodeIndex=2, replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW], ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW], ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]]} recovery started, primary=ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW] 2016-03-28,10:00:06,205 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease: File XX has not been closed. Lease recovery is in progress. RecoveryId = 153006357 for block blk_1226490256_153006345{blockUCState=UNDER_RECOVERY, primaryNodeIndex=2, replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW], ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW], ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]]} 2016-03-28,10:00:06,248 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: BLOCK* checkFileProgress: blk_1226490256_153006345{blockUCState=COMMITTED, primaryNodeIndex=2, replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW], ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW], ReplicaUnderConstruction[[DISK]DS-85819f0d-bdbb-4a9b-b90c-eba078547c23:NORMAL|RBW]]} has not reached minimal replication 1 2016-03-28,10:00:06,358 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.114.5.53:11402 is added to blk_1226490256_153006345{blockUCState=COMMITTED, primaryNodeIndex=2, replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW], ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW], ReplicaUnderConstruction[[DISK]DS-85819f0d-bdbb-4a9b-b90c-eba078547c23:NORMAL|RBW]]} size 139 2016-03-28,10:00:06,441 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.114.5.44:11402 is added to blk_1226490256_153006345 size 139 2016-03-28,10:00:06,660 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.114.6.14:11402 is added to blk_1226490256_153006345 size 139 2016-03-28,10:00:08,808 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: commitBlockSynchronization(lastblock=BP-219149063-10.108.84.25-1446859315800:blk_1226490256_153006345, newgenerationstamp=153006357, newlength=139, newtargets=[10.114.6.14:11402, 10.114.5.53:11402, 10.114.5.44:11402], closeFile=true, deleteBlock=false) 2016-03-28,10:00:08,836 INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap: blk_1226490256 added as corrupt on 
10.114.6.14:11402 by /10.114.6.14 because block is COMPLETE and reported genstamp 153006357 does not match genstamp in block map 153006345 2016-03-28,10:00:08,836 INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap: blk_1226490256 added as corrupt on 10.114.5.53:11402 by /10.114.5.53 because block is COMPLETE and reported genstamp 153006357 does not match genstamp in block map 153006345 2016-03-28,10:00:08,837 INFO BlockStateChange: BLOCK NameSystem.addToCorruptReplicasMap: blk_1226490256 added as corrupt on 10.114.5.44:11402 by /10.114.5.44 because block is COMPLETE and reported genstamp 153006357 does not match genstamp in block map 153006345 From the log, I guess this is what happened, in order: 1 Process A opened a file F for write. 2 Somebody else called recoverLease against F. 3 A closed F. The root cause of the missing block is that recoverLease increased the gen count of the block's replicas on the datanodes, whereas the gen count on the Namenode was reset by the close in step 3. I think we should check whether the last block is under recovery when trying to close a file. If so, we should just throw an exception to the client quickly. Although the issue was found on a 2.4 HDFS, from reading the code it looks like the issue also exists on trunk.
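The sketch below illustrates the check proposed above — reject close() when the last block is already under lease recovery — using local placeholder types rather than the real FSNamesystem code.
{code:java}
import java.io.IOException;

public class CompleteFileCheck {
  enum BlockUCState { UNDER_CONSTRUCTION, UNDER_RECOVERY, COMMITTED, COMPLETE }

  static class LastBlock {
    BlockUCState ucState;
    long recoveryGenStamp;   // generation stamp handed out by recoverLease
  }

  /**
   * Called on the close/completeFile path. If somebody has started lease
   * recovery on the last block, its replicas will be bumped to a newer
   * generation stamp by the primary DataNode; letting close() win the race
   * would leave the NameNode with the old genstamp, and every replica would
   * later be marked corrupt. Failing the close fast keeps the state
   * consistent and lets the client retry.
   */
  static void checkLastBlockBeforeClose(LastBlock b, String file) throws IOException {
    if (b.ucState == BlockUCState.UNDER_RECOVERY) {
      throw new IOException("Cannot close " + file
          + ": last block is under lease recovery (recovery genstamp "
          + b.recoveryGenStamp + ")");
    }
  }
}
{code}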
[jira] [Commented] (HDFS-8496) Calling stopWriter() with FSDatasetImpl lock held may block other threads
[ https://issues.apache.org/jira/browse/HDFS-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15219483#comment-15219483 ] zhouyingchao commented on HDFS-8496: Thank you, Colin! The patch looks good to me. > Calling stopWriter() with FSDatasetImpl lock held may block other threads > - > > Key: HDFS-8496 > URL: https://issues.apache.org/jira/browse/HDFS-8496 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: zhouyingchao >Assignee: Colin Patrick McCabe > Attachments: HDFS-8496-001.patch, HDFS-8496.002.patch > > > On a DN of a HDFS 2.6 cluster, we noticed some DataXceiver threads and > heartbeat threads are blocked for quite a while on the FSDatasetImpl lock. By > looking at the stack, we found the calling of stopWriter() with FSDatasetImpl > lock blocked everything. > Following is the heartbeat stack, as an example, to show how threads are > blocked by FSDatasetImpl lock: > {code} >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:152) > - waiting to lock <0x0007701badc0> (a > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getAvailable(FsVolumeImpl.java:191) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:144) > - locked <0x000770465dc0> (a java.lang.Object) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:575) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:680) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850) > at java.lang.Thread.run(Thread.java:662) > {code} > The thread which held the FSDatasetImpl lock is just sleeping to wait another > thread to exit in stopWriter(). The stack is: > {code} >java.lang.Thread.State: TIMED_WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Thread.join(Thread.java:1194) > - locked <0x0007636953b8> (a org.apache.hadoop.util.Daemon) > at > org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:183) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.recoverCheck(FsDatasetImpl.java:982) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.recoverClose(FsDatasetImpl.java:1026) > - locked <0x0007701badc0> (a > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:624) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) > at java.lang.Thread.run(Thread.java:662) > {code} > In this case, we deployed quite a lot other workloads on the DN, the local > file system and disk is quite busy. We guess this is why the stopWriter took > quite a long time. > Any way, it is not quite reasonable to call stopWriter with the FSDatasetImpl > lock held. In HDFS-7999, the createTemporary() is changed to call > stopWriter without FSDatasetImpl lock. We guess we should do so in the other > three methods: recoverClose()/recoverAppend/recoverRbw(). > I'll try to finish a patch for this today. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
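The pattern discussed in this issue (and in HDFS-7999) can be illustrated with a generic sketch, not the actual FsDatasetImpl code: join the writer thread outside the dataset lock, then re-acquire the lock and re-validate the replica state before finishing the recovery.
{code:java}
public class StopWriterOutsideLock {
  private final Object datasetLock = new Object();
  private Thread writer;          // stand-in for ReplicaInPipeline's writer thread

  // Anti-pattern: joining the writer while holding the dataset lock blocks
  // every other DataXceiver and the heartbeat thread if the writer is slow.
  public void recoverWithLockHeld() throws InterruptedException {
    synchronized (datasetLock) {
      if (writer != null) {
        writer.interrupt();
        writer.join();            // may sleep for a long time under the lock
      }
      // ... recover the replica ...
    }
  }

  // Preferred shape: stop the writer outside the lock, then re-acquire the
  // lock and re-check the replica state before continuing the recovery.
  public void recoverWithoutLockHeld() throws InterruptedException {
    Thread w;
    synchronized (datasetLock) {
      w = writer;
    }
    if (w != null) {
      w.interrupt();
      w.join();                   // other threads can still use the dataset
    }
    synchronized (datasetLock) {
      // Re-validate that the replica is still in the expected state, since
      // it may have changed while the lock was released.
      // ... recover the replica ...
    }
  }
}
{code}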
[jira] [Updated] (HDFS-10182) Hedged read might overwrite user's buf
[ https://issues.apache.org/jira/browse/HDFS-10182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-10182: Attachment: HDFS-10182-branch26.patch Patch of branch-2.6 > Hedged read might overwrite user's buf > -- > > Key: HDFS-10182 > URL: https://issues.apache.org/jira/browse/HDFS-10182 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: zhouyingchao >Assignee: zhouyingchao > Fix For: 2.7.3 > > Attachments: HDFS-10182-001.patch, HDFS-10182-branch26.patch > > > In DFSInputStream::hedgedFetchBlockByteRange, during the first attempt, the > passed-in buf from the caller is passed to another thread to fill. If the > first attempt is timed out, the second attempt would be issued with another > temp ByteBuffer. Now suppose the second attempt wins and the first attempt > is blocked somewhere in the IO path. The second attempt's result would be > copied to the buf provided by the caller and then caller would think the > pread is all set. Later the caller might use the buf to do something else > (for e.g. read another chunk of data), however, the first attempt in earlier > hedgedFetchBlockByteRange might get some data and fill into the buf ... > If this happens, the caller's buf would then be corrupted. > To fix the issue, we should allocate a temp buf for the first attempt too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10182) Hedged read might overwrite user's buf
[ https://issues.apache.org/jira/browse/HDFS-10182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15214159#comment-15214159 ] zhouyingchao commented on HDFS-10182: - Thank you for picking up the fix. I'll upload a patch against 2.6 ASAP. > Hedged read might overwrite user's buf > -- > > Key: HDFS-10182 > URL: https://issues.apache.org/jira/browse/HDFS-10182 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: zhouyingchao >Assignee: zhouyingchao > Fix For: 2.7.3 > > Attachments: HDFS-10182-001.patch > > > In DFSInputStream::hedgedFetchBlockByteRange, during the first attempt, the > passed-in buf from the caller is passed to another thread to fill. If the > first attempt is timed out, the second attempt would be issued with another > temp ByteBuffer. Now suppose the second attempt wins and the first attempt > is blocked somewhere in the IO path. The second attempt's result would be > copied to the buf provided by the caller and then caller would think the > pread is all set. Later the caller might use the buf to do something else > (for e.g. read another chunk of data), however, the first attempt in earlier > hedgedFetchBlockByteRange might get some data and fill into the buf ... > If this happens, the caller's buf would then be corrupted. > To fix the issue, we should allocate a temp buf for the first attempt too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10182) Hedged read might overwrite user's buf
[ https://issues.apache.org/jira/browse/HDFS-10182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-10182: Status: Patch Available (was: Open) > Hedged read might overwrite user's buf > -- > > Key: HDFS-10182 > URL: https://issues.apache.org/jira/browse/HDFS-10182 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: zhouyingchao >Assignee: zhouyingchao > Attachments: HDFS-10182-001.patch > > > In DFSInputStream::hedgedFetchBlockByteRange, the passed-in buf from the > caller is passed to another thread to fill in the first attempt. If the > first attempt is timed out, the second attempt would be issued with another > ByteBuffer. Now suppose the second attempt wins and the first attempt is > blocked somewhere in the IO path. The second attempt's result would be copied > to the buf provided by the caller and then caller would think the pread is > all set. Later the caller might use the buf to do something else (for e.g. > read another chunk of data), however, the first attempt in earlier > hedgedFetchBlockByteRange might get some data and fill into the buf ... > If this happens, the caller's buf would then be corrupted. > To fix the issue, we should allocate a temp buf for the first attempt too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10182) Hedged read might overwrite user's buf
[ https://issues.apache.org/jira/browse/HDFS-10182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-10182: Description: In DFSInputStream::hedgedFetchBlockByteRange, during the first attempt, the passed-in buf from the caller is passed to another thread to fill. If the first attempt is timed out, the second attempt would be issued with another ByteBuffer. Now suppose the second attempt wins and the first attempt is blocked somewhere in the IO path. The second attempt's result would be copied to the buf provided by the caller and then caller would think the pread is all set. Later the caller might use the buf to do something else (for e.g. read another chunk of data), however, the first attempt in earlier hedgedFetchBlockByteRange might get some data and fill into the buf ... If this happens, the caller's buf would then be corrupted. To fix the issue, we should allocate a temp buf for the first attempt too. was: In DFSInputStream::hedgedFetchBlockByteRange, the passed-in buf from the caller is passed to another thread to fill in the first attempt. If the first attempt is timed out, the second attempt would be issued with another ByteBuffer. Now suppose the second attempt wins and the first attempt is blocked somewhere in the IO path. The second attempt's result would be copied to the buf provided by the caller and then caller would think the pread is all set. Later the caller might use the buf to do something else (for e.g. read another chunk of data), however, the first attempt in earlier hedgedFetchBlockByteRange might get some data and fill into the buf ... If this happens, the caller's buf would then be corrupted. To fix the issue, we should allocate a temp buf for the first attempt too. > Hedged read might overwrite user's buf > -- > > Key: HDFS-10182 > URL: https://issues.apache.org/jira/browse/HDFS-10182 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: zhouyingchao >Assignee: zhouyingchao > Attachments: HDFS-10182-001.patch > > > In DFSInputStream::hedgedFetchBlockByteRange, during the first attempt, the > passed-in buf from the caller is passed to another thread to fill. If the > first attempt is timed out, the second attempt would be issued with another > ByteBuffer. Now suppose the second attempt wins and the first attempt is > blocked somewhere in the IO path. The second attempt's result would be copied > to the buf provided by the caller and then caller would think the pread is > all set. Later the caller might use the buf to do something else (for e.g. > read another chunk of data), however, the first attempt in earlier > hedgedFetchBlockByteRange might get some data and fill into the buf ... > If this happens, the caller's buf would then be corrupted. > To fix the issue, we should allocate a temp buf for the first attempt too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10182) Hedged read might overwrite user's buf
[ https://issues.apache.org/jira/browse/HDFS-10182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-10182: Description: In DFSInputStream::hedgedFetchBlockByteRange, during the first attempt, the passed-in buf from the caller is passed to another thread to fill. If the first attempt is timed out, the second attempt would be issued with another temp ByteBuffer. Now suppose the second attempt wins and the first attempt is blocked somewhere in the IO path. The second attempt's result would be copied to the buf provided by the caller and then caller would think the pread is all set. Later the caller might use the buf to do something else (for e.g. read another chunk of data), however, the first attempt in earlier hedgedFetchBlockByteRange might get some data and fill into the buf ... If this happens, the caller's buf would then be corrupted. To fix the issue, we should allocate a temp buf for the first attempt too. was: In DFSInputStream::hedgedFetchBlockByteRange, during the first attempt, the passed-in buf from the caller is passed to another thread to fill. If the first attempt is timed out, the second attempt would be issued with another ByteBuffer. Now suppose the second attempt wins and the first attempt is blocked somewhere in the IO path. The second attempt's result would be copied to the buf provided by the caller and then caller would think the pread is all set. Later the caller might use the buf to do something else (for e.g. read another chunk of data), however, the first attempt in earlier hedgedFetchBlockByteRange might get some data and fill into the buf ... If this happens, the caller's buf would then be corrupted. To fix the issue, we should allocate a temp buf for the first attempt too. > Hedged read might overwrite user's buf > -- > > Key: HDFS-10182 > URL: https://issues.apache.org/jira/browse/HDFS-10182 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: zhouyingchao >Assignee: zhouyingchao > Attachments: HDFS-10182-001.patch > > > In DFSInputStream::hedgedFetchBlockByteRange, during the first attempt, the > passed-in buf from the caller is passed to another thread to fill. If the > first attempt is timed out, the second attempt would be issued with another > temp ByteBuffer. Now suppose the second attempt wins and the first attempt > is blocked somewhere in the IO path. The second attempt's result would be > copied to the buf provided by the caller and then caller would think the > pread is all set. Later the caller might use the buf to do something else > (for e.g. read another chunk of data), however, the first attempt in earlier > hedgedFetchBlockByteRange might get some data and fill into the buf ... > If this happens, the caller's buf would then be corrupted. > To fix the issue, we should allocate a temp buf for the first attempt too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-10182) Hedged read might overwrite user's buf
zhouyingchao created HDFS-10182: --- Summary: Hedged read might overwrite user's buf Key: HDFS-10182 URL: https://issues.apache.org/jira/browse/HDFS-10182 Project: Hadoop HDFS Issue Type: Bug Reporter: zhouyingchao Assignee: zhouyingchao In DFSInputStream::hedgedFetchBlockByteRange, the buf passed in by the caller is handed to another thread to fill during the first attempt. If the first attempt times out, a second attempt is issued with a separate temporary ByteBuffer. Now suppose the second attempt wins while the first attempt is still blocked somewhere in the IO path. The second attempt's result is copied into the caller's buf, and the caller then considers the pread complete. Later, the caller may reuse the buf for something else (e.g., reading another chunk of data); however, the first attempt from the earlier hedgedFetchBlockByteRange call may still receive some data and write it into that same buf. If this happens, the caller's buf is corrupted. To fix the issue, we should allocate a temporary buffer for the first attempt too, and copy only the winning attempt's data into the caller's buf. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
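To illustrate the buffer-isolation idea behind the proposed fix, here is a minimal, hypothetical Java sketch (not the actual DFSInputStream code; the RangeReader interface and all names are invented for the example): every hedged attempt fills its own temporary buffer, and only the winning attempt's bytes are copied into the caller-supplied array, so a straggling attempt can never overwrite data the caller has already consumed.
{code}
// Sketch only: a hypothetical stand-in for the hedged-read path, showing
// the buffer isolation that the fix relies on. RangeReader is an invented
// interface, not a Hadoop API.
import java.nio.ByteBuffer;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;

public class HedgedReadSketch {
  interface RangeReader {
    void read(long offset, ByteBuffer dst) throws Exception;
  }

  static void hedgedRead(ExecutorService pool, RangeReader primary,
      RangeReader hedge, long offset, byte[] userBuf, int off, int len)
      throws Exception {
    CompletionService<ByteBuffer> cs = new ExecutorCompletionService<>(pool);
    for (RangeReader r : new RangeReader[] { primary, hedge }) {
      cs.submit(() -> {
        // Every attempt gets its own temporary buffer, never userBuf.
        ByteBuffer tmp = ByteBuffer.allocate(len);
        r.read(offset, tmp);
        tmp.flip();
        return tmp;
      });
    }
    // Only the first attempt to finish is copied into the caller's array;
    // a late attempt keeps writing into its private buffer and is ignored.
    ByteBuffer winner = cs.take().get();
    winner.get(userBuf, off, len);
  }
}
{code}
In the real hedged-read path the second attempt is only launched after a configurable delay; the sketch submits both attempts immediately to keep the buffer handling in focus.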
[jira] [Updated] (HDFS-10182) Hedged read might overwrite user's buf
[ https://issues.apache.org/jira/browse/HDFS-10182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-10182: Attachment: HDFS-10182-001.patch > Hedged read might overwrite user's buf > -- > > Key: HDFS-10182 > URL: https://issues.apache.org/jira/browse/HDFS-10182 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: zhouyingchao >Assignee: zhouyingchao > Attachments: HDFS-10182-001.patch > > > In DFSInputStream::hedgedFetchBlockByteRange, the passed-in buf from the > caller is passed to another thread to fill in the first attempt. If the > first attempt is timed out, the second attempt would be issued with another > ByteBuffer. Now suppose the second attempt wins and the first attempt is > blocked somewhere in the IO path. The second attempt's result would be copied > to the buf provided by the caller and then caller would think the pread is > all set. Later the caller might use the buf to do something else (for e.g. > read another chunk of data), however, the first attempt in earlier > hedgedFetchBlockByteRange might get some data and fill into the buf ... > If this happens, the caller's buf would then be corrupted. > To fix the issue, we should allocate a temp buf for the first attempt too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8649) Default ACL is not inherited if directory is generated by FileSystem.create interface
zhouyingchao created HDFS-8649: -- Summary: Default ACL is not inherited if directory is generated by FileSystem.create interface Key: HDFS-8649 URL: https://issues.apache.org/jira/browse/HDFS-8649 Project: Hadoop HDFS Issue Type: Bug Reporter: zhouyingchao Assignee: zhouyingchao I have a directory /acltest/t, whose ACL is as follows: {code} # file: /acltest/t # owner: hdfs_tst_admin # group: supergroup user::rwx group::rwx mask::rwx other::--- default:user::rwx default:group::rwx default:mask::rwx default:other::rwx {code} My program creates a file /acltest/t/a/b using the FileSystem.create interface. The ACL of the auto-created directory /acltest/t/a is as follows: {code} # file: /acltest/t/a # owner: hdfs_tst_admin # group: supergroup user::rwx group::rwx mask::rwx other::--- default:user::rwx default:group::rwx default:mask::rwx default:other::rwx {code} As you can see, the auto-created directory did not inherit its parent's default ACL entry for other (it got other::--- instead of other::rwx). Looking into the implementation, the FileSystem.create interface automatically creates the missing directories in the path by calling FSNamesystem.mkdirsRecursively with the third parameter (inheritPermission) hard-coded to true. In FSNamesystem.mkdirsRecursively, when inheritPermission is true, the parent's actual permission bits (rather than a permission calculated from the default ACL) are used as the new directory's permission. Is this behavior correct? The default ACL does not work as people expect, and it has caused many access issues in our setup. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
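A hedged reproduction sketch of the scenario above, assuming a test classpath that provides MiniDFSCluster (the paths and the default ACL entry come from the report; everything else is illustrative, not a definitive test):
{code}
// Sketch only: creates /acltest/t with default:other::rwx, then calls
// FileSystem.create() on /acltest/t/a/b and prints the ACL and mode of
// the intermediate directory /acltest/t/a that create() made on its own.
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.AclEntry;
import org.apache.hadoop.fs.permission.AclEntryScope;
import org.apache.hadoop.fs.permission.AclEntryType;
import org.apache.hadoop.fs.permission.FsAction;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public class DefaultAclInheritRepro {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.setBoolean("dfs.namenode.acls.enabled", true);
    MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).build();
    try {
      FileSystem fs = cluster.getFileSystem();
      Path parent = new Path("/acltest/t");
      fs.mkdirs(parent);
      // Grant default:other::rwx on the parent directory.
      fs.modifyAclEntries(parent, Arrays.asList(
          new AclEntry.Builder()
              .setScope(AclEntryScope.DEFAULT)
              .setType(AclEntryType.OTHER)
              .setPermission(FsAction.ALL)
              .build()));
      // create() auto-creates the missing intermediate directory "a".
      fs.create(new Path("/acltest/t/a/b")).close();
      System.out.println("ACL of /acltest/t/a: "
          + fs.getAclStatus(new Path("/acltest/t/a")));
      System.out.println("mode of /acltest/t/a: "
          + fs.getFileStatus(new Path("/acltest/t/a")).getPermission());
    } finally {
      cluster.shutdown();
    }
  }
}
{code}
Under POSIX default-ACL semantics one would expect the auto-created directory to end up with other::rwx derived from default:other::rwx; the report instead observes other::--- copied from the parent's access bits.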
[jira] [Commented] (HDFS-8649) Default ACL is not inherited if directory is generated by FileSystem.create interface
[ https://issues.apache.org/jira/browse/HDFS-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14598752#comment-14598752 ] zhouyingchao commented on HDFS-8649: [~cnauroth] Any comments ? Default ACL is not inherited if directory is generated by FileSystem.create interface - Key: HDFS-8649 URL: https://issues.apache.org/jira/browse/HDFS-8649 Project: Hadoop HDFS Issue Type: Bug Reporter: zhouyingchao Assignee: zhouyingchao I have a directory /acltest/t, whose acl is as following: {code} # file: /acltest/t # owner: hdfs_tst_admin # group: supergroup user::rwx group::rwx mask::rwx other::--- default:user::rwx default:group::rwx default:mask::rwx default:other::rwx {code} My program create a file /acltest/t/a/b using the FileSystem.create interface. The acl of directory /acltest/t/a is as following: {code} # file: /acltest/t/a # owner: hdfs_tst_admin # group: supergroup user::rwx group::rwx mask::rwx other::--- default:user::rwx default:group::rwx default:mask::rwx default:other::rwx {code} As you can see, the child directory b did not inherit its parent's default acl for other. By looking into the implementation, the FileSystem.create interface will automatically create non-existing entries in the path, it is done by calling FSNamesystem.mkdirsRecursively and hard-coded the third param (inheritPermission) as true. In FSNamesystem.mkdirsRecursively, when inheritPermission is true, the parent's real permission (rather than calculation from default acl) would be used as the new directory's permission. Is this behavior correct? The default acl is not worked as people expected. It kind of render many access issues in our setup. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8496) Calling stopWriter() with FSDatasetImpl lock held may block other threads
[ https://issues.apache.org/jira/browse/HDFS-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14598757#comment-14598757 ] zhouyingchao commented on HDFS-8496: [~cmccabe], Any comments? Calling stopWriter() with FSDatasetImpl lock held may block other threads -- Key: HDFS-8496 URL: https://issues.apache.org/jira/browse/HDFS-8496 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Attachments: HDFS-8496-001.patch On a DN of a HDFS 2.6 cluster, we noticed some DataXceiver threads and heartbeat threads are blocked for quite a while on the FSDatasetImpl lock. By looking at the stack, we found the calling of stopWriter() with FSDatasetImpl lock blocked everything. Following is the heartbeat stack, as an example, to show how threads are blocked by FSDatasetImpl lock: {code} java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:152) - waiting to lock 0x0007701badc0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getAvailable(FsVolumeImpl.java:191) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:144) - locked 0x000770465dc0 (a java.lang.Object) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:575) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:680) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850) at java.lang.Thread.run(Thread.java:662) {code} The thread which held the FSDatasetImpl lock is just sleeping to wait another thread to exit in stopWriter(). The stack is: {code} java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Thread.join(Thread.java:1194) - locked 0x0007636953b8 (a org.apache.hadoop.util.Daemon) at org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:183) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.recoverCheck(FsDatasetImpl.java:982) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.recoverClose(FsDatasetImpl.java:1026) - locked 0x0007701badc0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:624) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) at java.lang.Thread.run(Thread.java:662) {code} In this case, we deployed quite a lot other workloads on the DN, the local file system and disk is quite busy. We guess this is why the stopWriter took quite a long time. Any way, it is not quite reasonable to call stopWriter with the FSDatasetImpl lock held. In HDFS-7999, the createTemporary() is changed to call stopWriter without FSDatasetImpl lock. We guess we should do so in the other three methods: recoverClose()/recoverAppend/recoverRbw(). I'll try to finish a patch for this today. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8496) Calling stopWriter() with FSDatasetImpl lock held may block other threads
[ https://issues.apache.org/jira/browse/HDFS-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-8496: --- Status: Patch Available (was: Open) Run all hdfs unit tests without introducing new failure. Calling stopWriter() with FSDatasetImpl lock held may block other threads -- Key: HDFS-8496 URL: https://issues.apache.org/jira/browse/HDFS-8496 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao On a DN of a HDFS 2.6 cluster, we noticed some DataXceiver threads and heartbeat threads are blocked for quite a while on the FSDatasetImpl lock. By looking at the stack, we found the calling of stopWriter() with FSDatasetImpl lock blocked everything. Following is the heartbeat stack, as an example, to show how threads are blocked by FSDatasetImpl lock: {code} java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:152) - waiting to lock 0x0007701badc0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getAvailable(FsVolumeImpl.java:191) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:144) - locked 0x000770465dc0 (a java.lang.Object) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:575) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:680) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850) at java.lang.Thread.run(Thread.java:662) {code} The thread which held the FSDatasetImpl lock is just sleeping to wait another thread to exit in stopWriter(). The stack is: {code} java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Thread.join(Thread.java:1194) - locked 0x0007636953b8 (a org.apache.hadoop.util.Daemon) at org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:183) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.recoverCheck(FsDatasetImpl.java:982) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.recoverClose(FsDatasetImpl.java:1026) - locked 0x0007701badc0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:624) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) at java.lang.Thread.run(Thread.java:662) {code} In this case, we deployed quite a lot other workloads on the DN, the local file system and disk is quite busy. We guess this is why the stopWriter took quite a long time. Any way, it is not quite reasonable to call stopWriter with the FSDatasetImpl lock held. In HDFS-7999, the createTemporary() is changed to call stopWriter without FSDatasetImpl lock. We guess we should do so in the other three methods: recoverClose()/recoverAppend/recoverRbw(). I'll try to finish a patch for this today. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8496) Calling stopWriter() with FSDatasetImpl lock held may block other threads
[ https://issues.apache.org/jira/browse/HDFS-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-8496: --- Attachment: HDFS-8496-001.patch Calling stopWriter() with FSDatasetImpl lock held may block other threads -- Key: HDFS-8496 URL: https://issues.apache.org/jira/browse/HDFS-8496 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Attachments: HDFS-8496-001.patch On a DN of a HDFS 2.6 cluster, we noticed some DataXceiver threads and heartbeat threads are blocked for quite a while on the FSDatasetImpl lock. By looking at the stack, we found the calling of stopWriter() with FSDatasetImpl lock blocked everything. Following is the heartbeat stack, as an example, to show how threads are blocked by FSDatasetImpl lock: {code} java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:152) - waiting to lock 0x0007701badc0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getAvailable(FsVolumeImpl.java:191) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:144) - locked 0x000770465dc0 (a java.lang.Object) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:575) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:680) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850) at java.lang.Thread.run(Thread.java:662) {code} The thread which held the FSDatasetImpl lock is just sleeping to wait another thread to exit in stopWriter(). The stack is: {code} java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Thread.join(Thread.java:1194) - locked 0x0007636953b8 (a org.apache.hadoop.util.Daemon) at org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:183) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.recoverCheck(FsDatasetImpl.java:982) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.recoverClose(FsDatasetImpl.java:1026) - locked 0x0007701badc0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:624) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) at java.lang.Thread.run(Thread.java:662) {code} In this case, we deployed quite a lot other workloads on the DN, the local file system and disk is quite busy. We guess this is why the stopWriter took quite a long time. Any way, it is not quite reasonable to call stopWriter with the FSDatasetImpl lock held. In HDFS-7999, the createTemporary() is changed to call stopWriter without FSDatasetImpl lock. We guess we should do so in the other three methods: recoverClose()/recoverAppend/recoverRbw(). I'll try to finish a patch for this today. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8496) Calling stopWriter() with FSDatasetImpl lock held may block other threads
zhouyingchao created HDFS-8496: -- Summary: Calling stopWriter() with FSDatasetImpl lock held may block other threads Key: HDFS-8496 URL: https://issues.apache.org/jira/browse/HDFS-8496 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao On a DN of an HDFS 2.6 cluster, we noticed that some DataXceiver threads and heartbeat threads were blocked for quite a while on the FSDatasetImpl lock. By looking at the stacks, we found that calling stopWriter() with the FSDatasetImpl lock held blocked everything else. The following heartbeat stack shows, as an example, how threads are blocked by the FSDatasetImpl lock: {code} java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:152) - waiting to lock 0x0007701badc0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getAvailable(FsVolumeImpl.java:191) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:144) - locked 0x000770465dc0 (a java.lang.Object) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:575) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:680) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850) at java.lang.Thread.run(Thread.java:662) {code} The thread holding the FSDatasetImpl lock is simply sleeping in stopWriter(), waiting for another thread to exit. The stack is: {code} java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Thread.join(Thread.java:1194) - locked 0x0007636953b8 (a org.apache.hadoop.util.Daemon) at org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:183) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.recoverCheck(FsDatasetImpl.java:982) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.recoverClose(FsDatasetImpl.java:1026) - locked 0x0007701badc0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:624) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) at java.lang.Thread.run(Thread.java:662) {code} In this case, we had deployed quite a lot of other workloads on the DN, so the local file system and disks were quite busy; we guess this is why stopWriter took so long. In any case, it is not reasonable to call stopWriter with the FSDatasetImpl lock held. In HDFS-7999, createTemporary() was changed to call stopWriter without the FSDatasetImpl lock; we should do the same in the other three methods: recoverClose(), recoverAppend() and recoverRbw(). I'll try to finish a patch for this today. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
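The shape of the proposed change can be sketched as follows: a hypothetical illustration of the "join outside the lock" pattern that HDFS-7999 applied to createTemporary(); the types and helpers below are placeholders, not the real FsDatasetImpl internals.
{code}
// Sketch only: stop/join the writer thread without holding the dataset
// monitor, then re-take the lock and re-check state before finishing.
public class RecoverSketch {
  private final Object datasetLock = new Object();

  Replica recoverClose(Block b) throws InterruptedException {
    while (true) {
      Thread writer;
      synchronized (datasetLock) {
        Replica replica = lookupReplica(b);
        writer = replica.getWriterThread();
        if (writer == null) {
          // No writer to stop: finish the recovery under the lock.
          return finalizeReplica(replica);
        }
      }
      // stopWriter() equivalent, done WITHOUT holding datasetLock, so
      // heartbeats and other DataXceivers are not blocked by the join.
      writer.interrupt();
      writer.join();
      // Loop, re-take the lock, and re-validate the replica state,
      // since it may have changed while the lock was released.
    }
  }

  // Hypothetical placeholders standing in for FsDatasetImpl internals.
  static class Block {}
  static class Replica {
    Thread getWriterThread() { return null; }
  }
  private Replica lookupReplica(Block b) { return new Replica(); }
  private Replica finalizeReplica(Replica r) { return r; }
}
{code}
The design point is that Thread.join() on a writer stuck in slow local I/O can take arbitrarily long, so it must not happen inside the dataset monitor that heartbeats and other DataXceivers need.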
[jira] [Commented] (HDFS-8429) The DomainSocketWatcher thread should not block other threads if it dies
[ https://issues.apache.org/jira/browse/HDFS-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14560303#comment-14560303 ] zhouyingchao commented on HDFS-8429: Colin, thank you for pointing out this issue. I've changed and uploaded the patch accordingly. The DomainSocketWatcher thread should not block other threads if it dies Key: HDFS-8429 URL: https://issues.apache.org/jira/browse/HDFS-8429 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Attachments: HDFS-8429-001.patch, HDFS-8429-002.patch, HDFS-8429-003.patch In our cluster, an application is hung when doing a short circuit read of local hdfs block. By looking into the log, we found the DataNode's DomainSocketWatcher.watcherThread has exited with following log: {code} ERROR org.apache.hadoop.net.unix.DomainSocketWatcher: Thread[Thread-25,5,main] terminating on unexpected exception java.lang.NullPointerException at org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:463) at java.lang.Thread.run(Thread.java:662) {code} The line 463 is following code snippet: {code} try { for (int fd : fdSet.getAndClearReadableFds()) { sendCallbackAndRemove(getAndClearReadableFds, entries, fdSet, fd); } {code} getAndClearReadableFds is a native method which will malloc an int array. Since our memory is very tight, it looks like the malloc failed and a NULL pointer is returned. The bad thing is that other threads then blocked in stack like this: {code} DataXceiver for client unix:/home/work/app/hdfs/c3prc-micloud/datanode/dn_socket [Waiting for operation #1] daemon prio=10 tid=0x7f0c9c086d90 nid=0x8fc3 waiting on condition [0x7f09b9856000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x0007b0174808 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:323) at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:322) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:403) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:214) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:95) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) at java.lang.Thread.run(Thread.java:662) {code} IMO, we should exit the DN so that the users can know that something go wrong and fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8429) The DomainSocketWatcher thread should not block other threads if it dies
[ https://issues.apache.org/jira/browse/HDFS-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-8429: --- Attachment: HDFS-8429-003.patch Tested cases include TestParallelShortCircuitLegacyRead, TestParallelShortCircuitRead, TestParallelShortCircuitReadNoChecksum, TestParallelShortCircuitReadUnCached, TestShortCircuitCache, TestShortCircuitLocalRead, TestShortCircuitShm, TemporarySocketDirectory, TestDomainSocket, TestDomainSocketWatcher The DomainSocketWatcher thread should not block other threads if it dies Key: HDFS-8429 URL: https://issues.apache.org/jira/browse/HDFS-8429 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Attachments: HDFS-8429-001.patch, HDFS-8429-002.patch, HDFS-8429-003.patch In our cluster, an application is hung when doing a short circuit read of local hdfs block. By looking into the log, we found the DataNode's DomainSocketWatcher.watcherThread has exited with following log: {code} ERROR org.apache.hadoop.net.unix.DomainSocketWatcher: Thread[Thread-25,5,main] terminating on unexpected exception java.lang.NullPointerException at org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:463) at java.lang.Thread.run(Thread.java:662) {code} The line 463 is following code snippet: {code} try { for (int fd : fdSet.getAndClearReadableFds()) { sendCallbackAndRemove(getAndClearReadableFds, entries, fdSet, fd); } {code} getAndClearReadableFds is a native method which will malloc an int array. Since our memory is very tight, it looks like the malloc failed and a NULL pointer is returned. The bad thing is that other threads then blocked in stack like this: {code} DataXceiver for client unix:/home/work/app/hdfs/c3prc-micloud/datanode/dn_socket [Waiting for operation #1] daemon prio=10 tid=0x7f0c9c086d90 nid=0x8fc3 waiting on condition [0x7f09b9856000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x0007b0174808 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:323) at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:322) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:403) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:214) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:95) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) at java.lang.Thread.run(Thread.java:662) {code} IMO, we should exit the DN so that the users can know that something go wrong and fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8429) Death of watcherThread making other local read blocked
[ https://issues.apache.org/jira/browse/HDFS-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-8429: --- Attachment: HDFS-8429-002.patch Tested cases include TestParallelShortCircuitLegacyRead, TestParallelShortCircuitRead, TestParallelShortCircuitReadNoChecksum, TestParallelShortCircuitReadUnCached, TestShortCircuitCache, TestShortCircuitLocalRead, TestShortCircuitShm, TemporarySocketDirectory, TestDomainSocket, TestDomainSocketWatcher Death of watcherThread making other local read blocked -- Key: HDFS-8429 URL: https://issues.apache.org/jira/browse/HDFS-8429 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Attachments: HDFS-8429-001.patch, HDFS-8429-002.patch In our cluster, an application is hung when doing a short circuit read of local hdfs block. By looking into the log, we found the DataNode's DomainSocketWatcher.watcherThread has exited with following log: {code} ERROR org.apache.hadoop.net.unix.DomainSocketWatcher: Thread[Thread-25,5,main] terminating on unexpected exception java.lang.NullPointerException at org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:463) at java.lang.Thread.run(Thread.java:662) {code} The line 463 is following code snippet: {code} try { for (int fd : fdSet.getAndClearReadableFds()) { sendCallbackAndRemove(getAndClearReadableFds, entries, fdSet, fd); } {code} getAndClearReadableFds is a native method which will malloc an int array. Since our memory is very tight, it looks like the malloc failed and a NULL pointer is returned. The bad thing is that other threads then blocked in stack like this: {code} DataXceiver for client unix:/home/work/app/hdfs/c3prc-micloud/datanode/dn_socket [Waiting for operation #1] daemon prio=10 tid=0x7f0c9c086d90 nid=0x8fc3 waiting on condition [0x7f09b9856000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x0007b0174808 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:323) at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:322) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:403) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:214) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:95) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) at java.lang.Thread.run(Thread.java:662) {code} IMO, we should exit the DN so that the users can know that something go wrong and fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8429) Death of watcherThread making other local read blocked
[ https://issues.apache.org/jira/browse/HDFS-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14553702#comment-14553702 ] zhouyingchao commented on HDFS-8429: Yes, the modification of DomainSocketWatcher#add and DomainSocketWatcher#remove is not needed. I changed the patch accordingly and added a unit test case as suggested. The code of the test case is almost borrowed from the testStress() in the same file. Thank you. Death of watcherThread making other local read blocked -- Key: HDFS-8429 URL: https://issues.apache.org/jira/browse/HDFS-8429 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Attachments: HDFS-8429-001.patch In our cluster, an application is hung when doing a short circuit read of local hdfs block. By looking into the log, we found the DataNode's DomainSocketWatcher.watcherThread has exited with following log: {code} ERROR org.apache.hadoop.net.unix.DomainSocketWatcher: Thread[Thread-25,5,main] terminating on unexpected exception java.lang.NullPointerException at org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:463) at java.lang.Thread.run(Thread.java:662) {code} The line 463 is following code snippet: {code} try { for (int fd : fdSet.getAndClearReadableFds()) { sendCallbackAndRemove(getAndClearReadableFds, entries, fdSet, fd); } {code} getAndClearReadableFds is a native method which will malloc an int array. Since our memory is very tight, it looks like the malloc failed and a NULL pointer is returned. The bad thing is that other threads then blocked in stack like this: {code} DataXceiver for client unix:/home/work/app/hdfs/c3prc-micloud/datanode/dn_socket [Waiting for operation #1] daemon prio=10 tid=0x7f0c9c086d90 nid=0x8fc3 waiting on condition [0x7f09b9856000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x0007b0174808 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:323) at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:322) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:403) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:214) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:95) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) at java.lang.Thread.run(Thread.java:662) {code} IMO, we should exit the DN so that the users can know that something go wrong and fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8429) Death of watcherThread making other local read blocked
[ https://issues.apache.org/jira/browse/HDFS-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552095#comment-14552095 ] zhouyingchao commented on HDFS-8429: Colin, thank you for the great comments. In this case, I think the bottom line is that the death of the watcher thread should not block other threads and the client side should be indicated to fall through to other ways as quick as possible. I created a patch trying to resolve the blocking. Besides that, I also changed the native getAndClearReadableFds method to throw exception as Colin mentioned. Please feel free to post your thoughts and comments. Thank you. Death of watcherThread making other local read blocked -- Key: HDFS-8429 URL: https://issues.apache.org/jira/browse/HDFS-8429 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao In our cluster, an application is hung when doing a short circuit read of local hdfs block. By looking into the log, we found the DataNode's DomainSocketWatcher.watcherThread has exited with following log: {code} ERROR org.apache.hadoop.net.unix.DomainSocketWatcher: Thread[Thread-25,5,main] terminating on unexpected exception java.lang.NullPointerException at org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:463) at java.lang.Thread.run(Thread.java:662) {code} The line 463 is following code snippet: {code} try { for (int fd : fdSet.getAndClearReadableFds()) { sendCallbackAndRemove(getAndClearReadableFds, entries, fdSet, fd); } {code} getAndClearReadableFds is a native method which will malloc an int array. Since our memory is very tight, it looks like the malloc failed and a NULL pointer is returned. The bad thing is that other threads then blocked in stack like this: {code} DataXceiver for client unix:/home/work/app/hdfs/c3prc-micloud/datanode/dn_socket [Waiting for operation #1] daemon prio=10 tid=0x7f0c9c086d90 nid=0x8fc3 waiting on condition [0x7f09b9856000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x0007b0174808 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:323) at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:322) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:403) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:214) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:95) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) at java.lang.Thread.run(Thread.java:662) {code} IMO, we should exit the DN so that the users can know that something go wrong and fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8429) Death of watcherThread making other local read blocked
[ https://issues.apache.org/jira/browse/HDFS-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-8429: --- Status: Patch Available (was: Open) Death of watcherThread making other local read blocked -- Key: HDFS-8429 URL: https://issues.apache.org/jira/browse/HDFS-8429 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Attachments: HDFS-8429-001.patch In our cluster, an application is hung when doing a short circuit read of local hdfs block. By looking into the log, we found the DataNode's DomainSocketWatcher.watcherThread has exited with following log: {code} ERROR org.apache.hadoop.net.unix.DomainSocketWatcher: Thread[Thread-25,5,main] terminating on unexpected exception java.lang.NullPointerException at org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:463) at java.lang.Thread.run(Thread.java:662) {code} The line 463 is following code snippet: {code} try { for (int fd : fdSet.getAndClearReadableFds()) { sendCallbackAndRemove(getAndClearReadableFds, entries, fdSet, fd); } {code} getAndClearReadableFds is a native method which will malloc an int array. Since our memory is very tight, it looks like the malloc failed and a NULL pointer is returned. The bad thing is that other threads then blocked in stack like this: {code} DataXceiver for client unix:/home/work/app/hdfs/c3prc-micloud/datanode/dn_socket [Waiting for operation #1] daemon prio=10 tid=0x7f0c9c086d90 nid=0x8fc3 waiting on condition [0x7f09b9856000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x0007b0174808 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:323) at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:322) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:403) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:214) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:95) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) at java.lang.Thread.run(Thread.java:662) {code} IMO, we should exit the DN so that the users can know that something go wrong and fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8429) Death of watcherThread making other local read blocked
[ https://issues.apache.org/jira/browse/HDFS-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-8429: --- Attachment: HDFS-8429-001.patch Test following cases : TestDomainSocket,TestDomainSocketWatcher,TestParallelShortCircuitRead,TestFsDatasetCacheRevocation,TestFatasetCacheRevocation,TestScrLazyPersistFiles,TestParallelShortCircuitReadNoChecksum,TestDFSInputStream,TestBlockReaderFactory,TestParallelUnixDomainRead,TestParallelShortCircuitReadUnCached,TestBlockReaderLocalLegacy,TestPeerCache,TestShortCircuitCache,TestShortCircuitLocalRead,TestBlockReaderLocal,TestParallelShortCircuitLegacyRead,TestTracingShortCircuitLocalRead,TestEnhancedByteBufferAccess Death of watcherThread making other local read blocked -- Key: HDFS-8429 URL: https://issues.apache.org/jira/browse/HDFS-8429 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Attachments: HDFS-8429-001.patch In our cluster, an application is hung when doing a short circuit read of local hdfs block. By looking into the log, we found the DataNode's DomainSocketWatcher.watcherThread has exited with following log: {code} ERROR org.apache.hadoop.net.unix.DomainSocketWatcher: Thread[Thread-25,5,main] terminating on unexpected exception java.lang.NullPointerException at org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:463) at java.lang.Thread.run(Thread.java:662) {code} The line 463 is following code snippet: {code} try { for (int fd : fdSet.getAndClearReadableFds()) { sendCallbackAndRemove(getAndClearReadableFds, entries, fdSet, fd); } {code} getAndClearReadableFds is a native method which will malloc an int array. Since our memory is very tight, it looks like the malloc failed and a NULL pointer is returned. The bad thing is that other threads then blocked in stack like this: {code} DataXceiver for client unix:/home/work/app/hdfs/c3prc-micloud/datanode/dn_socket [Waiting for operation #1] daemon prio=10 tid=0x7f0c9c086d90 nid=0x8fc3 waiting on condition [0x7f09b9856000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x0007b0174808 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:323) at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:322) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:403) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:214) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:95) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) at java.lang.Thread.run(Thread.java:662) {code} IMO, we should exit the DN so that the users can know that something go wrong and fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8429) Death of watcherThread making other local read blocked
zhouyingchao created HDFS-8429: -- Summary: Death of watcherThread making other local read blocked Key: HDFS-8429 URL: https://issues.apache.org/jira/browse/HDFS-8429 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao In our cluster, an application hung while doing a short-circuit read of a local HDFS block. By looking into the log, we found that the DataNode's DomainSocketWatcher.watcherThread had exited with the following log: {code} ERROR org.apache.hadoop.net.unix.DomainSocketWatcher: Thread[Thread-25,5,main] terminating on unexpected exception java.lang.NullPointerException at org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:463) at java.lang.Thread.run(Thread.java:662) {code} Line 463 is in the following code snippet: {code} try { for (int fd : fdSet.getAndClearReadableFds()) { sendCallbackAndRemove(getAndClearReadableFds, entries, fdSet, fd); } {code} getAndClearReadableFds is a native method which mallocs an int array. Since our memory was very tight, it looks like the malloc failed and a NULL pointer was returned. The bad thing is that other threads then blocked with stacks like this: {code} DataXceiver for client unix:/home/work/app/hdfs/c3prc-micloud/datanode/dn_socket [Waiting for operation #1] daemon prio=10 tid=0x7f0c9c086d90 nid=0x8fc3 waiting on condition [0x7f09b9856000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x0007b0174808 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:323) at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:322) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:403) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:214) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:95) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) at java.lang.Thread.run(Thread.java:662) {code} IMO, we should exit the DN so that users can know that something went wrong and fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
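A minimal sketch of the fail-fast behaviour that would avoid the hang, assuming a simplified watcher/queue pair rather than the real DomainSocketWatcher: if the watcher thread dies for any reason it marks itself dead and wakes every waiter, and add() throws instead of parking forever.
{code}
// Sketch only: not the real DomainSocketWatcher. The watcher records its
// own death in a catch block and signals all waiters; add() then fails
// fast instead of waiting on the condition indefinitely.
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class WatcherSketch {
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition newTask = lock.newCondition();
  private final Condition processed = lock.newCondition();
  private final Queue<Runnable> pending = new ArrayDeque<>();
  private boolean watcherDead = false;

  private final Thread watcher = new Thread(() -> {
    try {
      while (true) {
        Runnable task;
        lock.lock();
        try {
          while (pending.isEmpty()) {
            newTask.await();
          }
          task = pending.poll();
        } finally {
          lock.unlock();
        }
        task.run();                 // may throw, like the NPE in the report
        lock.lock();
        try {
          processed.signalAll();    // normal completion: wake the waiters
        } finally {
          lock.unlock();
        }
      }
    } catch (Throwable t) {
      lock.lock();
      try {
        watcherDead = true;         // key point: record the death and wake
        processed.signalAll();      // everyone so nobody waits forever
      } finally {
        lock.unlock();
      }
    }
  });

  public WatcherSketch() {
    watcher.setDaemon(true);
    watcher.start();
  }

  /** Register a task; fails fast if the watcher thread has terminated. */
  public void add(Runnable registration) throws InterruptedException {
    lock.lock();
    try {
      if (watcherDead) {
        throw new IllegalStateException("watcher thread has terminated");
      }
      pending.add(registration);
      newTask.signal();
      while (!pending.isEmpty() && !watcherDead) {
        processed.await();
      }
      if (watcherDead) {
        throw new IllegalStateException("watcher thread has terminated");
      }
    } finally {
      lock.unlock();
    }
  }
}
{code}
With this shape, a DataXceiver calling add() after the watcher has died gets an immediate exception and the client can fall back to a remote read, instead of hanging on the condition variable as in the stack above.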
[jira] [Commented] (HDFS-8429) Death of watcherThread making other local read blocked
[ https://issues.apache.org/jira/browse/HDFS-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550293#comment-14550293 ] zhouyingchao commented on HDFS-8429: [~cmccabe] Should we stop DN in this condition? Death of watcherThread making other local read blocked -- Key: HDFS-8429 URL: https://issues.apache.org/jira/browse/HDFS-8429 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao In our cluster, an application is hung when doing a short circuit read of local hdfs block. By looking into the log, we found the DataNode's DomainSocketWatcher.watcherThread has exited with following log: {code} ERROR org.apache.hadoop.net.unix.DomainSocketWatcher: Thread[Thread-25,5,main] terminating on unexpected exception java.lang.NullPointerException at org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:463) at java.lang.Thread.run(Thread.java:662) {code} The line 463 is following code snippet: {code} try { for (int fd : fdSet.getAndClearReadableFds()) { sendCallbackAndRemove(getAndClearReadableFds, entries, fdSet, fd); } {code} getAndClearReadableFds is a native method which will malloc an int array. Since our memory is very tight, it looks like the malloc failed and a NULL pointer is returned. The bad thing is that other threads then blocked in stack like this: {code} DataXceiver for client unix:/home/work/app/hdfs/c3prc-micloud/datanode/dn_socket [Waiting for operation #1] daemon prio=10 tid=0x7f0c9c086d90 nid=0x8fc3 waiting on condition [0x7f09b9856000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x0007b0174808 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:323) at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:322) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:403) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:214) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:95) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) at java.lang.Thread.run(Thread.java:662) {code} IMO, we should exit the DN so that the users can know that something go wrong and fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8419) chmod impact user's effective ACL
zhouyingchao created HDFS-8419: -- Summary: chmod impact user's effective ACL Key: HDFS-8419 URL: https://issues.apache.org/jira/browse/HDFS-8419 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao I set a directory's ACL to assign rwx permission to a user h_user1. Later, I used chmod to change the group permission to r-x. I understand that chmod on an ACL-enabled file only changes the permission mask. What surprised me is that the operation changed h_user1's effective ACL from rwx to r-x. Following are the ACLs before any operation: - # file: /grptest # owner: hdfs_tst_admin # group: supergroup user::rwx user:h_user1:rwx group::r-x mask::rwx other::--- - Following are the ACLs after chmod 750 /grptest: - # file: /grptest # owner: hdfs_tst_admin # group: supergroup user::rwx user:hdfs_admin:rwx #effective:r-x group::r-x mask::r-x other::--- # file: /grptest # owner: hdfs_tst_admin # group: supergroup user::rwx user:h_user1:rwx #effective:r-x group::r-x mask::r-x other::--- - I'm wondering if this behavior is by design. If not, I'd like to fix the issue. Thank you. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
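For reference, the arithmetic behind this observation can be sketched with Hadoop's FsAction type (a minimal illustration, assuming standard POSIX.1e ACL semantics, which HDFS follows): for an inode that carries an ACL, the group digit of a chmod rewrites the mask entry, and a named user's effective permission is its ACL entry ANDed with that mask.
{code}
// Sketch only: the mask arithmetic that turns user:h_user1:rwx into an
// effective r-x after chmod 750, using org.apache.hadoop.fs.permission.
import org.apache.hadoop.fs.permission.FsAction;

public class EffectiveAclDemo {
  static FsAction effective(FsAction namedEntry, FsAction mask) {
    return namedEntry.and(mask);                 // effective = entry & mask
  }

  public static void main(String[] args) {
    FsAction hUser1 = FsAction.ALL;              // user:h_user1:rwx
    FsAction maskBefore = FsAction.ALL;          // mask::rwx, before chmod
    FsAction maskAfter = FsAction.READ_EXECUTE;  // mask::r-x, after chmod 750

    System.out.println("before chmod 750: "
        + effective(hUser1, maskBefore).SYMBOL); // rwx
    System.out.println("after  chmod 750: "
        + effective(hUser1, maskAfter).SYMBOL);  // r-x
  }
}
{code}
So chmod 750 turning the mask into r-x is expected to cap user:h_user1:rwx at an effective r-x; restoring the user's full access would mean restoring the mask, e.g. via setfacl, rather than via chmod.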
[jira] [Commented] (HDFS-8419) chmod impact user's effective ACL
[ https://issues.apache.org/jira/browse/HDFS-8419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14547909#comment-14547909 ] zhouyingchao commented on HDFS-8419: @Chris Nauroth chmod impact user's effective ACL - Key: HDFS-8419 URL: https://issues.apache.org/jira/browse/HDFS-8419 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao I set a directory's ACL to assign rwx permission to a user h_user1. Later, I used chmod to change the group permission to r-x. I understand chmod of a acl enabled file would only change the permission mask. What's make me surprise is that the operation will change the h_user1's effective ACL from rwx to r-x. Following are ACLs before any operaton: - # file: /grptest # owner: hdfs_tst_admin # group: supergroup user::rwx user:h_user1:rwx group::r-x mask::rwx other::--- - Following are ACLs after chmod 750 /grptest - # file: /grptest # owner: hdfs_tst_admin # group: supergroup user::rwx user:hdfs_admin:rwx #effective:r-x group::r-x mask::r-x other::--- # file: /grptest # owner: hdfs_tst_admin # group: supergroup user::rwx user:h_user1:rwx #effective:r-x group::r-x mask::r-x other::--- - I'm wondering if this behavior is by design. If not, I'd like to fix the issue. Thank you. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8419) chmod impact user's effective ACL
[ https://issues.apache.org/jira/browse/HDFS-8419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-8419: --- Description: I set a directory's ACL to assign rwx permission to a user h_user1. Later, I used chmod to change the group permission to r-x. I understand chmod of a acl enabled file would only change the permission mask. What's make me surprise is that the operation will change the h_user1's effective ACL from rwx to r-x. Following are ACLs before any operaton: - # file: /grptest # owner: hdfs_tst_admin # group: supergroup user::rwx user:h_user1:rwx group::r-x mask::rwx other::--- - Following are ACLs after chmod 750 /grptest - # file: /grptest # owner: hdfs_tst_admin # group: supergroup user::rwx user:hdfs_admin:rwx #effective:r-x group::r-x mask::r-x other::--- - I'm wondering if this behavior is by design. If not, I'd like to fix the issue. Thank you. was: I set a directory's ACL to assign rwx permission to a user h_user1. Later, I used chmod to change the group permission to r-x. I understand chmod of a acl enabled file would only change the permission mask. What's make me surprise is that the operation will change the h_user1's effective ACL from rwx to r-x. Following are ACLs before any operaton: - # file: /grptest # owner: hdfs_tst_admin # group: supergroup user::rwx user:h_user1:rwx group::r-x mask::rwx other::--- - Following are ACLs after chmod 750 /grptest - # file: /grptest # owner: hdfs_tst_admin # group: supergroup user::rwx user:hdfs_admin:rwx #effective:r-x group::r-x mask::r-x other::--- # file: /grptest # owner: hdfs_tst_admin # group: supergroup user::rwx user:h_user1:rwx#effective:r-x group::r-x mask::r-x other::--- - I'm wondering if this behavior is by design. If not, I'd like to fix the issue. Thank you. chmod impact user's effective ACL - Key: HDFS-8419 URL: https://issues.apache.org/jira/browse/HDFS-8419 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao I set a directory's ACL to assign rwx permission to a user h_user1. Later, I used chmod to change the group permission to r-x. I understand chmod of a acl enabled file would only change the permission mask. What's make me surprise is that the operation will change the h_user1's effective ACL from rwx to r-x. Following are ACLs before any operaton: - # file: /grptest # owner: hdfs_tst_admin # group: supergroup user::rwx user:h_user1:rwx group::r-x mask::rwx other::--- - Following are ACLs after chmod 750 /grptest - # file: /grptest # owner: hdfs_tst_admin # group: supergroup user::rwx user:hdfs_admin:rwx #effective:r-x group::r-x mask::r-x other::--- - I'm wondering if this behavior is by design. If not, I'd like to fix the issue. Thank you. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8419) chmod impact user's effective ACL
[ https://issues.apache.org/jira/browse/HDFS-8419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-8419: --- Description: I set a directory's ACL to assign rwx permission to a user h_user1. Later, I used chmod to change the group permission to r-x. I understand chmod of a acl enabled file would only change the permission mask. What's make me surprise is that the operation will change the h_user1's effective ACL from rwx to r-x. Following are ACLs before any operaton: - \# file: /grptest \# owner: hdfs_tst_admin \# group: supergroup user::rwx user:h_user1:rwx group::r-x mask::rwx other::--- - Following are ACLs after chmod 750 /grptest - \# file: /grptest \# owner: hdfs_tst_admin \# group: supergroup user::rwx user:hdfs_admin:rwx #effective:r-x group::r-x mask::r-x other::--- - I'm wondering if this behavior is by design. If not, I'd like to fix the issue. Thank you. was: I set a directory's ACL to assign rwx permission to a user h_user1. Later, I used chmod to change the group permission to r-x. I understand chmod of a acl enabled file would only change the permission mask. What's make me surprise is that the operation will change the h_user1's effective ACL from rwx to r-x. Following are ACLs before any operaton: - # file: /grptest # owner: hdfs_tst_admin # group: supergroup user::rwx user:h_user1:rwx group::r-x mask::rwx other::--- - Following are ACLs after chmod 750 /grptest - # file: /grptest # owner: hdfs_tst_admin # group: supergroup user::rwx user:hdfs_admin:rwx #effective:r-x group::r-x mask::r-x other::--- - I'm wondering if this behavior is by design. If not, I'd like to fix the issue. Thank you. chmod impact user's effective ACL - Key: HDFS-8419 URL: https://issues.apache.org/jira/browse/HDFS-8419 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao I set a directory's ACL to assign rwx permission to a user h_user1. Later, I used chmod to change the group permission to r-x. I understand chmod of a acl enabled file would only change the permission mask. What's make me surprise is that the operation will change the h_user1's effective ACL from rwx to r-x. Following are ACLs before any operaton: - \# file: /grptest \# owner: hdfs_tst_admin \# group: supergroup user::rwx user:h_user1:rwx group::r-x mask::rwx other::--- - Following are ACLs after chmod 750 /grptest - \# file: /grptest \# owner: hdfs_tst_admin \# group: supergroup user::rwx user:hdfs_admin:rwx #effective:r-x group::r-x mask::r-x other::--- - I'm wondering if this behavior is by design. If not, I'd like to fix the issue. Thank you. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8419) chmod impact user's effective ACL
[ https://issues.apache.org/jira/browse/HDFS-8419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-8419: --- Description: I set a directory's ACL to assign rwx permission to user h_user1. Later, I used chmod to change the group permission to r-x. I understand chmod of an acl enabled file would only change the permission mask. The abnormal thing is that the operation will change the h_user1's effective ACL from rwx to r-x. Following are ACLs before any operaton: - \# file: /grptest \# owner: hdfs_tst_admin \# group: supergroup user::rwx user:h_user1:rwx group::r-x mask::rwx other::--- - Following are ACLs after chmod 750 /grptest - \# file: /grptest \# owner: hdfs_tst_admin \# group: supergroup user::rwx user:hdfs_admin:rwx #effective:r-x group::r-x mask::r-x other::--- - I'm wondering if this behavior is by design. If not, I'd like to fix the issue. Thank you. was: I set a directory's ACL to assign rwx permission to a user h_user1. Later, I used chmod to change the group permission to r-x. I understand chmod of a acl enabled file would only change the permission mask. What's make me surprise is that the operation will change the h_user1's effective ACL from rwx to r-x. Following are ACLs before any operaton: - \# file: /grptest \# owner: hdfs_tst_admin \# group: supergroup user::rwx user:h_user1:rwx group::r-x mask::rwx other::--- - Following are ACLs after chmod 750 /grptest - \# file: /grptest \# owner: hdfs_tst_admin \# group: supergroup user::rwx user:hdfs_admin:rwx #effective:r-x group::r-x mask::r-x other::--- - I'm wondering if this behavior is by design. If not, I'd like to fix the issue. Thank you. chmod impact user's effective ACL - Key: HDFS-8419 URL: https://issues.apache.org/jira/browse/HDFS-8419 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao I set a directory's ACL to assign rwx permission to user h_user1. Later, I used chmod to change the group permission to r-x. I understand chmod of an acl enabled file would only change the permission mask. The abnormal thing is that the operation will change the h_user1's effective ACL from rwx to r-x. Following are ACLs before any operaton: - \# file: /grptest \# owner: hdfs_tst_admin \# group: supergroup user::rwx user:h_user1:rwx group::r-x mask::rwx other::--- - Following are ACLs after chmod 750 /grptest - \# file: /grptest \# owner: hdfs_tst_admin \# group: supergroup user::rwx user:hdfs_admin:rwx #effective:r-x group::r-x mask::r-x other::--- - I'm wondering if this behavior is by design. If not, I'd like to fix the issue. Thank you. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8419) chmod impact user's effective ACL
[ https://issues.apache.org/jira/browse/HDFS-8419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-8419: --- Description: I set a directory's ACL to assign rwx permission to user h_user1. Later, I used chmod to change the group permission to r-x. I understand chmod of an acl enabled file would only change the permission mask. The abnormal thing is that the operation will change the h_user1's effective ACL from rwx to r-x. Following are ACLs before any operaton: - \# file: /grptest \# owner: hdfs_tst_admin \# group: supergroup user::rwx user:h_user1:rwx group::r-x mask::rwx other::--- - Following are ACLs after chmod 750 /grptest - \# file: /grptest \# owner: hdfs_tst_admin \# group: supergroup user::rwx user:h_user1:rwx#effective:r-x group::r-x mask::r-x other::--- - I'm wondering if this behavior is by design. If not, I'd like to fix the issue. Thank you. was: I set a directory's ACL to assign rwx permission to user h_user1. Later, I used chmod to change the group permission to r-x. I understand chmod of an acl enabled file would only change the permission mask. The abnormal thing is that the operation will change the h_user1's effective ACL from rwx to r-x. Following are ACLs before any operaton: - \# file: /grptest \# owner: hdfs_tst_admin \# group: supergroup user::rwx user:h_user1:rwx group::r-x mask::rwx other::--- - Following are ACLs after chmod 750 /grptest - \# file: /grptest \# owner: hdfs_tst_admin \# group: supergroup user::rwx user:hdfs_admin:rwx #effective:r-x group::r-x mask::r-x other::--- - I'm wondering if this behavior is by design. If not, I'd like to fix the issue. Thank you. chmod impact user's effective ACL - Key: HDFS-8419 URL: https://issues.apache.org/jira/browse/HDFS-8419 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao I set a directory's ACL to assign rwx permission to user h_user1. Later, I used chmod to change the group permission to r-x. I understand chmod of an acl enabled file would only change the permission mask. The abnormal thing is that the operation will change the h_user1's effective ACL from rwx to r-x. Following are ACLs before any operaton: - \# file: /grptest \# owner: hdfs_tst_admin \# group: supergroup user::rwx user:h_user1:rwx group::r-x mask::rwx other::--- - Following are ACLs after chmod 750 /grptest - \# file: /grptest \# owner: hdfs_tst_admin \# group: supergroup user::rwx user:h_user1:rwx #effective:r-x group::r-x mask::r-x other::--- - I'm wondering if this behavior is by design. If not, I'd like to fix the issue. Thank you. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
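For readers following the HDFS-8419 exchange above: the observed behavior is the POSIX ACL mask rule, which the HDFS ACL implementation is modeled on. The effective permission of a named-user entry is the entry ANDed with the mask, and chmod rewrites the mask from the new group bits. A tiny stand-alone sketch (illustrative only, not HDFS code) shows why chmod 750 demotes h_user1 from rwx to r-x:
{code}
public class AclMaskSketch {

  // Render an rwx bit triple (4=r, 2=w, 1=x) as a string.
  private static String render(int bits) {
    return ((bits & 4) != 0 ? "r" : "-")
        + ((bits & 2) != 0 ? "w" : "-")
        + ((bits & 1) != 0 ? "x" : "-");
  }

  public static void main(String[] args) {
    int hUser1Entry = 7;        // user:h_user1:rwx
    int maskBefore = 7;         // mask::rwx, as originally set on /grptest
    int maskAfterChmod750 = 5;  // chmod 750: the group digit (r-x) becomes the mask

    // Effective permission of a named-user entry = entry & mask.
    System.out.println("before chmod 750: " + render(hUser1Entry & maskBefore));        // rwx
    System.out.println("after  chmod 750: " + render(hUser1Entry & maskAfterChmod750)); // r-x
  }
}
{code}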
[jira] [Commented] (HDFS-7897) Shutdown metrics when stopping JournalNode
[ https://issues.apache.org/jira/browse/HDFS-7897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521202#comment-14521202 ] zhouyingchao commented on HDFS-7897: Any updates regarding this simple patch? Shutdown metrics when stopping JournalNode -- Key: HDFS-7897 URL: https://issues.apache.org/jira/browse/HDFS-7897 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Attachments: HDFS-7897-001.patch In JournalNode.stop(), shutting down the metrics system is forgotten. The issue was found while reading the code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
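A minimal sketch of the symmetry the patch asks for, assuming (as for the other HDFS daemons) that the JournalNode initializes the default metrics system when it starts; this is not the attached HDFS-7897-001.patch, just an illustration of the missing shutdown call:
{code}
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;

public class MetricsLifecycleSketch {
  public static void main(String[] args) {
    // What a daemon such as the JournalNode does on start.
    DefaultMetricsSystem.initialize("JournalNode");
    try {
      // ... daemon runs here ...
    } finally {
      // The call the JIRA says is missing from JournalNode.stop():
      // release registered sources/sinks so the metrics system does not leak.
      DefaultMetricsSystem.shutdown();
    }
  }
}
{code}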
[jira] [Updated] (HDFS-7999) FsDatasetImpl#createTemporary sometimes holds the FSDatasetImpl lock for a very long time
[ https://issues.apache.org/jira/browse/HDFS-7999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-7999: --- Attachment: HDFS-7999-003.patch FsDatasetImpl#createTemporary sometimes holds the FSDatasetImpl lock for a very long time - Key: HDFS-7999 URL: https://issues.apache.org/jira/browse/HDFS-7999 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Attachments: HDFS-7999-001.patch, HDFS-7999-002.patch, HDFS-7999-003.patch I'm using 2.6.0 and noticed that sometime DN's heartbeat were delayed for very long time, say more than 100 seconds. I get the jstack twice and looks like they are all blocked (at getStorageReport) by dataset lock, and which is held by a thread that is calling createTemporary, which again is blocked to wait earlier incarnation writer to exit. The heartbeat thread stack: java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:152) - waiting to lock 0x0007b01428c0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:144) - locked 0x0007b0140ed0 (a java.lang.Object) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:575) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:680) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850) at java.lang.Thread.run(Thread.java:662) The DataXceiver thread holds the dataset lock: DataXceiver for client at X daemon prio=10 tid=0x7f14041e6480 nid=0x52bc in Object.wait() [0x7f11d78f7000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Thread.join(Thread.java:1194) locked 0x0007a33b85d8 (a org.apache.hadoop.util.Daemon) at org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:183) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1231) locked 0x0007b01428c0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:114) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.init(BlockReceiver.java:179) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) at java.lang.Thread.run(Thread.java:662) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7999) FsDatasetImpl#createTemporary sometimes holds the FSDatasetImpl lock for a very long time
[ https://issues.apache.org/jira/browse/HDFS-7999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14395791#comment-14395791 ] zhouyingchao commented on HDFS-7999: Thank you, Colin. I'll update the patch soon. FsDatasetImpl#createTemporary sometimes holds the FSDatasetImpl lock for a very long time - Key: HDFS-7999 URL: https://issues.apache.org/jira/browse/HDFS-7999 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Attachments: HDFS-7999-001.patch, HDFS-7999-002.patch I'm using 2.6.0 and noticed that sometime DN's heartbeat were delayed for very long time, say more than 100 seconds. I get the jstack twice and looks like they are all blocked (at getStorageReport) by dataset lock, and which is held by a thread that is calling createTemporary, which again is blocked to wait earlier incarnation writer to exit. The heartbeat thread stack: java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:152) - waiting to lock 0x0007b01428c0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:144) - locked 0x0007b0140ed0 (a java.lang.Object) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:575) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:680) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850) at java.lang.Thread.run(Thread.java:662) The DataXceiver thread holds the dataset lock: DataXceiver for client at X daemon prio=10 tid=0x7f14041e6480 nid=0x52bc in Object.wait() [0x7f11d78f7000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Thread.join(Thread.java:1194) locked 0x0007a33b85d8 (a org.apache.hadoop.util.Daemon) at org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:183) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1231) locked 0x0007b01428c0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:114) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.init(BlockReceiver.java:179) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) at java.lang.Thread.run(Thread.java:662) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7999) FsDatasetImpl#createTemporary sometimes holds the FSDatasetImpl lock for a very long time
[ https://issues.apache.org/jira/browse/HDFS-7999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14392426#comment-14392426 ] zhouyingchao commented on HDFS-7999: Thanks a lot for the comments, Colin. I would update the patch accordingly. FsDatasetImpl#createTemporary sometimes holds the FSDatasetImpl lock for a very long time - Key: HDFS-7999 URL: https://issues.apache.org/jira/browse/HDFS-7999 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Attachments: HDFS-7999-001.patch I'm using 2.6.0 and noticed that sometime DN's heartbeat were delayed for very long time, say more than 100 seconds. I get the jstack twice and looks like they are all blocked (at getStorageReport) by dataset lock, and which is held by a thread that is calling createTemporary, which again is blocked to wait earlier incarnation writer to exit. The heartbeat thread stack: java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:152) - waiting to lock 0x0007b01428c0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:144) - locked 0x0007b0140ed0 (a java.lang.Object) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:575) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:680) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850) at java.lang.Thread.run(Thread.java:662) The DataXceiver thread holds the dataset lock: DataXceiver for client at X daemon prio=10 tid=0x7f14041e6480 nid=0x52bc in Object.wait() [0x7f11d78f7000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Thread.join(Thread.java:1194) locked 0x0007a33b85d8 (a org.apache.hadoop.util.Daemon) at org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:183) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1231) locked 0x0007b01428c0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:114) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.init(BlockReceiver.java:179) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) at java.lang.Thread.run(Thread.java:662) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7999) FsDatasetImpl#createTemporary sometimes holds the FSDatasetImpl lock for a very long time
[ https://issues.apache.org/jira/browse/HDFS-7999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-7999: --- Attachment: HDFS-7999-002.patch Test with -Dtest=FsDatasetTestUtil,LazyPersistTestCase,TestDatanodeRestart,TestFsDatasetImpl,TestFsVolumeList,TestInterDatanodeProtocol,TestLazyPersistFiles,TestRbwSpaceReservation,TestReplicaMap,TestScrLazyPersistFiles,TestWriteToReplica,TestBalancer,TestDatanodeManager FsDatasetImpl#createTemporary sometimes holds the FSDatasetImpl lock for a very long time - Key: HDFS-7999 URL: https://issues.apache.org/jira/browse/HDFS-7999 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Attachments: HDFS-7999-001.patch, HDFS-7999-002.patch I'm using 2.6.0 and noticed that sometime DN's heartbeat were delayed for very long time, say more than 100 seconds. I get the jstack twice and looks like they are all blocked (at getStorageReport) by dataset lock, and which is held by a thread that is calling createTemporary, which again is blocked to wait earlier incarnation writer to exit. The heartbeat thread stack: java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:152) - waiting to lock 0x0007b01428c0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:144) - locked 0x0007b0140ed0 (a java.lang.Object) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:575) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:680) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850) at java.lang.Thread.run(Thread.java:662) The DataXceiver thread holds the dataset lock: DataXceiver for client at X daemon prio=10 tid=0x7f14041e6480 nid=0x52bc in Object.wait() [0x7f11d78f7000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Thread.join(Thread.java:1194) locked 0x0007a33b85d8 (a org.apache.hadoop.util.Daemon) at org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:183) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1231) locked 0x0007b01428c0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:114) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.init(BlockReceiver.java:179) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) at java.lang.Thread.run(Thread.java:662) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8045) Incorrect calculation of NonDfsUsed and Remaining
[ https://issues.apache.org/jira/browse/HDFS-8045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-8045: --- Attachment: HDFS-8045-001.patch Test with -Dtest=TestSimulatedFSDataset,TestNamenodeCapacityReport,TestFileCreation,TestDecommission Incorrect calculation of NonDfsUsed and Remaining - Key: HDFS-8045 URL: https://issues.apache.org/jira/browse/HDFS-8045 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Attachments: HDFS-8045-001.patch After reserving some space via the param dfs.datanode.du.reserved, we noticed that the namenode usually reports the NonDfsUsed of Datanodes as 0 even if we write some non-hdfs data to the volume. After some investigation, we think there is an issue in the calculation of FsVolumeImpl.getAvailable - following is the explanation. For a volume, let's use Raw to represent raw capacity, DfsUsed to represent space consumed by hdfs blocks, Reserved to represent reservation through dfs.datanode.du.reserved, RbwReserved to represent space reservation for rbw blocks, RealNonDfsUsed to represent the real value of NonDfsUsed (which will include non-hdfs files and meta data consumed by the local filesystem). In the current implementation, for a volume, available space will actually be calculated as {code} min{Raw - Reserved - DfsUsed - RbwReserved, Raw - DfsUsed - RealNonDfsUsed} {code} Later on, the Namenode will calculate NonDfsUsed of the volume as {code} Raw - Reserved - DfsUsed - min{Raw - Reserved - DfsUsed - RbwReserved, Raw - DfsUsed - RealNonDfsUsed} {code} Given the calculation, finally we will have - {code} if (Reserved + RbwReserved > RealNonDfsUsed) NonDfsUsed = RbwReserved; else NonDfsUsed = RealNonDfsUsed - Reserved; {code} Either way it is far from the correct value. After investigating the implementation, we believe the Reserved and RbwReserved should be subtracted from available in getAvailable since they are actually not available to hdfs in any sense. I'll post a patch soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
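To make the failure mode above concrete, here is a small numeric illustration (the sizes are made up, not taken from the JIRA): with dfs.datanode.du.reserved = 50, no RBW reservation, and 10 units of real non-HDFS data on the volume, the NameNode ends up reporting NonDfsUsed as 0.
{code}
public class NonDfsUsedSketch {
  public static void main(String[] args) {
    long raw = 1000, dfsUsed = 100, reserved = 50, rbwReserved = 0, realNonDfsUsed = 10;

    // FsVolumeImpl.getAvailable() as described above:
    long available = Math.min(raw - reserved - dfsUsed - rbwReserved,   // 850
                              raw - dfsUsed - realNonDfsUsed);          // 890

    // NameNode-side derivation: nonDfsUsed = raw - reserved - dfsUsed - available
    long reportedNonDfsUsed = raw - reserved - dfsUsed - available;

    System.out.println("available           = " + available);           // 850
    System.out.println("reported nonDfsUsed = " + reportedNonDfsUsed);  // 0, real value is 10
  }
}
{code}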
[jira] [Updated] (HDFS-8045) Incorrect calculation of NonDfsUsed and Remaining
[ https://issues.apache.org/jira/browse/HDFS-8045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-8045: --- Description: After reserve some space via the param dfs.datanode.du.reserved, we noticed that the namenode usually report NonDfsUsed of Datanodes as 0 even if we write some non-hdfs data to the volume. After some investigation, we think there is an issue in the calculation of FsVolumeImpl.getAvailable - following is the explaination. For a volume, let's use Raw to represent raw capacity, DfsUsed to represent space consumed by hdfs blocks, Reserved to represent reservation through dfs.datanode.du.reserved, RbwReserved to represent space reservation for rbw blocks, NDfsUsed to represent real value of NonDfsUsed(which will include non-hdfs files and meta data consumed by local filesystem). In current implementation, for a volume, available space will be actually calculated as {code} min{Raw - Reserved - DfsUsed -RbwReserved, Raw - DfsUsed - NDfsUsed } {code} Later on, Namenode will calculate NonDfsUsed of the volume as {code} Raw - Reserved - DfsUsed - min{Raw - Reserved - DfsUsed - RbwReserved, Raw - DfsUsed - NDfsUsed} {code} Given the calculation, finally we will have - {code} if (Reserved + RbwReserved NDfsUsed) NonDfsUsed = RbwReserved; else NonDfsUsed = NDfsUsed - Reserved; {code} Either way it is far from a correct value. After investigation the implementation, we believe the Reserved and RbwReserved should be subtract from available in getAvailable since they are actually not available to hdfs in any way. I'll post a patch soon. was: After reserve some space via the param dfs.datanode.du.reserved, we noticed that the namenode usually report NonDfsUsed of Datanodes as 0 even if we actually write some data to the volume. After some investigation, we think there is an issue in the calculation of FsVolumeImpl.getAvailable - following is the explaination. For a volume, let's use Raw to represent raw capacity, DfsUsed to represent space consumed by hdfs blocks, Reserved to represent reservation through dfs.datanode.du.reserved, RbwReserved to represent space reservation for rbw blocks, NDfsUsed to represent real value of NonDfsUsed(which will include non-hdfs files and meta data consumed by local filesystem). In current implementation, for a volume, available space will be actually calculated as {code} min{Raw - Reserved - DfsUsed -RbwReserved, Raw - DfsUsed - NDfsUsed } {code} Later on, Namenode will calculate NonDfsUsed of the volume as {code} Raw - Reserved - DfsUsed - min{Raw - Reserved - DfsUsed - RbwReserved, Raw - DfsUsed - NDfsUsed} {code} Given the calculation, finally we will have - {code} if (Reserved + RbwReserved NDfsUsed) NonDfsUsed = RbwReserved; else NonDfsUsed = NDfsUsed - Reserved; {code} Either way it is far from a correct value. After investigation the implementation, we believe the Reserved and RbwReserved should be subtract from available in getAvailable since they are actually not available to hdfs in any way. I'll post a patch soon. Incorrect calculation of NonDfsUsed and Remaining - Key: HDFS-8045 URL: https://issues.apache.org/jira/browse/HDFS-8045 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Attachments: HDFS-8045-001.patch After reserve some space via the param dfs.datanode.du.reserved, we noticed that the namenode usually report NonDfsUsed of Datanodes as 0 even if we write some non-hdfs data to the volume. 
After some investigation, we think there is an issue in the calculation of FsVolumeImpl.getAvailable - following is the explaination. For a volume, let's use Raw to represent raw capacity, DfsUsed to represent space consumed by hdfs blocks, Reserved to represent reservation through dfs.datanode.du.reserved, RbwReserved to represent space reservation for rbw blocks, NDfsUsed to represent real value of NonDfsUsed(which will include non-hdfs files and meta data consumed by local filesystem). In current implementation, for a volume, available space will be actually calculated as {code} min{Raw - Reserved - DfsUsed -RbwReserved, Raw - DfsUsed - NDfsUsed } {code} Later on, Namenode will calculate NonDfsUsed of the volume as {code} Raw - Reserved - DfsUsed - min{Raw - Reserved - DfsUsed - RbwReserved, Raw - DfsUsed - NDfsUsed} {code} Given the calculation, finally we will have - {code} if (Reserved + RbwReserved NDfsUsed) NonDfsUsed = RbwReserved; else NonDfsUsed = NDfsUsed - Reserved; {code} Either way it is far from a correct value. After investigation the implementation, we believe the Reserved and RbwReserved should be subtract from available in getAvailable
[jira] [Updated] (HDFS-8045) Incorrect calculation of NonDfsUsed and Remaining
[ https://issues.apache.org/jira/browse/HDFS-8045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-8045: --- Description: After reserve some space via the param dfs.datanode.du.reserved, we noticed that the namenode usually report NonDfsUsed of Datanodes as 0 even if we actually write some data to the volume. After some investigation, we think there is an issue in the calculation of FsVolumeImpl.getAvailable - following is the explaination. For a volume, let's use Raw to represent raw capacity, DfsUsed to represent space consumed by hdfs blocks, Reserved to represent reservation through dfs.datanode.du.reserved, RbwReserved to represent space reservation for rbw blocks, NDfsUsed to represent real value of NonDfsUsed(which will include non-hdfs files and meta data consumed by local filesystem). In current implementation, for a volume, available space will be actually calculated as {code} min{Raw - Reserved - DfsUsed -RbwReserved, Raw - DfsUsed - NDfsUsed } {code} Later on, Namenode will calculate NonDfsUsed of the volume as {code} Raw - Reserved - DfsUsed - min{Raw - Reserved - DfsUsed - RbwReserved, Raw - DfsUsed - NDfsUsed} {code} Given the calculation, finally we will have - {code} if (Reserved + RbwReserved NDfsUsed) NonDfsUsed = RbwReserved; else NonDfsUsed = NDfsUsed - Reserved; {code} Either way it is far from a correct value. After investigation the implementation, we believe the Reserved and RbwReserved should be subtract from available in getAvailable since they are actually not available to hdfs in any way. I'll post a patch soon. was: After reserve some space via the param dfs.datanode.du.reserved, we noticed that the namenode usually report NonDfsUsed of Datanodes as 0 even if we actually write some data to the volume. After some investigation, we think there is an issue in the calculation of FsVolumeImpl.getAvailable - following is the explaination. For a volume, let's use Raw to represent raw capacity, DfsUsed to represent space consumed by hdfs blocks, Reserved to represent reservation through dfs.datanode.du.reserved, RbwReserved to represent space reservation for rbw blocks, NDfsUsed to represent real value of NonDfsUsed(which will include non-hdfs files and meta data consumed by local filesystem). In current implementation, for a volume, available space will be actually calculated as min{Raw - Reserved - DfsUsed -RbwReserved, Raw - DfsUsed - NDfsUsed }. Later on, Namenode will calculate NonDfsUsed of the volume as Raw - Reserved - DfsUsed - min{Raw - Reserved - DfsUsed - RbwReserved, Raw - DfsUsed - NDfsUsed}. Given the calculation, finally we will have - if Reserved + RbwReserved NDfsUsed, then the calculated NonDfsUsed will be RbwReserved. Otherwise if Reserved + RbwReserved NDfsUsed, then the calculated NonDfsUsed would be NDfsUsed - Reserved. Either way it is far from a correct value. After investigation the implementation, we believe the Reserved and RbwReserved should be subtract from available in getAvailable since they are actually not available to hdfs in any way. I'll post a patch soon. 
Incorrect calculation of NonDfsUsed and Remaining - Key: HDFS-8045 URL: https://issues.apache.org/jira/browse/HDFS-8045 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Attachments: HDFS-8045-001.patch After reserve some space via the param dfs.datanode.du.reserved, we noticed that the namenode usually report NonDfsUsed of Datanodes as 0 even if we actually write some data to the volume. After some investigation, we think there is an issue in the calculation of FsVolumeImpl.getAvailable - following is the explaination. For a volume, let's use Raw to represent raw capacity, DfsUsed to represent space consumed by hdfs blocks, Reserved to represent reservation through dfs.datanode.du.reserved, RbwReserved to represent space reservation for rbw blocks, NDfsUsed to represent real value of NonDfsUsed(which will include non-hdfs files and meta data consumed by local filesystem). In current implementation, for a volume, available space will be actually calculated as {code} min{Raw - Reserved - DfsUsed -RbwReserved, Raw - DfsUsed - NDfsUsed } {code} Later on, Namenode will calculate NonDfsUsed of the volume as {code} Raw - Reserved - DfsUsed - min{Raw - Reserved - DfsUsed - RbwReserved, Raw - DfsUsed - NDfsUsed} {code} Given the calculation, finally we will have - {code} if (Reserved + RbwReserved NDfsUsed) NonDfsUsed = RbwReserved; else NonDfsUsed = NDfsUsed - Reserved; {code} Either way it is far from a correct value. After investigation the implementation, we believe the Reserved and
[jira] [Created] (HDFS-8045) Incorrect calculation of NonDfsUsed and Remaining
zhouyingchao created HDFS-8045: -- Summary: Incorrect calculation of NonDfsUsed and Remaining Key: HDFS-8045 URL: https://issues.apache.org/jira/browse/HDFS-8045 Project: Hadoop HDFS Issue Type: Bug Reporter: zhouyingchao Assignee: zhouyingchao Attachments: HDFS-8045-001.patch After reserving some space via the param dfs.datanode.du.reserved, we noticed that the namenode usually reports the NonDfsUsed of Datanodes as 0 even if we actually write some data to the volume. After some investigation, we think there is an issue in the calculation of FsVolumeImpl.getAvailable - following is the explanation. For a volume, let's use Raw to represent raw capacity, DfsUsed to represent space consumed by hdfs blocks, Reserved to represent reservation through dfs.datanode.du.reserved, RbwReserved to represent space reservation for rbw blocks, NDfsUsed to represent the real value of NonDfsUsed (which will include non-hdfs files and meta data consumed by the local filesystem). In the current implementation, for a volume, available space will actually be calculated as min{Raw - Reserved - DfsUsed - RbwReserved, Raw - DfsUsed - NDfsUsed}. Later on, the Namenode will calculate NonDfsUsed of the volume as Raw - Reserved - DfsUsed - min{Raw - Reserved - DfsUsed - RbwReserved, Raw - DfsUsed - NDfsUsed}. Given the calculation, finally we will have - if Reserved + RbwReserved > NDfsUsed, then the calculated NonDfsUsed will be RbwReserved. Otherwise, if Reserved + RbwReserved <= NDfsUsed, then the calculated NonDfsUsed would be NDfsUsed - Reserved. Either way it is far from a correct value. After investigating the implementation, we believe the Reserved and RbwReserved should be subtracted from available in getAvailable since they are actually not available to hdfs in any way. I'll post a patch soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8045) Incorrect calculation of NonDfsUsed and Remaining
[ https://issues.apache.org/jira/browse/HDFS-8045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-8045: --- Component/s: datanode Incorrect calculation of NonDfsUsed and Remaining - Key: HDFS-8045 URL: https://issues.apache.org/jira/browse/HDFS-8045 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Attachments: HDFS-8045-001.patch After reserving some space via the param dfs.datanode.du.reserved, we noticed that the namenode usually reports the NonDfsUsed of Datanodes as 0 even if we write some non-hdfs data to the volume. After some investigation, we think there is an issue in the calculation of FsVolumeImpl.getAvailable - following is the explanation. For a volume, let's use Raw to represent raw capacity, DfsUsed to represent space consumed by hdfs blocks, Reserved to represent reservation through dfs.datanode.du.reserved, RbwReserved to represent space reservation for rbw blocks, RealNonDfsUsed to represent the real value of NonDfsUsed (which will include non-hdfs files and meta data consumed by the local filesystem). In the current implementation, for a volume, available space will actually be calculated as {code} min{Raw - Reserved - DfsUsed - RbwReserved, Raw - DfsUsed - RealNonDfsUsed} {code} Later on, the Namenode will calculate NonDfsUsed of the volume as {code} Raw - Reserved - DfsUsed - min{Raw - Reserved - DfsUsed - RbwReserved, Raw - DfsUsed - RealNonDfsUsed} {code} Given the calculation, finally we will have - {code} if (Reserved + RbwReserved > RealNonDfsUsed) NonDfsUsed = RbwReserved; else NonDfsUsed = RealNonDfsUsed - Reserved; {code} Either way it is far from the correct value. After investigating the implementation, we believe the Reserved and RbwReserved should be subtracted from available in getAvailable since they are actually not available to hdfs in any sense. I'll post a patch soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8045) Incorrect calculation of NonDfsUsed and Remaining
[ https://issues.apache.org/jira/browse/HDFS-8045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-8045: --- Affects Version/s: 2.6.0 Status: Patch Available (was: Open) Incorrect calculation of NonDfsUsed and Remaining - Key: HDFS-8045 URL: https://issues.apache.org/jira/browse/HDFS-8045 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Attachments: HDFS-8045-001.patch After reserving some space via the param dfs.datanode.du.reserved, we noticed that the namenode usually reports the NonDfsUsed of Datanodes as 0 even if we write some non-hdfs data to the volume. After some investigation, we think there is an issue in the calculation of FsVolumeImpl.getAvailable - following is the explanation. For a volume, let's use Raw to represent raw capacity, DfsUsed to represent space consumed by hdfs blocks, Reserved to represent reservation through dfs.datanode.du.reserved, RbwReserved to represent space reservation for rbw blocks, RealNonDfsUsed to represent the real value of NonDfsUsed (which will include non-hdfs files and meta data consumed by the local filesystem). In the current implementation, for a volume, available space will actually be calculated as {code} min{Raw - Reserved - DfsUsed - RbwReserved, Raw - DfsUsed - RealNonDfsUsed} {code} Later on, the Namenode will calculate NonDfsUsed of the volume as {code} Raw - Reserved - DfsUsed - min{Raw - Reserved - DfsUsed - RbwReserved, Raw - DfsUsed - RealNonDfsUsed} {code} Given the calculation, finally we will have - {code} if (Reserved + RbwReserved > RealNonDfsUsed) NonDfsUsed = RbwReserved; else NonDfsUsed = RealNonDfsUsed - Reserved; {code} Either way it is far from the correct value. After investigating the implementation, we believe the Reserved and RbwReserved should be subtracted from available in getAvailable since they are actually not available to hdfs in any sense. I'll post a patch soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8045) Incorrect calculation of NonDfsUsed and Remaining
[ https://issues.apache.org/jira/browse/HDFS-8045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-8045: --- Description: After reserve some space via the param dfs.datanode.du.reserved, we noticed that the namenode usually report NonDfsUsed of Datanodes as 0 even if we write some non-hdfs data to the volume. After some investigation, we think there is an issue in the calculation of FsVolumeImpl.getAvailable - following is the explaination. For a volume, let's use Raw to represent raw capacity, DfsUsed to represent space consumed by hdfs blocks, Reserved to represent reservation through dfs.datanode.du.reserved, RbwReserved to represent space reservation for rbw blocks, RealNonDfsUsed to represent real value of NonDfsUsed(which will include non-hdfs files and meta data consumed by local filesystem). In current implementation, for a volume, available space will be actually calculated as {code} min{Raw - Reserved - DfsUsed -RbwReserved, Raw - DfsUsed - RealNonDfsUsed } {code} Later on, Namenode will calculate NonDfsUsed of the volume as {code} Raw - Reserved - DfsUsed - min{Raw - Reserved - DfsUsed - RbwReserved, Raw - DfsUsed - RealNonDfsUsed} {code} Given the calculation, finally we will have - {code} if (Reserved + RbwReserved RealNonDfsUsed) NonDfsUsed = RbwReserved; else NonDfsUsed = RealNonDfsUsed - Reserved; {code} Either way it is far from the correct value. After investigating the implementation, we believe the Reserved and RbwReserved should be subtract from available in getAvailable since they are actually not available to hdfs in any sense. I'll post a patch soon. was: After reserve some space via the param dfs.datanode.du.reserved, we noticed that the namenode usually report NonDfsUsed of Datanodes as 0 even if we write some non-hdfs data to the volume. After some investigation, we think there is an issue in the calculation of FsVolumeImpl.getAvailable - following is the explaination. For a volume, let's use Raw to represent raw capacity, DfsUsed to represent space consumed by hdfs blocks, Reserved to represent reservation through dfs.datanode.du.reserved, RbwReserved to represent space reservation for rbw blocks, NDfsUsed to represent real value of NonDfsUsed(which will include non-hdfs files and meta data consumed by local filesystem). In current implementation, for a volume, available space will be actually calculated as {code} min{Raw - Reserved - DfsUsed -RbwReserved, Raw - DfsUsed - NDfsUsed } {code} Later on, Namenode will calculate NonDfsUsed of the volume as {code} Raw - Reserved - DfsUsed - min{Raw - Reserved - DfsUsed - RbwReserved, Raw - DfsUsed - NDfsUsed} {code} Given the calculation, finally we will have - {code} if (Reserved + RbwReserved NDfsUsed) NonDfsUsed = RbwReserved; else NonDfsUsed = NDfsUsed - Reserved; {code} Either way it is far from a correct value. After investigation the implementation, we believe the Reserved and RbwReserved should be subtract from available in getAvailable since they are actually not available to hdfs in any way. I'll post a patch soon. 
Incorrect calculation of NonDfsUsed and Remaining - Key: HDFS-8045 URL: https://issues.apache.org/jira/browse/HDFS-8045 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Attachments: HDFS-8045-001.patch After reserve some space via the param dfs.datanode.du.reserved, we noticed that the namenode usually report NonDfsUsed of Datanodes as 0 even if we write some non-hdfs data to the volume. After some investigation, we think there is an issue in the calculation of FsVolumeImpl.getAvailable - following is the explaination. For a volume, let's use Raw to represent raw capacity, DfsUsed to represent space consumed by hdfs blocks, Reserved to represent reservation through dfs.datanode.du.reserved, RbwReserved to represent space reservation for rbw blocks, RealNonDfsUsed to represent real value of NonDfsUsed(which will include non-hdfs files and meta data consumed by local filesystem). In current implementation, for a volume, available space will be actually calculated as {code} min{Raw - Reserved - DfsUsed -RbwReserved, Raw - DfsUsed - RealNonDfsUsed } {code} Later on, Namenode will calculate NonDfsUsed of the volume as {code} Raw - Reserved - DfsUsed - min{Raw - Reserved - DfsUsed - RbwReserved, Raw - DfsUsed - RealNonDfsUsed} {code} Given the calculation, finally we will have - {code} if (Reserved + RbwReserved RealNonDfsUsed) NonDfsUsed = RbwReserved; else NonDfsUsed = RealNonDfsUsed - Reserved; {code} Either way it is far from the correct value. After investigating the implementation, we believe the Reserved and
[jira] [Commented] (HDFS-5215) dfs.datanode.du.reserved is not taking effect as it's not considered while getting the available space
[ https://issues.apache.org/jira/browse/HDFS-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14392609#comment-14392609 ] zhouyingchao commented on HDFS-5215: Shouldn't we also subtract rbwReserved? dfs.datanode.du.reserved is not taking effect as it's not considered while getting the available space -- Key: HDFS-5215 URL: https://issues.apache.org/jira/browse/HDFS-5215 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Attachments: HDFS-5215-002.patch, HDFS-5215-003.patch, HDFS-5215.patch {code} public long getAvailable() throws IOException { long remaining = getCapacity() - getDfsUsed(); long available = usage.getAvailable(); if (remaining > available) { remaining = available; } return (remaining > 0) ? remaining : 0; } {code} Here we are not considering the reserved space while getting the Available Space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
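Taken together, HDFS-5215 and the HDFS-8045 comments above point in the same direction: the free space reported by the OS should have both the configured reservation and the in-flight RBW reservation taken off before the min is computed. A minimal sketch of that adjustment, written as a hypothetical standalone helper rather than the committed FsVolumeImpl change:
{code}
// Illustrative helper only; parameter names mirror the quantities used in the
// HDFS-8045 description (duAvailable is what usage.getAvailable() returns).
static long availableSketch(long raw, long dfsUsed, long duAvailable,
                            long reserved, long rbwReserved) {
  long remaining = raw - reserved - dfsUsed - rbwReserved;
  long available = duAvailable - reserved - rbwReserved;  // reservations are not usable space
  if (remaining > available) {
    remaining = available;
  }
  return Math.max(remaining, 0);
}
{code}
With this form, a volume whose reservation exceeds the real non-HDFS usage no longer reports NonDfsUsed as 0 in the common case where no RBW space is reserved.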
[jira] [Commented] (HDFS-7999) FsDatasetImpl#createTemporary sometimes holds the FSDatasetImpl lock for a very long time
[ https://issues.apache.org/jira/browse/HDFS-7999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388080#comment-14388080 ] zhouyingchao commented on HDFS-7999: Thank you for looking into the patch. Here is some explanation of the logic of createTemporary() after applying the patch: 1. If there is no ReplicaInfo in volumeMap for the passed-in ExtendedBlock b, then we will create one, insert it into volumeMap and then return from line 1443. 2. If there is a ReplicaInfo in volumeMap and its GS is newer than the passed-in ExtendedBlock b, then throw the ReplicaAlreadyExistsException from line 1447. 3. If there is a ReplicaInfo in volumeMap whereas its GS is older than the passed-in ExtendedBlock b, then it means this is a new write and the earlier writer should be stopped. We will release the FsDatasetImpl lock and try to stop the earlier writer w/o the lock. 4. After the earlier writer is stopped, we need to evict the earlier writer's ReplicaInfo from volumeMap, and to that end we will re-acquire the FsDatasetImpl lock. However, since this thread released the FsDatasetImpl lock while it tried to stop the earlier writer, another thread might have come in and changed the ReplicaInfo of this block in volumeMap. This situation is not very likely to happen, but we have to handle it just in case. The loop in the patch is there to handle this situation -- after re-acquiring the FsDatasetImpl lock, it checks whether the current ReplicaInfo in volumeMap is still the one we saw before stopping the writer; if so we can simply evict it, create/insert a new one and then return from line 1443. Otherwise, it implies another thread slipped in and changed the ReplicaInfo while we were stopping the earlier writer. In this condition, we check if that thread has inserted a block with an even newer GS than ours; if so we throw ReplicaAlreadyExistsException from line 1447. Otherwise we need to stop that thread's writer just like we stopped the earlier writer in step 3. FsDatasetImpl#createTemporary sometimes holds the FSDatasetImpl lock for a very long time - Key: HDFS-7999 URL: https://issues.apache.org/jira/browse/HDFS-7999 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Attachments: HDFS-7999-001.patch I'm using 2.6.0 and noticed that sometime DN's heartbeat were delayed for very long time, say more than 100 seconds. I get the jstack twice and looks like they are all blocked (at getStorageReport) by dataset lock, and which is held by a thread that is calling createTemporary, which again is blocked to wait earlier incarnation writer to exit.
The heartbeat thread stack: java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:152) - waiting to lock 0x0007b01428c0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:144) - locked 0x0007b0140ed0 (a java.lang.Object) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:575) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:680) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850) at java.lang.Thread.run(Thread.java:662) The DataXceiver thread holds the dataset lock: DataXceiver for client at X daemon prio=10 tid=0x7f14041e6480 nid=0x52bc in Object.wait() [0x7f11d78f7000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Thread.join(Thread.java:1194) locked 0x0007a33b85d8 (a org.apache.hadoop.util.Daemon) at org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:183) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1231) locked 0x0007b01428c0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:114) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.init(BlockReceiver.java:179) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) at
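The four steps described in the comment above boil down to a standard pattern: decide under the lock, do the slow join outside the lock, then re-acquire and re-validate before mutating shared state. The following stand-alone sketch models that pattern with made-up types; it is not FsDatasetImpl code (in the real patch the writer is stopped via ReplicaInPipeline.stopWriter() and the failure is a ReplicaAlreadyExistsException):
{code}
import java.util.HashMap;
import java.util.Map;

public class CreateTemporarySketch {
  static final class Replica {
    final long genStamp;
    final Thread writer;
    Replica(long genStamp, Thread writer) { this.genStamp = genStamp; this.writer = writer; }
  }

  private final Object datasetLock = new Object();
  private final Map<Long, Replica> volumeMap = new HashMap<>();

  Replica createTemporary(long blockId, long genStamp) throws InterruptedException {
    while (true) {
      Replica old;
      synchronized (datasetLock) {                  // steps 1 and 2: decide under the lock
        old = volumeMap.get(blockId);
        if (old == null) {
          Replica created = new Replica(genStamp, Thread.currentThread());
          volumeMap.put(blockId, created);
          return created;                           // step 1: no earlier writer
        }
        if (old.genStamp >= genStamp) {             // step 2: a newer replica already exists
          throw new IllegalStateException("replica already exists with a newer GS");
        }
      }
      old.writer.interrupt();                       // step 3: stop the earlier writer
      old.writer.join();                            //         without holding datasetLock
      synchronized (datasetLock) {                  // step 4: re-check before evicting
        if (volumeMap.get(blockId) == old) {
          volumeMap.remove(blockId);                // still ours to evict; loop recreates it
        }
        // if another thread replaced it meanwhile, the outer loop re-evaluates from scratch
      }
    }
  }
}
{code}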
[jira] [Commented] (HDFS-7999) DN Hearbeat is blocked by waiting FsDatasetImpl lock
[ https://issues.apache.org/jira/browse/HDFS-7999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14386528#comment-14386528 ] zhouyingchao commented on HDFS-7999: Hi Xinwei Thank you for sharing the status regarding HDFS-7060. I think it is the right way to fix the heartbeat issue. Saying that, I still think the patch here is necessary - current implementation of createTemporary() might sleep up to 60s with a lock held, it does not make sense, right? It might block other threads besides heartbeat for a long time any way. Comments? Thoughts? DN Hearbeat is blocked by waiting FsDatasetImpl lock Key: HDFS-7999 URL: https://issues.apache.org/jira/browse/HDFS-7999 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Attachments: HDFS-7999-001.patch I'm using 2.6.0 and noticed that sometime DN's heartbeat were delayed for very long time, say more than 100 seconds. I get the jstack twice and looks like they are all blocked (at getStorageReport) by dataset lock, and which is held by a thread that is calling createTemporary, which again is blocked to wait earlier incarnation writer to exit. The heartbeat thread stack: java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:152) - waiting to lock 0x0007b01428c0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:144) - locked 0x0007b0140ed0 (a java.lang.Object) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:575) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:680) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850) at java.lang.Thread.run(Thread.java:662) The DataXceiver thread holds the dataset lock: DataXceiver for client at X daemon prio=10 tid=0x7f14041e6480 nid=0x52bc in Object.wait() [0x7f11d78f7000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Thread.join(Thread.java:1194) locked 0x0007a33b85d8 (a org.apache.hadoop.util.Daemon) at org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:183) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1231) locked 0x0007b01428c0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:114) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.init(BlockReceiver.java:179) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) at java.lang.Thread.run(Thread.java:662) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7999) DN Hearbeat is blocked by waiting FsDatasetImpl lock
[ https://issues.apache.org/jira/browse/HDFS-7999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14386536#comment-14386536 ] zhouyingchao commented on HDFS-7999: I tested TestBalancer with the patch on my rig, it can pass. From the log of the failure, it looks like not related to the patch. DN Hearbeat is blocked by waiting FsDatasetImpl lock Key: HDFS-7999 URL: https://issues.apache.org/jira/browse/HDFS-7999 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Attachments: HDFS-7999-001.patch I'm using 2.6.0 and noticed that sometime DN's heartbeat were delayed for very long time, say more than 100 seconds. I get the jstack twice and looks like they are all blocked (at getStorageReport) by dataset lock, and which is held by a thread that is calling createTemporary, which again is blocked to wait earlier incarnation writer to exit. The heartbeat thread stack: java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:152) - waiting to lock 0x0007b01428c0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:144) - locked 0x0007b0140ed0 (a java.lang.Object) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:575) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:680) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850) at java.lang.Thread.run(Thread.java:662) The DataXceiver thread holds the dataset lock: DataXceiver for client at X daemon prio=10 tid=0x7f14041e6480 nid=0x52bc in Object.wait() [0x7f11d78f7000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Thread.join(Thread.java:1194) locked 0x0007a33b85d8 (a org.apache.hadoop.util.Daemon) at org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:183) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1231) locked 0x0007b01428c0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:114) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.init(BlockReceiver.java:179) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) at java.lang.Thread.run(Thread.java:662) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7997) The first non-existing xattr should also throw IOException
[ https://issues.apache.org/jira/browse/HDFS-7997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14386205#comment-14386205 ] zhouyingchao commented on HDFS-7997: Thank you for pointing it out. We are actually using names of the form user.xxx; the pseudo-code snippet here is just meant to illustrate the issue. The first non-existing xattr should also throw IOException -- Key: HDFS-7997 URL: https://issues.apache.org/jira/browse/HDFS-7997 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Priority: Minor Attachments: HDFS-7997-001.patch We use the following code snippet to get/set xattrs. However, if no xattrs have ever been set on the file, the first getXAttr returns null while the second one throws an exception with a message like At least one of the attributes provided was not found.. This is not expected; we believe they should behave in the same way - i.e. either both getXAttr calls return null or both throw an exception with the ... not found message. We will provide a patch to make them both throw an exception. attrValueNM = fs.getXAttr(path, nm); if (attrValueNM == null) { fs.setXAttr(path, nm, DEFAULT_VALUE); } attrValueNN = fs.getXAttr(path, nn); if (attrValueNN == null) { fs.setXAttr(path, nn, DEFAULT_VALUE); } -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7997) The first non-existing xattr should also throw IOException
[ https://issues.apache.org/jira/browse/HDFS-7997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-7997: --- Affects Version/s: 2.6.0 Status: Patch Available (was: Open) Tested with -Dtest=TestXAttrCLI,TestXAttrWithSnapshot,TestNameNodeXAttr,TestFileContextXAttr,FSXAttrBaseTest,TestFSImageWithXAttr,TestXAttrConfigFlag,TestXAttrsWithHA,TestWebHDFSXAttr,TestXAttr,TestViewFileSystemWithXAttrs,TestViewFsWithXAttrs The first non-existing xattr should also throw IOException -- Key: HDFS-7997 URL: https://issues.apache.org/jira/browse/HDFS-7997 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao We use the following code snippet to get/set xattrs. However, if no xattrs have ever been set on the file, the first getXAttr returns null while the second one throws an exception with a message like At least one of the attributes provided was not found.. This is not expected; we believe they should behave in the same way - i.e. either both getXAttr calls return null or both throw an exception with the ... not found message. We will provide a patch to make them both throw an exception. attrValueNM = fs.getXAttr(path, nm); if (attrValueNM == null) { fs.setXAttr(path, nm, DEFAULT_VALUE); } attrValueNN = fs.getXAttr(path, nn); if (attrValueNN == null) { fs.setXAttr(path, nn, DEFAULT_VALUE); } -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7212) Huge number of BLOCKED threads rendering DataNodes useless
[ https://issues.apache.org/jira/browse/HDFS-7212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383655#comment-14383655 ] zhouyingchao commented on HDFS-7212: I'll create a JIRA for the issue I met and submit a patch. Huge number of BLOCKED threads rendering DataNodes useless -- Key: HDFS-7212 URL: https://issues.apache.org/jira/browse/HDFS-7212 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.4.0 Environment: PROD Reporter: Istvan Szukacs There are 3000 - 8000 threads in each datanode JVM, blocking the entire VM and rendering the service unusable, missing heartbeats and stopping data access. The threads look like this: {code} 3415 (state = BLOCKED) - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise) - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=186 (Compiled frame) - java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() @bci=1, line=834 (Interpreted frame) - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(java.util.concurrent.locks.AbstractQueuedSynchronizer$Node, int) @bci=67, line=867 (Interpreted frame) - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(int) @bci=17, line=1197 (Interpreted frame) - java.util.concurrent.locks.ReentrantLock$NonfairSync.lock() @bci=21, line=214 (Compiled frame) - java.util.concurrent.locks.ReentrantLock.lock() @bci=4, line=290 (Compiled frame) - org.apache.hadoop.net.unix.DomainSocketWatcher.add(org.apache.hadoop.net.unix.DomainSocket, org.apache.hadoop.net.unix.DomainSocketWatcher$Handler) @bci=4, line=286 (Interpreted frame) - org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(java.lang.String, org.apache.hadoop.net.unix.DomainSocket) @bci=169, line=283 (Interpreted frame) - org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(java.lang.String) @bci=212, line=413 (Interpreted frame) - org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(java.io.DataInputStream) @bci=13, line=172 (Interpreted frame) - org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(org.apache.hadoop.hdfs.protocol.datatransfer.Op) @bci=149, line=92 (Compiled frame) - org.apache.hadoop.hdfs.server.datanode.DataXceiver.run() @bci=510, line=232 (Compiled frame) - java.lang.Thread.run() @bci=11, line=744 (Interpreted frame) {code} Has anybody seen this before? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7999) DN Heartbeat is blocked by waiting FsDatasetImpl lock
zhouyingchao created HDFS-7999: -- Summary: DN Hearbeat is blocked by waiting FsDatasetImpl lock Key: HDFS-7999 URL: https://issues.apache.org/jira/browse/HDFS-7999 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao I'm using 2.6.0 and noticed that sometime DN's heartbeat were delayed for very long time, say more than 100 seconds. I get the jstack twice and looks like they are all blocked (at getStorageReport) by dataset lock, and which is held by a thread that is calling createTemporary, which again is blocked to wait earlier incarnation writer to exit. The heartbeat thread stack: java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:152) - waiting to lock 0x0007b01428c0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:144) - locked 0x0007b0140ed0 (a java.lang.Object) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:575) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:680) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850) at java.lang.Thread.run(Thread.java:662) The DataXceiver thread holds the dataset lock: DataXceiver for client at X daemon prio=10 tid=0x7f14041e6480 nid=0x52bc in Object.wait() [0x7f11d78f7000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Thread.join(Thread.java:1194) locked 0x0007a33b85d8 (a org.apache.hadoop.util.Daemon) at org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:183) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1231) locked 0x0007b01428c0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:114) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.init(BlockReceiver.java:179) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) at java.lang.Thread.run(Thread.java:662) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-7999) DN Heartbeat is blocked by waiting FsDatasetImpl lock
[ https://issues.apache.org/jira/browse/HDFS-7999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao reassigned HDFS-7999: -- Assignee: zhouyingchao DN Hearbeat is blocked by waiting FsDatasetImpl lock Key: HDFS-7999 URL: https://issues.apache.org/jira/browse/HDFS-7999 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao I'm using 2.6.0 and noticed that sometime DN's heartbeat were delayed for very long time, say more than 100 seconds. I get the jstack twice and looks like they are all blocked (at getStorageReport) by dataset lock, and which is held by a thread that is calling createTemporary, which again is blocked to wait earlier incarnation writer to exit. The heartbeat thread stack: java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:152) - waiting to lock 0x0007b01428c0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:144) - locked 0x0007b0140ed0 (a java.lang.Object) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:575) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:680) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850) at java.lang.Thread.run(Thread.java:662) The DataXceiver thread holds the dataset lock: DataXceiver for client at X daemon prio=10 tid=0x7f14041e6480 nid=0x52bc in Object.wait() [0x7f11d78f7000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Thread.join(Thread.java:1194) locked 0x0007a33b85d8 (a org.apache.hadoop.util.Daemon) at org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:183) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1231) locked 0x0007b01428c0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:114) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.init(BlockReceiver.java:179) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) at java.lang.Thread.run(Thread.java:662) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7212) Huge number of BLOCKED threads rendering DataNodes useless
[ https://issues.apache.org/jira/browse/HDFS-7212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383654#comment-14383654 ] zhouyingchao commented on HDFS-7212: I'll create a JIRA for the issue I met and submit a patch. Huge number of BLOCKED threads rendering DataNodes useless -- Key: HDFS-7212 URL: https://issues.apache.org/jira/browse/HDFS-7212 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.4.0 Environment: PROD Reporter: Istvan Szukacs There are 3000 - 8000 threads in each datanode JVM, blocking the entire VM and rendering the service unusable, missing heartbeats and stopping data access. The threads look like this: {code} 3415 (state = BLOCKED) - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise) - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=186 (Compiled frame) - java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() @bci=1, line=834 (Interpreted frame) - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(java.util.concurrent.locks.AbstractQueuedSynchronizer$Node, int) @bci=67, line=867 (Interpreted frame) - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(int) @bci=17, line=1197 (Interpreted frame) - java.util.concurrent.locks.ReentrantLock$NonfairSync.lock() @bci=21, line=214 (Compiled frame) - java.util.concurrent.locks.ReentrantLock.lock() @bci=4, line=290 (Compiled frame) - org.apache.hadoop.net.unix.DomainSocketWatcher.add(org.apache.hadoop.net.unix.DomainSocket, org.apache.hadoop.net.unix.DomainSocketWatcher$Handler) @bci=4, line=286 (Interpreted frame) - org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(java.lang.String, org.apache.hadoop.net.unix.DomainSocket) @bci=169, line=283 (Interpreted frame) - org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(java.lang.String) @bci=212, line=413 (Interpreted frame) - org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(java.io.DataInputStream) @bci=13, line=172 (Interpreted frame) - org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(org.apache.hadoop.hdfs.protocol.datatransfer.Op) @bci=149, line=92 (Compiled frame) - org.apache.hadoop.hdfs.server.datanode.DataXceiver.run() @bci=510, line=232 (Compiled frame) - java.lang.Thread.run() @bci=11, line=744 (Interpreted frame) {code} Has anybody seen this before? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7999) DN Heartbeat is blocked by waiting FsDatasetImpl lock
[ https://issues.apache.org/jira/browse/HDFS-7999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-7999: --- Attachment: HDFS-7999-001.patch Test with -Dtest=FsDatasetTestUtil,LazyPersistTestCase,TestDatanodeRestart,TestFsDatasetImpl,TestFsVolumeList,TestInterDatanodeProtocol,TestLazyPersistFiles,TestRbwSpaceReservation,TestReplicaMap,TestScrLazyPersistFiles,TestWriteToReplica DN Hearbeat is blocked by waiting FsDatasetImpl lock Key: HDFS-7999 URL: https://issues.apache.org/jira/browse/HDFS-7999 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Attachments: HDFS-7999-001.patch I'm using 2.6.0 and noticed that sometime DN's heartbeat were delayed for very long time, say more than 100 seconds. I get the jstack twice and looks like they are all blocked (at getStorageReport) by dataset lock, and which is held by a thread that is calling createTemporary, which again is blocked to wait earlier incarnation writer to exit. The heartbeat thread stack: java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:152) - waiting to lock 0x0007b01428c0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:144) - locked 0x0007b0140ed0 (a java.lang.Object) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:575) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:680) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850) at java.lang.Thread.run(Thread.java:662) The DataXceiver thread holds the dataset lock: DataXceiver for client at X daemon prio=10 tid=0x7f14041e6480 nid=0x52bc in Object.wait() [0x7f11d78f7000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Thread.join(Thread.java:1194) locked 0x0007a33b85d8 (a org.apache.hadoop.util.Daemon) at org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:183) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1231) locked 0x0007b01428c0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:114) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.init(BlockReceiver.java:179) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) at java.lang.Thread.run(Thread.java:662) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7999) DN Heartbeat is blocked by waiting FsDatasetImpl lock
[ https://issues.apache.org/jira/browse/HDFS-7999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-7999: --- Status: Patch Available (was: Open) The fix is to call stopWriter without holding the FsDatasetImpl lock. However, without the lock, another thread may slip in and add another ReplicaInfo to the map while we are stopping the writer. To resolve this, we try to invalidate the stale replica in a loop. As a last resort, if the thread hangs for too long, we bail out of the loop with an IOException. DN Heartbeat is blocked by waiting FsDatasetImpl lock Key: HDFS-7999 URL: https://issues.apache.org/jira/browse/HDFS-7999 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Attachments: HDFS-7999-001.patch I'm using 2.6.0 and noticed that sometime DN's heartbeat were delayed for very long time, say more than 100 seconds. I get the jstack twice and looks like they are all blocked (at getStorageReport) by dataset lock, and which is held by a thread that is calling createTemporary, which again is blocked to wait earlier incarnation writer to exit. The heartbeat thread stack: java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:152) - waiting to lock 0x0007b01428c0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:144) - locked 0x0007b0140ed0 (a java.lang.Object) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:575) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:680) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850) at java.lang.Thread.run(Thread.java:662) The DataXceiver thread holds the dataset lock: DataXceiver for client at X daemon prio=10 tid=0x7f14041e6480 nid=0x52bc in Object.wait() [0x7f11d78f7000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Thread.join(Thread.java:1194) locked 0x0007a33b85d8 (a org.apache.hadoop.util.Daemon) at org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:183) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1231) locked 0x0007b01428c0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:114) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.init(BlockReceiver.java:179) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) at java.lang.Thread.run(Thread.java:662) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
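To make the locking pattern described above concrete, here is a small, self-contained Java sketch of the same idea: stop the previous writer outside the shared lock, re-check the shared map under the lock in a loop, and give up after a deadline. All names (datasetLock, writers, MAX_WAIT_MS) are hypothetical stand-ins for illustration; this is not the FsDatasetImpl code or the attached patch.
{code}
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Illustration only: the slow "stop the previous writer" step runs outside the
 * shared lock, and the shared state is re-checked under the lock in a loop,
 * with a deadline as the last resort.
 */
public class CreateTemporarySketch {
  private final Object datasetLock = new Object();                    // stands in for the FsDatasetImpl monitor
  private final ConcurrentHashMap<Long, Thread> writers = new ConcurrentHashMap<>();
  private static final long MAX_WAIT_MS = 60_000;                     // assumed bail-out threshold

  public void createTemporary(long blockId) throws IOException, InterruptedException {
    final long deadline = System.currentTimeMillis() + MAX_WAIT_MS;
    Thread lastStoppedWriter = null;
    while (true) {
      Thread currentWriter;
      synchronized (datasetLock) {
        currentWriter = writers.get(blockId);
        if (currentWriter == null || currentWriter == lastStoppedWriter) {
          // Nobody holds the block, or the holder is the writer we just asked
          // to stop: take over the block and return.
          writers.put(blockId, Thread.currentThread());
          return;
        }
      }
      // Stop the current writer WITHOUT holding the dataset lock, so heartbeat
      // and other dataset users are not blocked while we wait for it to exit.
      currentWriter.interrupt();
      currentWriter.join(Math.max(1, deadline - System.currentTimeMillis()));
      lastStoppedWriter = currentWriter;
      if (System.currentTimeMillis() > deadline) {
        throw new IOException("Gave up waiting for the previous writer of block " + blockId);
      }
    }
  }
}
{code}
The key design point is that the loop tolerates another thread registering a new writer between the unlocked join and the next locked check, which is exactly the race the comment above describes.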
[jira] [Created] (HDFS-7997) The first non-existing xattr should also throw IOException
zhouyingchao created HDFS-7997: -- Summary: The first non-existing xattr should also throw IOException Key: HDFS-7997 URL: https://issues.apache.org/jira/browse/HDFS-7997 Project: Hadoop HDFS Issue Type: Bug Reporter: zhouyingchao Assignee: zhouyingchao We use the following code snippet to get/set xattrs. However, if no xattrs have ever been set on the file, the first getXAttr returns null while the second one throws an exception with a message like At least one of the attributes provided was not found.. This is not expected; we believe they should behave in the same way - i.e. either both getXAttr calls return null or both throw an exception with the ... not found message. We will provide a patch to make them both throw an exception. attrValueNM = fs.getXAttr(path, nm); if (attrValueNM == null) { fs.setXAttr(path, nm, DEFAULT_VALUE); } attrValueNN = fs.getXAttr(path, nn); if (attrValueNN == null) { fs.setXAttr(path, nn, DEFAULT_VALUE); } -- This message was sent by Atlassian JIRA (v6.3.4#6332)
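For reference, a runnable version of the pseudo-code above against the public FileSystem xattr API. The path, attribute names and default value are made up for the example, and the broad catch of IOException is a simplification that would also hide unrelated I/O errors.
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class XAttrProbe {
  private static final byte[] DEFAULT_VALUE = new byte[] { 0 };   // made-up default

  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    Path path = new Path("/tmp/xattr-demo");                       // made-up test path

    // The inconsistency described above: on a file with no xattrs at all the
    // first call returns null, while on a file that already has other xattrs
    // the same call throws IOException ("At least one of the attributes
    // provided was not found").
    byte[] valueNM = getXAttrOrNull(fs, path, "user.nm");
    if (valueNM == null) {
      fs.setXAttr(path, "user.nm", DEFAULT_VALUE);
    }
    byte[] valueNN = getXAttrOrNull(fs, path, "user.nn");
    if (valueNN == null) {
      fs.setXAttr(path, "user.nn", DEFAULT_VALUE);
    }
  }

  // Normalize both behaviours to "null when absent" on the caller side.
  private static byte[] getXAttrOrNull(FileSystem fs, Path path, String name) throws IOException {
    try {
      return fs.getXAttr(path, name);
    } catch (IOException e) {
      return null;   // simplification: a real caller should inspect the exception
    }
  }
}
{code}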
[jira] [Updated] (HDFS-7997) The first non-existing xattr should also throw IOException
[ https://issues.apache.org/jira/browse/HDFS-7997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-7997: --- Attachment: HDFS-7997-001.patch The first non-existing xattr should also throw IOException -- Key: HDFS-7997 URL: https://issues.apache.org/jira/browse/HDFS-7997 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Attachments: HDFS-7997-001.patch We use the following code snippet to get/set xattrs. However, if no xattrs have ever been set on the file, the first getXAttr returns null while the second one throws an exception with a message like At least one of the attributes provided was not found.. This is not expected; we believe they should behave in the same way - i.e. either both getXAttr calls return null or both throw an exception with the ... not found message. We will provide a patch to make them both throw an exception. attrValueNM = fs.getXAttr(path, nm); if (attrValueNM == null) { fs.setXAttr(path, nm, DEFAULT_VALUE); } attrValueNN = fs.getXAttr(path, nn); if (attrValueNN == null) { fs.setXAttr(path, nn, DEFAULT_VALUE); } -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7997) The first non-existing xattr should also throw IOException
[ https://issues.apache.org/jira/browse/HDFS-7997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383708#comment-14383708 ] zhouyingchao commented on HDFS-7997: The failed test case is not related to the change. I just verified that it cannot pass even without the xattr changes. The first non-existing xattr should also throw IOException -- Key: HDFS-7997 URL: https://issues.apache.org/jira/browse/HDFS-7997 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Attachments: HDFS-7997-001.patch We use the following code snippet to get/set xattrs. However, if no xattrs have ever been set on the file, the first getXAttr returns null while the second one throws an exception with a message like At least one of the attributes provided was not found.. This is not expected; we believe they should behave in the same way - i.e. either both getXAttr calls return null or both throw an exception with the ... not found message. We will provide a patch to make them both throw an exception. attrValueNM = fs.getXAttr(path, nm); if (attrValueNM == null) { fs.setXAttr(path, nm, DEFAULT_VALUE); } attrValueNN = fs.getXAttr(path, nn); if (attrValueNN == null) { fs.setXAttr(path, nn, DEFAULT_VALUE); } -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7868) Use proper blocksize to choose target for blocks
[ https://issues.apache.org/jira/browse/HDFS-7868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14349744#comment-14349744 ] zhouyingchao commented on HDFS-7868: Looks like there are some issues with the Jenkins system; I'll resubmit the patch to kick off the build again. Use proper blocksize to choose target for blocks Key: HDFS-7868 URL: https://issues.apache.org/jira/browse/HDFS-7868 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Attachments: HDFS-7868-001.patch, HDFS-7868-002.patch, HDFS-7868-003.patch In BlockPlacementPolicyDefault.java:isGoodTarget, the passed-in blockSize is used to determine if there is enough room for a new block on a data node. However, in two conditions the blockSize might not be proper for the purpose: (a) the passed in block size is just the size of the last block of a file, which might be very small (for e.g., called from BlockManager.ReplicationWork.chooseTargets). (b) A file which might be created with a smaller blocksize. In these conditions, the calculated scheduledSize might be smaller than the actual value, which finally might lead to following failure of writing or replication. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7868) Use proper blocksize to choose target for blocks
[ https://issues.apache.org/jira/browse/HDFS-7868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-7868: --- Status: Open (was: Patch Available) Use proper blocksize to choose target for blocks Key: HDFS-7868 URL: https://issues.apache.org/jira/browse/HDFS-7868 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao In BlockPlacementPolicyDefault.java:isGoodTarget, the passed-in blockSize is used to determine if there is enough room for a new block on a data node. However, in two conditions the blockSize might not be proper for the purpose: (a) the passed in block size is just the size of the last block of a file, which might be very small (for e.g., called from BlockManager.ReplicationWork.chooseTargets). (b) A file which might be created with a smaller blocksize. In these conditions, the calculated scheduledSize might be smaller than the actual value, which finally might lead to following failure of writing or replication. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7868) Use proper blocksize to choose target for blocks
[ https://issues.apache.org/jira/browse/HDFS-7868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-7868: --- Attachment: (was: HDFS-7868-004.patch) Use proper blocksize to choose target for blocks Key: HDFS-7868 URL: https://issues.apache.org/jira/browse/HDFS-7868 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao In BlockPlacementPolicyDefault.java:isGoodTarget, the passed-in blockSize is used to determine if there is enough room for a new block on a data node. However, in two conditions the blockSize might not be proper for the purpose: (a) the passed in block size is just the size of the last block of a file, which might be very small (for e.g., called from BlockManager.ReplicationWork.chooseTargets). (b) A file which might be created with a smaller blocksize. In these conditions, the calculated scheduledSize might be smaller than the actual value, which finally might lead to following failure of writing or replication. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7868) Use proper blocksize to choose target for blocks
[ https://issues.apache.org/jira/browse/HDFS-7868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-7868: --- Attachment: (was: HDFS-7868-003.patch) Use proper blocksize to choose target for blocks Key: HDFS-7868 URL: https://issues.apache.org/jira/browse/HDFS-7868 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao In BlockPlacementPolicyDefault.java:isGoodTarget, the passed-in blockSize is used to determine if there is enough room for a new block on a data node. However, in two conditions the blockSize might not be proper for the purpose: (a) the passed in block size is just the size of the last block of a file, which might be very small (for e.g., called from BlockManager.ReplicationWork.chooseTargets). (b) A file which might be created with a smaller blocksize. In these conditions, the calculated scheduledSize might be smaller than the actual value, which finally might lead to following failure of writing or replication. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7868) Use proper blocksize to choose target for blocks
[ https://issues.apache.org/jira/browse/HDFS-7868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-7868: --- Attachment: (was: HDFS-7868-002.patch) Use proper blocksize to choose target for blocks Key: HDFS-7868 URL: https://issues.apache.org/jira/browse/HDFS-7868 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao In BlockPlacementPolicyDefault.java:isGoodTarget, the passed-in blockSize is used to determine if there is enough room for a new block on a data node. However, in two conditions the blockSize might not be proper for the purpose: (a) the passed in block size is just the size of the last block of a file, which might be very small (for e.g., called from BlockManager.ReplicationWork.chooseTargets). (b) A file which might be created with a smaller blocksize. In these conditions, the calculated scheduledSize might be smaller than the actual value, which finally might lead to following failure of writing or replication. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7868) Use proper blocksize to choose target for blocks
[ https://issues.apache.org/jira/browse/HDFS-7868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-7868: --- Attachment: (was: HDFS-7868-001.patch) Use proper blocksize to choose target for blocks Key: HDFS-7868 URL: https://issues.apache.org/jira/browse/HDFS-7868 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao In BlockPlacementPolicyDefault.java:isGoodTarget, the passed-in blockSize is used to determine if there is enough room for a new block on a data node. However, in two conditions the blockSize might not be proper for the purpose: (a) the passed in block size is just the size of the last block of a file, which might be very small (for e.g., called from BlockManager.ReplicationWork.chooseTargets). (b) A file which might be created with a smaller blocksize. In these conditions, the calculated scheduledSize might be smaller than the actual value, which finally might lead to following failure of writing or replication. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7897) Shutdown metrics when stopping JournalNode
zhouyingchao created HDFS-7897: -- Summary: Shutdown metrics when stopping JournalNode Key: HDFS-7897 URL: https://issues.apache.org/jira/browse/HDFS-7897 Project: Hadoop HDFS Issue Type: Bug Reporter: zhouyingchao In JournalNode.stop(), the metrics system is never shut down. The issue was found while reading the code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
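A minimal sketch of the kind of change the description implies. The shutdown call itself is the real org.apache.hadoop.metrics2.lib.DefaultMetricsSystem API, but the surrounding stop() body is elided and the class is a stand-in, so this is not the attached patch.
{code}
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;

// Sketch only - the real JournalNode has more state to tear down in stop().
public class JournalNodeStopSketch {
  public void stop(int rc) {
    // ... stop the RPC server, the HTTP server and the journals, as the existing code does ...

    // Release the metrics system that was initialized when the JournalNode started.
    DefaultMetricsSystem.shutdown();
  }
}
{code}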
[jira] [Updated] (HDFS-7868) Use proper blocksize to choose target for blocks
[ https://issues.apache.org/jira/browse/HDFS-7868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-7868: --- Attachment: HDFS-7868-004.patch Use proper blocksize to choose target for blocks Key: HDFS-7868 URL: https://issues.apache.org/jira/browse/HDFS-7868 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Attachments: HDFS-7868-001.patch, HDFS-7868-002.patch, HDFS-7868-003.patch, HDFS-7868-004.patch In BlockPlacementPolicyDefault.java:isGoodTarget, the passed-in blockSize is used to determine if there is enough room for a new block on a data node. However, in two conditions the blockSize might not be proper for the purpose: (a) the passed in block size is just the size of the last block of a file, which might be very small (for e.g., called from BlockManager.ReplicationWork.chooseTargets). (b) A file which might be created with a smaller blocksize. In these conditions, the calculated scheduledSize might be smaller than the actual value, which finally might lead to following failure of writing or replication. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7868) Use proper blocksize to choose target for blocks
[ https://issues.apache.org/jira/browse/HDFS-7868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-7868: --- Attachment: HDFS-7868-002.patch The new patch passes the previously failed cases on my computer. Use proper blocksize to choose target for blocks Key: HDFS-7868 URL: https://issues.apache.org/jira/browse/HDFS-7868 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Attachments: HDFS-7868-001.patch, HDFS-7868-002.patch In BlockPlacementPolicyDefault.java:isGoodTarget, the passed-in blockSize is used to determine if there is enough room for a new block on a data node. However, in two conditions the blockSize might not be proper for the purpose: (a) the passed in block size is just the size of the last block of a file, which might be very small (for e.g., called from BlockManager.ReplicationWork.chooseTargets). (b) A file which might be created with a smaller blocksize. In these conditions, the calculated scheduledSize might be smaller than the actual value, which finally might lead to following failure of writing or replication. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7868) Use proper blocksize to choose target for blocks
[ https://issues.apache.org/jira/browse/HDFS-7868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-7868: --- Attachment: HDFS-7868-003.patch Use proper blocksize to choose target for blocks Key: HDFS-7868 URL: https://issues.apache.org/jira/browse/HDFS-7868 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Attachments: HDFS-7868-001.patch, HDFS-7868-002.patch, HDFS-7868-003.patch In BlockPlacementPolicyDefault.java:isGoodTarget, the passed-in blockSize is used to determine if there is enough room for a new block on a data node. However, in two conditions the blockSize might not be proper for the purpose: (a) the passed in block size is just the size of the last block of a file, which might be very small (for e.g., called from BlockManager.ReplicationWork.chooseTargets). (b) A file which might be created with a smaller blocksize. In these conditions, the calculated scheduledSize might be smaller than the actual value, which finally might lead to following failure of writing or replication. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7897) Shutdown metrics when stopping JournalNode
[ https://issues.apache.org/jira/browse/HDFS-7897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14350003#comment-14350003 ] zhouyingchao commented on HDFS-7897: I tried the patch on my computer with the two failed cases reported by the robot; both of them finish successfully. The failures should not be related to the patch. Shutdown metrics when stopping JournalNode -- Key: HDFS-7897 URL: https://issues.apache.org/jira/browse/HDFS-7897 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Attachments: HDFS-7897-001.patch In JournalNode.stop(), the metrics system is never shut down. The issue was found while reading the code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7868) Use proper blocksize to choose target for blocks
[ https://issues.apache.org/jira/browse/HDFS-7868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-7868: --- Attachment: HDFS-7868-001.patch Use proper blocksize to choose target for blocks Key: HDFS-7868 URL: https://issues.apache.org/jira/browse/HDFS-7868 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Attachments: HDFS-7868-001.patch In BlockPlacementPolicyDefault.java:isGoodTarget, the passed-in blockSize is used to determine if there is enough room for a new block on a data node. However, in two conditions the blockSize might not be proper for the purpose: (a) the passed in block size is just the size of the last block of a file, which might be very small (for e.g., called from BlockManager.ReplicationWork.chooseTargets). (b) A file which might be created with a smaller blocksize. In these conditions, the calculated scheduledSize might be smaller than the actual value, which finally might lead to following failure of writing or replication. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7868) Use proper blocksize to choose target for blocks
[ https://issues.apache.org/jira/browse/HDFS-7868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-7868: --- Status: Patch Available (was: Open) Not sure what's wrong with the Jenkins build system. I just cancelled the patch, deleted all attachments, and resubmitted it. Use proper blocksize to choose target for blocks Key: HDFS-7868 URL: https://issues.apache.org/jira/browse/HDFS-7868 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao In BlockPlacementPolicyDefault.java:isGoodTarget, the passed-in blockSize is used to determine if there is enough room for a new block on a data node. However, in two conditions the blockSize might not be proper for the purpose: (a) the passed in block size is just the size of the last block of a file, which might be very small (for e.g., called from BlockManager.ReplicationWork.chooseTargets). (b) A file which might be created with a smaller blocksize. In these conditions, the calculated scheduledSize might be smaller than the actual value, which finally might lead to following failure of writing or replication. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7897) Shutdown metrics when stopping JournalNode
[ https://issues.apache.org/jira/browse/HDFS-7897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-7897: --- Affects Version/s: 2.6.0 Status: Patch Available (was: Open) Shutdown metrics when stopping JournalNode -- Key: HDFS-7897 URL: https://issues.apache.org/jira/browse/HDFS-7897 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao In JournalNode.stop(), the metrics system is never shut down. The issue was found while reading the code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7897) Shutdown metrics when stopping JournalNode
[ https://issues.apache.org/jira/browse/HDFS-7897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-7897: --- Attachment: HDFS-7897-001.patch Shutdown metrics when stopping JournalNode -- Key: HDFS-7897 URL: https://issues.apache.org/jira/browse/HDFS-7897 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Attachments: HDFS-7897-001.patch In JournalNode.stop(), the metrics system is never shut down. The issue was found while reading the code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-7897) Shutdown metrics when stopping JournalNode
[ https://issues.apache.org/jira/browse/HDFS-7897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao reassigned HDFS-7897: -- Assignee: zhouyingchao Shutdown metrics when stopping JournalNode -- Key: HDFS-7897 URL: https://issues.apache.org/jira/browse/HDFS-7897 Project: Hadoop HDFS Issue Type: Bug Reporter: zhouyingchao Assignee: zhouyingchao In JournalNode.stop(), the metrics system is never shut down. The issue was found while reading the code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7868) Use proper blocksize to choose target for blocks
[ https://issues.apache.org/jira/browse/HDFS-7868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344208#comment-14344208 ] zhouyingchao commented on HDFS-7868: I'll investigate the failure in a few days, after I finish some other tasks. Use proper blocksize to choose target for blocks Key: HDFS-7868 URL: https://issues.apache.org/jira/browse/HDFS-7868 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Attachments: HDFS-7868-001.patch In BlockPlacementPolicyDefault.java:isGoodTarget, the passed-in blockSize is used to determine if there is enough room for a new block on a data node. However, in two conditions the blockSize might not be proper for the purpose: (a) the passed in block size is just the size of the last block of a file, which might be very small (for e.g., called from BlockManager.ReplicationWork.chooseTargets). (b) A file which might be created with a smaller blocksize. In these conditions, the calculated scheduledSize might be smaller than the actual value, which finally might lead to following failure of writing or replication. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7868) Use proper blocksize to choose target for blocks
[ https://issues.apache.org/jira/browse/HDFS-7868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-7868: --- Attachment: HDFS-7868-001.patch Use proper blocksize to choose target for blocks Key: HDFS-7868 URL: https://issues.apache.org/jira/browse/HDFS-7868 Project: Hadoop HDFS Issue Type: Bug Reporter: zhouyingchao Assignee: zhouyingchao Attachments: HDFS-7868-001.patch In BlockPlacementPolicyDefault.java:isGoodTarget, the passed-in blockSize is used to determine if there is enough room for a new block on a data node. However, in two conditions the blockSize might not be proper for the purpose: (a) the passed in block size is just the size of the last block of a file, which might be very small (for e.g., called from BlockManager.ReplicationWork.chooseTargets). (b) A file which might be created with a smaller blocksize. In these conditions, the calculated scheduledSize might be smaller than the actual value, which finally might lead to following failure of writing or replication. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7868) Use proper blocksize to choose target for blocks
[ https://issues.apache.org/jira/browse/HDFS-7868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-7868: --- Affects Version/s: 2.6.0 Status: Patch Available (was: Open) Use proper blocksize to choose target for blocks Key: HDFS-7868 URL: https://issues.apache.org/jira/browse/HDFS-7868 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Attachments: HDFS-7868-001.patch In BlockPlacementPolicyDefault.java:isGoodTarget, the passed-in blockSize is used to determine if there is enough room for a new block on a data node. However, in two conditions the blockSize might not be proper for the purpose: (a) the passed in block size is just the size of the last block of a file, which might be very small (for e.g., called from BlockManager.ReplicationWork.chooseTargets). (b) A file which might be created with a smaller blocksize. In these conditions, the calculated scheduledSize might be smaller than the actual value, which finally might lead to following failure of writing or replication. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
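As an illustration of the sizing concern in the HDFS-7868 description, here is a self-contained sketch of a more conservative space check that uses the configured default block size as a floor for the per-block estimate instead of trusting a possibly tiny passed-in blockSize. All names are hypothetical; this shows one possible remedy, not necessarily what the attached patches do.
{code}
/**
 * Hypothetical stand-in for the space check performed in
 * BlockPlacementPolicyDefault#isGoodTarget.
 */
public final class TargetSpaceCheck {
  private final long defaultBlockSize;   // e.g. dfs.blocksize from the cluster configuration

  public TargetSpaceCheck(long defaultBlockSize) {
    this.defaultBlockSize = defaultBlockSize;
  }

  /**
   * @param remainingBytes     free space reported by the candidate datanode
   * @param blocksScheduled    blocks already scheduled to be written to that node
   * @param requestedBlockSize the blockSize passed in, which may be the small size
   *                           of a file's last block or a small per-file block size
   */
  public boolean hasEnoughSpace(long remainingBytes, int blocksScheduled, long requestedBlockSize) {
    // Floor the estimate at the configured block size so the scheduled-space
    // calculation is not dominated by an unusually small requested size.
    long perBlockEstimate = Math.max(requestedBlockSize, defaultBlockSize);
    long scheduledSize = perBlockEstimate * blocksScheduled;
    return remainingBytes - scheduledSize >= perBlockEstimate;
  }
}
{code}
The trade-off of such a floor is that placement becomes more conservative for genuinely small-block files, which is why the exact choice of estimate belongs in the patch review rather than in this sketch.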