[jira] Created: (HDFS-1239) All datanodes are bad in 2nd phase
All datanodes are bad in 2nd phase
----------------------------------

Key: HDFS-1239
URL: https://issues.apache.org/jira/browse/HDFS-1239
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs client
Affects Versions: 0.20.1
Reporter: Thanh Do

- Setups:
number of datanodes = 2
replication factor = 2
type of failure: transient fault (a java i/o call throws an exception or returns false)
number of failures = 2
when/where failures happen = during the 2nd phase of the pipeline; each happens at one datanode when it tries to perform I/O (e.g. dataoutputstream.flush())

- Details:
This is similar to HDFS-1237. In this case, node1 throws an exception that makes the client create a pipeline with only node2 and then redo the whole thing, which hits another failure. At this point the client considers all datanodes bad and never retries the whole operation again (i.e. it never asks the namenode for a new set of datanodes). In HDFS-1237 the bug is due to a permanent disk fault; in this case it is a transient error.

This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and Haryadi Gunawi (hary...@eecs.berkeley.edu)
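A minimal sketch of the retry shape the report argues for, in plain Java: once the current pipeline is exhausted by a transient fault, go back to the namenode for a fresh set of datanodes a bounded number of times before failing the write. All types and method names here (Namenode, allocatePipeline, Transport.push) are hypothetical stand-ins, not the real DFSClient API.

{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the real DFSClient: after in-pipeline retries are
// exhausted by a transient fault, re-ask the namenode for a fresh pipeline
// instead of permanently declaring every datanode bad.
public class PipelineRetrySketch {
  interface Namenode {
    List<String> allocatePipeline(List<String> excluded) throws IOException;
  }
  interface Transport {
    void push(List<String> pipeline, byte[] packet) throws IOException;
  }

  static void write(Namenode nn, Transport t, byte[] packet) throws IOException {
    List<String> excluded = new ArrayList<String>();
    for (int attempt = 0; attempt < 3; attempt++) {  // outer loop: re-ask the NN
      List<String> pipeline = nn.allocatePipeline(excluded);
      try {
        t.push(pipeline, packet);                    // 2nd phase: stream the data
        return;                                      // success
      } catch (IOException transientFault) {
        excluded.addAll(pipeline);                   // avoid getting the same nodes back
      }
    }
    throw new IOException("write failed after 3 fresh pipelines");
  }
}
{code}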
[jira] Commented: (HDFS-1071) savenamespace should write the fsimage to all configured fs.name.dir in parallel
[ https://issues.apache.org/jira/browse/HDFS-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879778#action_12879778 ]

dhruba borthakur commented on HDFS-1071:
----------------------------------------

hi konstantin, it appears that Dmytro's last comment addresses all of your questions.

> savenamespace should write the fsimage to all configured fs.name.dir in parallel
>
> Key: HDFS-1071
> URL: https://issues.apache.org/jira/browse/HDFS-1071
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: name-node
> Reporter: dhruba borthakur
> Assignee: Dmytro Molkov
> Attachments: HDFS-1071.2.patch, HDFS-1071.3.patch, HDFS-1071.4.patch, HDFS-1071.patch
>
> If you have a large number of files in HDFS, the fsimage file is very big. When the namenode restarts, it writes a copy of the fsimage to all directories configured in fs.name.dir. This takes a long time, especially if there are many directories in fs.name.dir. Make the NN write the fsimage to all these directories in parallel.
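A hedged sketch of what "write in parallel" could look like: one task per configured directory submitted to a thread pool, with per-directory failures surfaced individually. The saveImageTo() helper and the current/fsimage path are stand-ins for the real FSImage writer, not the patch's actual code.

{code}
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hedged sketch of the improvement under discussion: write the same fsimage
// bytes into every configured fs.name.dir concurrently instead of serially.
public class ParallelSaveSketch {
  static void saveAll(List<File> nameDirs, final byte[] image) throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(nameDirs.size());
    List<Future<?>> saves = new ArrayList<Future<?>>();
    for (final File dir : nameDirs) {
      saves.add(pool.submit(new Runnable() {
        public void run() {
          saveImageTo(new File(dir, "current/fsimage"), image);
        }
      }));
    }
    for (Future<?> f : saves) {
      try {
        f.get();                 // block until this directory's copy is durable
      } catch (ExecutionException e) {
        // in the real NN a failed storage dir is dropped, not fatal to the save
        System.err.println("save failed: " + e.getCause());
      }
    }
    pool.shutdown();
  }

  static void saveImageTo(File f, byte[] image) {
    // placeholder: open f, write image, flush and sync
  }
}
{code}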
[jira] Commented: (HDFS-947) The namenode should redirect a hftp request to read a file to the datanode that has the maximum number of local replicas
[ https://issues.apache.org/jira/browse/HDFS-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879779#action_12879779 ]

dhruba borthakur commented on HDFS-947:
---------------------------------------

+1 code looks good to me.

> The namenode should redirect a hftp request to read a file to the datanode that has the maximum number of local replicas
>
> Key: HDFS-947
> URL: https://issues.apache.org/jira/browse/HDFS-947
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: dhruba borthakur
> Assignee: Dmytro Molkov
> Attachments: HDFS-947.2.patch, HDFS-947.patch, hftpRedirection.patch
>
> A client that uses the Hftp protocol to read a file is redirected by the namenode to a random datanode. It would be nice if the client gets redirected to a datanode that has the maximum number of local replicas of the blocks of the file.
[jira] Commented: (HDFS-599) Improve Namenode robustness by prioritizing datanode heartbeats over client requests
[ https://issues.apache.org/jira/browse/HDFS-599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879782#action_12879782 ]

dhruba borthakur commented on HDFS-599:
---------------------------------------

dmytro: can you please run the Hudson tests manually and post the results here? Thanks.

> Improve Namenode robustness by prioritizing datanode heartbeats over client requests
>
> Key: HDFS-599
> URL: https://issues.apache.org/jira/browse/HDFS-599
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: name-node
> Reporter: dhruba borthakur
> Assignee: Dmytro Molkov
> Fix For: 0.22.0
> Attachments: HDFS-599.3.patch, HDFS-599.patch
>
> The namenode processes RPC requests from clients that are reading/writing to files as well as heartbeats/block reports from datanodes. Sometimes, for various reasons (Java GC runs, inconsistent performance of the NFS filer that stores the HDFS transaction logs, etc.), the namenode encounters transient slowness. For example, if the device that stores the HDFS transaction logs becomes sluggish, the Namenode's ability to process RPCs slows down to a certain extent. During this time, the RPCs from clients as well as the RPCs from datanodes suffer in similar fashion. If the underlying problem becomes worse, the NN's ability to process a heartbeat from a DN is severely impacted, causing the NN to declare that the DN is dead. Then the NN starts replicating blocks that used to reside on the now-declared-dead datanode. This adds extra load to the NN. Then the now-declared-dead datanode finally re-establishes contact with the NN, and sends a block report. The block report processing on the NN is another heavyweight activity, thus causing more load to the already overloaded namenode.
>
> My proposal is that the NN should try its best to continue processing RPCs from datanodes and give lesser priority to serving client requests. The Datanode RPCs are integral to the consistency and performance of the Hadoop file system, and it is better to protect them at all costs. This will ensure that the NN recovers from the hiccup much faster than it does now.
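A hedged sketch of the prioritization idea, not Hadoop's actual RPC server: keep datanode calls and client calls in separate queues and let handler threads always drain the datanode queue first, so heartbeats and block reports still get processed while client load piles up.

{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hedged sketch of the proposal: two call queues, datanode RPCs served first.
public class PriorityRpcSketch {
  private final BlockingQueue<Runnable> datanodeCalls = new LinkedBlockingQueue<Runnable>();
  private final BlockingQueue<Runnable> clientCalls = new LinkedBlockingQueue<Runnable>();

  public void submitDatanodeCall(Runnable call) { datanodeCalls.add(call); }
  public void submitClientCall(Runnable call) { clientCalls.add(call); }

  // Body of each RPC handler thread.
  void handlerLoop() throws InterruptedException {
    while (!Thread.currentThread().isInterrupted()) {
      Runnable next = datanodeCalls.poll();  // datanode RPCs take priority
      if (next == null) {
        next = clientCalls.poll();           // otherwise serve a client call
      }
      if (next == null) {
        Thread.sleep(1);                     // both queues empty; avoid busy spin
        continue;
      }
      next.run();
    }
  }
}
{code}

Note that a strict preference like this can starve clients under sustained datanode load; a real implementation would presumably bound the priority, e.g. serve at least one client call for every N datanode calls.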
[jira] Commented: (HDFS-1239) All datanodes are bad in 2nd phase
[ https://issues.apache.org/jira/browse/HDFS-1239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879786#action_12879786 ]

dhruba borthakur commented on HDFS-1239:
----------------------------------------

if a client has written some data to a set of replicas for that block and then all the replicas go bad, then the client gets an IO error and stops writing any more data to that file. what is your proposed fix? can you please explain, thanks.

> All datanodes are bad in 2nd phase
> Key: HDFS-1239
> URL: https://issues.apache.org/jira/browse/HDFS-1239
[jira] Commented: (HDFS-1114) Reducing NameNode memory usage by an alternate hash table
[ https://issues.apache.org/jira/browse/HDFS-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879836#action_12879836 ]

Tsz Wo (Nicholas), SZE commented on HDFS-1114:
----------------------------------------------

> What about -XX:+UseCompressedOops ?

This is a good point. Is there a way to determine whether UseCompressedOops is set at runtime?

> Reducing NameNode memory usage by an alternate hash table
>
> Key: HDFS-1114
> URL: https://issues.apache.org/jira/browse/HDFS-1114
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: name-node
> Reporter: Tsz Wo (Nicholas), SZE
> Assignee: Tsz Wo (Nicholas), SZE
> Attachments: GSet20100525.pdf, gset20100608.pdf, h1114_20100607.patch, h1114_20100614b.patch, h1114_20100615.patch, h1114_20100616b.patch
>
> NameNode uses a java.util.HashMap to store BlockInfo objects. When there are many blocks in HDFS, this map uses a lot of memory in the NameNode. We may optimize the memory usage by a light weight hash table implementation.
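On HotSpot JVMs there is a runtime answer to that question: the com.sun.management HotSpotDiagnosticMXBean (JDK 6 and later) exposes VM flag values, including UseCompressedOops. Other JVMs will not register this bean, and getVMOption() throws IllegalArgumentException for flags the VM does not know.

{code}
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// HotSpot-only check for whether compressed oops are in effect at runtime.
public class CompressedOopsCheck {
  public static void main(String[] args) throws Exception {
    HotSpotDiagnosticMXBean diag = ManagementFactory.newPlatformMXBeanProxy(
        ManagementFactory.getPlatformMBeanServer(),
        "com.sun.management:type=HotSpotDiagnostic",
        HotSpotDiagnosticMXBean.class);
    String value = diag.getVMOption("UseCompressedOops").getValue();
    System.out.println("UseCompressedOops = " + value);
  }
}
{code}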
[jira] Updated: (HDFS-1114) Reducing NameNode memory usage by an alternate hash table
[ https://issues.apache.org/jira/browse/HDFS-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HDFS-1114:
-----------------------------------------

Attachment: h1114_20100617.patch

h1114_20100617.patch: the UnsupportedOperationException thrown in put(..) should be NullPointerException.

> Reducing NameNode memory usage by an alternate hash table
> Key: HDFS-1114
> URL: https://issues.apache.org/jira/browse/HDFS-1114
[jira] Updated: (HDFS-1240) TestDFSShell failing in branch-20
[ https://issues.apache.org/jira/browse/HDFS-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-1240:
------------------------------

Attachment: hdfs-1240.txt

Here's a patch that fixes the issue. (I reran TestDFSShell and TestEditLogRace manually; will rerun the rest of the unit tests on my internal hudson in a minute.)

> TestDFSShell failing in branch-20
> Key: HDFS-1240
> URL: https://issues.apache.org/jira/browse/HDFS-1240
[jira] Created: (HDFS-1240) TestDFSShell failing in branch-20
TestDFSShell failing in branch-20
---------------------------------

Key: HDFS-1240
URL: https://issues.apache.org/jira/browse/HDFS-1240
Project: Hadoop HDFS
Issue Type: Bug
Components: test
Affects Versions: 0.20-append, 0.20.3
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Critical
Attachments: hdfs-1240.txt

After the backport of HDFS-909 into branch-20, TestDFSShell fails since it relies on resetting the base dir for the minicluster through a system property. The backport changed MiniDFSCluster to read the property from an initializer instead of from the constructor.
[jira] Commented: (HDFS-1234) Datanode 'alive' but with its disk failed, Namenode thinks it's alive
[ https://issues.apache.org/jira/browse/HDFS-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879848#action_12879848 ]

Allen Wittenauer commented on HDFS-1234:
----------------------------------------

In 0.20.1, the datanode process should die on a failed read or write. Eventually the namenode will mark it as dead after lack of heartbeats. Are you actually testing trunk?

> Datanode 'alive' but with its disk failed, Namenode thinks it's alive
>
> Key: HDFS-1234
> URL: https://issues.apache.org/jira/browse/HDFS-1234
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.20.1
> Reporter: Thanh Do
>
> - Summary: Datanode 'alive' but with its disk failed, Namenode still thinks it's alive
>
> - Setups:
> + Replication = 1
> + # available datanodes = 2
> + # disks / datanode = 1
> + # failures = 1
> + Failure type = bad disk
> + When/where failure happens = first phase of the pipeline
>
> - Details:
> In this experiment we have two datanodes, each with one disk. If one datanode has a failed disk (but the node is still alive), the datanode does not keep track of this. From the perspective of the namenode, that datanode is still alive, and thus the namenode gives the same datanode back to the client. The client retries 3 times by asking the namenode for a new set of datanodes, always gets the same datanode, and every time it tries to write there it gets an exception.
>
> This bug was found by our Failure Testing Service framework:
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
> For questions, please email us: Thanh Do (than...@cs.wisc.edu) and Haryadi Gunawi (hary...@eecs.berkeley.edu)
[jira] Commented: (HDFS-1234) Datanode 'alive' but with its disk failed, Namenode thinks it's alive
[ https://issues.apache.org/jira/browse/HDFS-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879849#action_12879849 ]

Allen Wittenauer commented on HDFS-1234:
----------------------------------------

See also HDFS-138 and HDFS-457.

> Datanode 'alive' but with its disk failed, Namenode thinks it's alive
> Key: HDFS-1234
> URL: https://issues.apache.org/jira/browse/HDFS-1234
[jira] Commented: (HDFS-1219) Data Loss due to edits log truncation
[ https://issues.apache.org/jira/browse/HDFS-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879851#action_12879851 ]

Allen Wittenauer commented on HDFS-1219:
----------------------------------------

Then how would the world know how awesome their framework is?

> Data Loss due to edits log truncation
>
> Key: HDFS-1219
> URL: https://issues.apache.org/jira/browse/HDFS-1219
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.20.2
> Reporter: Thanh Do
>
> We found this problem almost at the same time as the HDFS developers. Basically, the edits log is truncated before fsimage.ckpt is renamed to fsimage. Hence, any crash that happens after the truncation but before the renaming will lead to a data loss. A detailed description can be found here:
> https://issues.apache.org/jira/browse/HDFS-955
>
> This bug was found by our Failure Testing Service framework:
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
> For questions, please email us: Thanh Do (than...@cs.wisc.edu) and Haryadi Gunawi (hary...@eecs.berkeley.edu)
[jira] Resolved: (HDFS-1233) Bad retry logic at DFSClient
[ https://issues.apache.org/jira/browse/HDFS-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon resolved HDFS-1233.
-------------------------------

Resolution: Won't Fix

This is a known deficiency; I don't think anyone has plans to fix it. Any cluster that has multiple disks per DN likely has multiple DNs too.

> Bad retry logic at DFSClient
>
> Key: HDFS-1233
> URL: https://issues.apache.org/jira/browse/HDFS-1233
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs client
> Affects Versions: 0.20.1
> Reporter: Thanh Do
>
> - Summary: failover bug, bad retry logic at DFSClient, cannot failover to the 2nd disk
>
> - Setups:
> + # available datanodes = 1
> + # disks / datanode = 2
> + # failures = 1
> + failure type = bad disk
> + When/where failure happens = (see below)
>
> - Details:
> The setup is: 1 datanode, 1 replica, and each datanode has 2 disks (Disk1 and Disk2). We injected a single disk failure to see if we can fail over to the second disk or not.
>
> If a persistent disk failure happens during createBlockOutputStream (the first phase of pipeline creation) (e.g. say DN1-Disk1 is bad), then createBlockOutputStream (cbos) will get an exception and it will retry! When it retries it will get the same DN1 from the namenode, and then DN1 will call DN.writeBlock(), FSVolume.createTmpFile, and finally getNextVolume(), which has a moving volume number. Thus, on the second try, the write will successfully go to the second disk. So essentially createBlockOutputStream is wrapped in a do/while(retry && --count >= 0). The first cbos will fail, the second will be successful in this particular scenario.
>
> NOW, say cbos is successful, but the failure is persistent. Then the retry is in a different while loop. First, hasError is set to true in RP.run (responder packet). Thus, DataStreamer.run() will go back to the loop: while(!closed && clientRunning && !lastPacketInBlock). This second iteration of the loop will call processDatanodeError because hasError has been set to true. In processDatanodeError (pde), the client sees that this is the only datanode in the pipeline, and hence it considers the node bad, although actually only 1 disk is bad! Hence, pde throws an IOException suggesting that all the datanodes (in this case, only DN1) in the pipeline are bad, and the exception is thrown to the client. But if that exception is, say, caught by the outermost do/while(retry && --count >= 0), then this outer retry will be successful (as suggested in the previous paragraph).
>
> In summary, if in a deployment scenario we only have one datanode that has multiple disks, and one disk goes bad, then the current retry logic at the DFSClient side is not robust enough to mask the failure from the client.
>
> This bug was found by our Failure Testing Service framework:
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
> For questions, please email us: Thanh Do (than...@cs.wisc.edu) and Haryadi Gunawi (hary...@eecs.berkeley.edu)
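A hedged reconstruction of the retry shape described above, with a stand-in BlockWriter interface rather than the real DFSClient internals: the bounded do/while masks a bad disk during pipeline *creation* because each fresh attempt can land on the datanode's next volume, while an error in the *streaming* phase never re-enters this loop and instead marks the lone datanode bad.

{code}
import java.io.IOException;

// Sketch of the setup-phase retry only; the streaming-phase error path
// described in the report bypasses this loop entirely.
public class RetryShapeSketch {
  interface BlockWriter {
    void createBlockOutputStream() throws IOException;
  }

  static void setupWithRetry(BlockWriter w, int count) throws IOException {
    boolean retry;
    do {
      retry = false;
      try {
        w.createBlockOutputStream();   // a fresh attempt can land on Disk2
        return;                        // success
      } catch (IOException e) {
        retry = true;                  // setup failure: try again
      }
    } while (retry && --count >= 0);
    throw new IOException("could not create block output stream");
  }
}
{code}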
[jira] Commented: (HDFS-1221) NameNode unable to start due to stale edits log after a crash
[ https://issues.apache.org/jira/browse/HDFS-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879857#action_12879857 ]

Allen Wittenauer commented on HDFS-1221:
----------------------------------------

Shouldn't having multiple namedirs defined (i.e., following best practices) make this failure case highly improbable?

> NameNode unable to start due to stale edits log after a crash
>
> Key: HDFS-1221
> URL: https://issues.apache.org/jira/browse/HDFS-1221
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.20.1
> Reporter: Thanh Do
>
> - Summary: If a crash happens during FSEditLog.createEditLogFile(), the edits log file on disk may be stale. During the next reboot, the NameNode will get an exception when parsing the edits file, because of the stale data, leading to an unsuccessful reboot. Note: this is just one example. Since the edits log (and fsimage) does not have a checksum, it is vulnerable to corruption too.
>
> - Details:
> The steps to create a new edits log (which we infer from the HDFS code) are:
> 1) truncate the file to zero size
> 2) write FSConstants.LAYOUT_VERSION to the buffer
> 3) insert the end-of-file marker OP_INVALID at the end of the buffer
> 4) preallocate 1MB of data, and fill the data with 0
> 5) flush the buffer to disk
>
> Note that only in steps 1, 4 and 5 is the data on disk actually changed. Now, suppose a crash happens after step 4, but before step 5. In the next reboot, the NameNode will fetch this edits log file (which contains all 0s). The first thing parsed is the LAYOUT_VERSION, which is 0. This is OK, because the NameNode has code to handle that case (but we expect LAYOUT_VERSION to be -18, don't we). Now it parses the operation code, which happens to be 0. Unfortunately, since 0 is the value of OP_ADD, the NameNode expects some parameters corresponding to that operation. The NameNode then calls readString to read the path, which throws an exception, leading to a failed reboot.
>
> This bug was found by our Failure Testing Service framework:
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
> For questions, please email us: Thanh Do (than...@cs.wisc.edu) and Haryadi Gunawi (hary...@eecs.berkeley.edu)
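The parse-of-zeros failure is easy to reproduce in isolation. Below is a small illustration under the report's assumptions about the on-disk layout (a 4-byte layout version followed by 1-byte opcodes, with 0 taken to mean OP_ADD); it is not the real FSEditLog loader.

{code}
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// An all-zero file -- what is left on disk if the crash lands after
// preallocation (step 4) but before the flush (step 5) -- parses as layout
// version 0 followed by an OP_ADD record that was never written.
public class StaleEditsSketch {
  static final byte OP_ADD = 0;  // opcode value assumed from the report

  public static void main(String[] args) throws IOException {
    byte[] staleEdits = new byte[1024 * 1024];            // preallocated zeros
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(staleEdits));
    int layoutVersion = in.readInt();                     // 0, not the expected -18
    System.out.println("layout version = " + layoutVersion);
    byte op = in.readByte();                              // 0 again
    if (op == OP_ADD) {
      // The real loader would now call readString() for the path and throw,
      // failing the reboot. A checksum, or writing the end-of-file marker
      // before preallocating, would let it reject the file cleanly instead.
      System.out.println("opcode 0 misread as OP_ADD");
    }
  }
}
{code}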
[jira] Resolved: (HDFS-1232) Corrupted block if a crash happens before writing to checksumOut but after writing to dataOut
[ https://issues.apache.org/jira/browse/HDFS-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon resolved HDFS-1232.
-------------------------------

Resolution: Duplicate

This has already been discussed elsewhere. The primary assumption is that a pipeline has more than one DN in it, and this is unlikely to happen on all of the DNs simultaneously. So one replica will get corrupt, but we have others that are fine.

> Corrupted block if a crash happens before writing to checksumOut but after writing to dataOut
>
> Key: HDFS-1232
> URL: https://issues.apache.org/jira/browse/HDFS-1232
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: data-node
> Affects Versions: 0.20.1
> Reporter: Thanh Do
>
> - Summary: a block is corrupted if a crash happens before the write to the checksum file but after the write to the data file.
>
> - Setup:
> + # available datanodes = 1
> + # disks / datanode = 1
> + # failures = 1
> + failure type = crash
> + When/where failure happens = (see below)
>
> - Details:
> The order of processing a packet during a client write/append at the datanode is: first forward the packet downstream, then write the data to the block file, and finally write to the checksum file. Hence if a crash happens BEFORE the write to the checksum file but AFTER the write to the data file, the block is corrupted. Worse, if this is the only available replica, the block is lost. We also found this problem in the case where there are 3 replicas for a particular block and two failures happen during an append (see HDFS-1231).
>
> This bug was found by our Failure Testing Service framework:
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
> For questions, please email us: Thanh Do (than...@cs.wisc.edu) and Haryadi Gunawi (hary...@eecs.berkeley.edu)
[jira] Resolved: (HDFS-1234) Datanode 'alive' but with its disk failed, Namenode thinks it's alive
[ https://issues.apache.org/jira/browse/HDFS-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon resolved HDFS-1234.
-------------------------------

Resolution: Duplicate

Resolved by HDFS-630.

> Datanode 'alive' but with its disk failed, Namenode thinks it's alive
> Key: HDFS-1234
> URL: https://issues.apache.org/jira/browse/HDFS-1234
[jira] Resolved: (HDFS-1235) Namenode returning the same Datanode to client, due to infrequent heartbeat
[ https://issues.apache.org/jira/browse/HDFS-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon resolved HDFS-1235.
-------------------------------

Resolution: Duplicate

Fixed by HDFS-630.

> Namenode returning the same Datanode to client, due to infrequent heartbeat
>
> Key: HDFS-1235
> URL: https://issues.apache.org/jira/browse/HDFS-1235
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Reporter: Thanh Do
>
> This bug has been reported. Basically, since a datanode's heartbeat messages are infrequent (~ every 10 minutes), the NameNode keeps giving the client the same datanode even if that datanode is dead. We want to point out that the client waits 6 seconds before retrying, which amounts to long and useless retries in this scenario, because within 6 seconds the namenode hasn't declared the datanode dead.
>
> Overall this happens when a datanode dies during the first phase of the pipeline (file setup). If a datanode dies during the second phase (byte transfer), the DFSClient can still proceed with the other surviving datanodes (which is consistent with what the Hadoop books always say -- the write should proceed as long as at least one good datanode remains). But unfortunately this specification does not hold during the first phase of the pipeline.
>
> Overall we suggest that the namenode take the client's view of unreachable datanodes into consideration. That is, if a client says that it cannot reach DN-X, then the namenode might give the client a node other than X (but the namenode does not have to declare X dead).
>
> This bug was found by our Failure Testing Service framework:
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
> For questions, please email us: Thanh Do (than...@cs.wisc.edu) and Haryadi Gunawi (hary...@eecs.berkeley.edu)
[jira] Resolved: (HDFS-1237) Client logic for 1st phase and 2nd phase failover are different
[ https://issues.apache.org/jira/browse/HDFS-1237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon resolved HDFS-1237.
-------------------------------

Resolution: Invalid

If both DNs crash in a pipeline of 2 DNs, of course the pipeline does not recover. The likelihood of correlated failure of all nodes in a pipeline is very small since one of the replicas is off-rack. Please reopen if you think there's _any_ action the client could take to recover when the entire pipeline has crashed.

> Client logic for 1st phase and 2nd phase failover are different
>
> Key: HDFS-1237
> URL: https://issues.apache.org/jira/browse/HDFS-1237
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs client
> Affects Versions: 0.20.1
> Reporter: Thanh Do
>
> - Setup:
> number of datanodes = 4
> replication factor = 2 (2 datanodes in the pipeline)
> number of failures injected = 2
> failure type: crash
> Where/when failures happen: there are two scenarios. First, two datanodes crash at the same time in the first phase of the pipeline. Second, two datanodes crash in the second phase of the pipeline.
>
> - Details:
> In this setting, we set the datanode's heartbeat interval to the namenode to 1 second. This is just to show that if the NN has declared a datanode dead, the DFSClient will not get that dead datanode from the server. Here are our observations:
>
> 1. If the two crashes happen during the first phase, the client will wait for 6 seconds (which is enough time for the NN to detect the dead datanodes in this setting). After waiting for 6 seconds, the client asks the NN again, the NN is able to give it two fresh healthy datanodes, and the experiment is successful!
>
> 2. BUT, if the two crashes happen during the second phase (e.g. renameTo), the client *never waits for 6 secs*, which implies that the client logic for the 1st phase and the 2nd phase are different. What happens here is that the DFSClient gives up and (we believe) never falls back to the outer while loop to contact the NN again. So the two crashes in this second phase are not masked properly, and the write operation fails.
>
> In summary, scenario (1) is good, but scenario (2) is not successful. This shows a bad retry logic during the second phase. (We note again that we changed the setup a bit by setting the DN's heartbeat interval to 1 second. If we used the default interval, scenario (1) would fail too, because the NN would give the client the same dead datanodes.)
>
> This bug was found by our Failure Testing Service framework:
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
> For questions, please email us: Thanh Do (than...@cs.wisc.edu) and Haryadi Gunawi (hary...@eecs.berkeley.edu)
[jira] Commented: (HDFS-1114) Reducing NameNode memory usage by an alternate hash table
[ https://issues.apache.org/jira/browse/HDFS-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879864#action_12879864 ]

Suresh Srinivas commented on HDFS-1114:
---------------------------------------

# For figuring out 64 bit, should we consider the max heap size? If the max heap size is > 2G, consider it a 64 bit machine. Since the max heap size on 32 bit machines varies from 1.4G to 2G, machines in that range could be wrongly classified as 32 bit. Is this an alternative worth considering?
# Minor: "print detail" to "print detailed"
# Minor: for end-of-line comments, should there be a space after //? Java coding conventions explicitly do not talk about this though. Currently there are 3043 comments with a space after // and 384 without :-)
# Minor: in the exception tests, what I meant in my previous comment was that you are better off printing to the log in Assert.fail(). Printing a log line when the expected thing happens is not that useful. That said, this is minor; you can leave it as it is.
# I am not sure what the point of commenting out the 5 hour test is. When do we expect it to be uncommented and run? Should it be moved to some other test that is run as a smoke test for release qualification?

> Reducing NameNode memory usage by an alternate hash table
> Key: HDFS-1114
> URL: https://issues.apache.org/jira/browse/HDFS-1114
[jira] Updated: (HDFS-1114) Reducing NameNode memory usage by an alternate hash table
[ https://issues.apache.org/jira/browse/HDFS-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HDFS-1114:
-----------------------------------------

Status: Open (was: Patch Available)

> Reducing NameNode memory usage by an alternate hash table
> Key: HDFS-1114
> URL: https://issues.apache.org/jira/browse/HDFS-1114
[jira] Commented: (HDFS-947) The namenode should redirect a hftp request to read a file to the datanode that has the maximum number of local replicas
[ https://issues.apache.org/jira/browse/HDFS-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879891#action_12879891 ]

Dmytro Molkov commented on HDFS-947:
------------------------------------

I ran hadoopQA locally since Hudson keeps ignoring this jira:

     [exec] +1 overall.
     [exec]
     [exec]     +1 @author. The patch does not contain any @author tags.
     [exec]
     [exec]     +1 tests included. The patch appears to include 4 new or modified tests.
     [exec]
     [exec]     +1 javadoc. The javadoc tool did not generate any warning messages.
     [exec]
     [exec]     +1 javac. The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     +1 findbugs. The patch does not introduce any new Findbugs warnings.
     [exec]
     [exec]     +1 release audit. The applied patch does not increase the total number of release audit warnings.

The tests also ran fine.

> The namenode should redirect a hftp request to read a file to the datanode that has the maximum number of local replicas
> Key: HDFS-947
> URL: https://issues.apache.org/jira/browse/HDFS-947
[jira] Updated: (HDFS-1114) Reducing NameNode memory usage by an alternate hash table
[ https://issues.apache.org/jira/browse/HDFS-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HDFS-1114:
-----------------------------------------

Attachment: h1114_20100617b.patch

h1114_20100617b.patch: slightly changed the comments and removed unnecessary spaces. I did not change the capacity calculation because the current computation is conservative on the special cases.

> Reducing NameNode memory usage by an alternate hash table
> Key: HDFS-1114
> URL: https://issues.apache.org/jira/browse/HDFS-1114
[jira] Updated: (HDFS-1114) Reducing NameNode memory usage by an alternate hash table
[ https://issues.apache.org/jira/browse/HDFS-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HDFS-1114:
-----------------------------------------

Status: Patch Available (was: Open)

Try resubmitting.

> Reducing NameNode memory usage by an alternate hash table
> Key: HDFS-1114
> URL: https://issues.apache.org/jira/browse/HDFS-1114
[jira] Commented: (HDFS-947) The namenode should redirect a hftp request to read a file to the datanode that has the maximum number of local replicas
[ https://issues.apache.org/jira/browse/HDFS-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879895#action_12879895 ]

dhruba borthakur commented on HDFS-947:
---------------------------------------

Thanks dmytro, i will commit it.

> The namenode should redirect a hftp request to read a file to the datanode that has the maximum number of local replicas
> Key: HDFS-947
> URL: https://issues.apache.org/jira/browse/HDFS-947
[jira] Created: (HDFS-1241) Possible deadlock between LeaseManager and FSDirectory
Possible deadlock between LeaseManager and FSDirectory
------------------------------------------------------

Key: HDFS-1241
URL: https://issues.apache.org/jira/browse/HDFS-1241
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node
Affects Versions: 0.20.2, 0.21.0
Reporter: Todd Lipcon

LeaseManager.findPath() locks LeaseManager, then FSDirectory by calling getFileINode. FSDirectory.unprotectedDelete locks itself and then calls LeaseManager.removeLeaseWithPrefixPath. This cycle could deadlock.
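The two-line report compresses a classic lock-order inversion. A self-contained toy that reproduces the shape of it, with plain monitors standing in for the real LeaseManager and FSDirectory locks (not the NameNode code):

{code}
// Thread A takes the locks in one order, thread B in the other;
// with unlucky timing both block forever.
public class DeadlockSketch {
  static final Object leaseManager = new Object();
  static final Object fsDirectory = new Object();

  public static void main(String[] args) {
    new Thread(new Runnable() {
      public void run() {                    // models LeaseManager.findPath()
        synchronized (leaseManager) {
          pause();
          synchronized (fsDirectory) {       // models getFileINode()
            System.out.println("A done");
          }
        }
      }
    }).start();
    new Thread(new Runnable() {
      public void run() {                    // models FSDirectory.unprotectedDelete()
        synchronized (fsDirectory) {
          pause();
          synchronized (leaseManager) {      // models removeLeaseWithPrefixPath()
            System.out.println("B done");
          }
        }
      }
    }).start();
  }

  static void pause() {                      // widen the race window
    try { Thread.sleep(100); } catch (InterruptedException ignored) { }
  }
}
{code}

The usual fix for this class of bug is a single global lock order -- every code path acquires the two locks in the same sequence.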
[jira] Updated: (HDFS-1241) Possible deadlock between LeaseManager and FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-1241:
------------------------------

Attachment: leasemanager.png

> Possible deadlock between LeaseManager and FSDirectory
> Key: HDFS-1241
> URL: https://issues.apache.org/jira/browse/HDFS-1241
[jira] Commented: (HDFS-752) Add interface classification stable scope to HDFS
[ https://issues.apache.org/jira/browse/HDFS-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879947#action_12879947 ]

Suresh Srinivas commented on HDFS-752:
--------------------------------------

In my previous proposal, classes related to internal protocols and the classes used by them were marked private stable, and the rest were marked private unstable. I want to change the protocol classes' classification to private *evolving*. Until Avro is used to ensure backward compatibility, these classes cannot be marked stable. Any class that is not tagged with an interface classification is private unstable; given that, I am not planning to add interface tagging to such classes. Tom and Sanjay, let me know what you guys think.

> Add interface classification stable scope to HDFS
>
> Key: HDFS-752
> URL: https://issues.apache.org/jira/browse/HDFS-752
> Project: Hadoop HDFS
> Issue Type: New Feature
> Affects Versions: 0.21.0, 0.22.0
> Reporter: Suresh Srinivas
> Assignee: Suresh Srinivas
> Fix For: 0.21.0, 0.22.0
> Attachments: hdfs.interface.txt
>
> This jira addresses adding interface classification for the classes in hadoop hdfs, based on the mechanism described in Hadoop-5073.
[jira] Updated: (HDFS-1206) TestFiHFlush fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HDFS-1206:
-----------------------------------------

Summary: TestFiHFlush fails intermittently (was: TestFiHFlush depends on BlocksMap implementation)

Talked to Cos. TestFiHFlush has some known problem.

> TestFiHFlush fails intermittently
>
> Key: HDFS-1206
> URL: https://issues.apache.org/jira/browse/HDFS-1206
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: test
> Reporter: Tsz Wo (Nicholas), SZE
>
> When I was testing HDFS-1114, the patch passed all tests except TestFiHFlush. Then, I tried to print out some debug messages; however, TestFiHFlush succeeded after I added the messages. TestFiHFlush probably depends on the speed of BlocksMap: if BlocksMap is slow enough, then it will pass.
[jira] Commented: (HDFS-1114) Reducing NameNode memory usage by an alternate hash table
[ https://issues.apache.org/jira/browse/HDFS-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879949#action_12879949 ]

Tsz Wo (Nicholas), SZE commented on HDFS-1114:
----------------------------------------------

Thanks Suresh. Hudson is not responding. Ran the tests locally:
{noformat}
     [exec] +1 overall.
     [exec]
     [exec]     +1 @author. The patch does not contain any @author tags.
     [exec]
     [exec]     +1 tests included. The patch appears to include 17 new or modified tests.
     [exec]
     [exec]     +1 javadoc. The javadoc tool did not generate any warning messages.
     [exec]
     [exec]     +1 javac. The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     +1 findbugs. The patch does not introduce any new Findbugs warnings.
     [exec]
     [exec]     +1 release audit. The applied patch does not increase the total number of release audit warnings.
{noformat}
Passed all tests except TestFiHFlush, which sometimes fails; see HDFS-1206.

> Reducing NameNode memory usage by an alternate hash table
> Key: HDFS-1114
> URL: https://issues.apache.org/jira/browse/HDFS-1114
[jira] Updated: (HDFS-1114) Reducing NameNode memory usage by an alternate hash table
[ https://issues.apache.org/jira/browse/HDFS-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HDFS-1114:
-----------------------------------------

Status: Resolved (was: Patch Available)
Hadoop Flags: [Reviewed]
Fix Version/s: 0.22.0
Resolution: Fixed

I have committed this.

> Reducing NameNode memory usage by an alternate hash table
> Key: HDFS-1114
> URL: https://issues.apache.org/jira/browse/HDFS-1114
[jira] Commented: (HDFS-752) Add interface classification stable scope to HDFS
[ https://issues.apache.org/jira/browse/HDFS-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879969#action_12879969 ]

Tom White commented on HDFS-752:
--------------------------------

In the common JIRA we tagged every class with public Java visibility, so I think it makes sense to do so here too.

> Add interface classification stable scope to HDFS
> Key: HDFS-752
> URL: https://issues.apache.org/jira/browse/HDFS-752
[jira] Commented: (HDFS-752) Add interface classification stable scope to HDFS
[ https://issues.apache.org/jira/browse/HDFS-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879974#action_12879974 ]

Suresh Srinivas commented on HDFS-752:
--------------------------------------

When you found a class that should not have been public, did you change it to private? Did you change contrib classes?

> Add interface classification stable scope to HDFS
> Key: HDFS-752
> URL: https://issues.apache.org/jira/browse/HDFS-752
[jira] Commented: (HDFS-1114) Reducing NameNode memory usage by an alternate hash table
[ https://issues.apache.org/jira/browse/HDFS-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879975#action_12879975 ]

Tsz Wo (Nicholas), SZE commented on HDFS-1114:
----------------------------------------------

Ran some benchmarks. When the modulus is large, which means that the number of collisions is small, LightWeightGSet is much better than GSetByHashMap.

|| datasize || modulus || GSetByHashMap || LightWeightGSet ||
| 65536 | 1025 | 219 | 234 |
| 65536 | 1048577 | 516 | 296 |
| 65536 | 1073741825 | 500 | 281 |
| 262144 | 1025 | 1422 | 1531 |
| 262144 | 1048577 | 3078 | 2156 |
| 262144 | 1073741825 | 3094 | 2281 |
| 1048576 | 1025 | 7172 | 7313 |
| 1048576 | 1048577 | 13531 | 9844 |
| 1048576 | 1073741825 | 14485 | 10718 |

> Reducing NameNode memory usage by an alternate hash table
> Key: HDFS-1114
> URL: https://issues.apache.org/jira/browse/HDFS-1114
[jira] Commented: (HDFS-752) Add interface classification stable scope to HDFS
[ https://issues.apache.org/jira/browse/HDFS-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879982#action_12879982 ]

Tom White commented on HDFS-752:
--------------------------------

> When you found a class that should not have been public, did you change it to private?

We didn't change Java visibility, we only added annotations (e.g. @InterfaceAudience.Private).

> Did you change contrib classes?

No.

> Add interface classification stable scope to HDFS
> Key: HDFS-752
> URL: https://issues.apache.org/jira/browse/HDFS-752
[jira] Commented: (HDFS-1114) Reducing NameNode memory usage by an alternate hash table
[ https://issues.apache.org/jira/browse/HDFS-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879988#action_12879988 ]

Hudson commented on HDFS-1114:
------------------------------

Integrated in Hadoop-Hdfs-trunk-Commit #311 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/311/])
HDFS-1114. Implement LightWeightGSet for BlocksMap in order to reduce NameNode memory footprint.

> Reducing NameNode memory usage by an alternate hash table
> Key: HDFS-1114
> URL: https://issues.apache.org/jira/browse/HDFS-1114
[jira] Commented: (HDFS-1114) Reducing NameNode memory usage by an alternate hash table
[ https://issues.apache.org/jira/browse/HDFS-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879989#action_12879989 ]

Tsz Wo (Nicholas), SZE commented on HDFS-1114:
----------------------------------------------

Comparing memory footprint on a 32-bit VM over 1,000,000 elements:
{noformat}
 num   #instances   #bytes   class name
----------------------------------------
  1:   140 24000960   java.util.HashMap$Entry
  2:   100 2400       org.apache.hadoop.hdfs.util.TestGSet$IntElement
  3:   238390960      [Ljava.util.HashMap$Entry;

HashMap: 53.78 MB

 num   #instances   #bytes   class name
----------------------------------------
  1:   100 2400       org.apache.hadoop.hdfs.util.TestGSet$IntElement
  2:   14194320       [Lorg.apache.hadoop.hdfs.util.LightWeightGSet$LinkedElement;

LightWeightGSet: 26.89 MB
{noformat}

> Reducing NameNode memory usage by an alternate hash table
> Key: HDFS-1114
> URL: https://issues.apache.org/jira/browse/HDFS-1114
[jira] Commented: (HDFS-1114) Reducing NameNode memory usage by an alternate hash table
[ https://issues.apache.org/jira/browse/HDFS-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879990#action_12879990 ]

Tsz Wo (Nicholas), SZE commented on HDFS-1114:
----------------------------------------------

Note that we should subtract 4*1000000 ~= 4MB from the HashMap figure, since HashMap does not require the reference for LightWeightGSet.LinkedElement.

> Reducing NameNode memory usage by an alternate hash table
> Key: HDFS-1114
> URL: https://issues.apache.org/jira/browse/HDFS-1114
[jira] Created: (HDFS-1242) 0.20 append: Add test for appendFile() race solved in HDFS-142
0.20 append: Add test for appendFile() race solved in HDFS-142
--------------------------------------------------------------

Key: HDFS-1242
URL: https://issues.apache.org/jira/browse/HDFS-1242
Project: Hadoop HDFS
Issue Type: Test
Affects Versions: 0.20-append
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Fix For: 0.20-append

This is a unit test that didn't make it into branch-0.20-append, but is worth having in TestFileAppend4.
[jira] Updated: (HDFS-1242) 0.20 append: Add test for appendFile() race solved in HDFS-142
[ https://issues.apache.org/jira/browse/HDFS-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-1242:
------------------------------

Attachment: hdfs-1242.txt

> 0.20 append: Add test for appendFile() race solved in HDFS-142
> Key: HDFS-1242
> URL: https://issues.apache.org/jira/browse/HDFS-1242
[jira] Updated: (HDFS-1243) 0.20 append: Replication tests in TestFileAppend4 should not expect immediate replication
[ https://issues.apache.org/jira/browse/HDFS-1243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-1243:
------------------------------

Attachment: hdfs-1243.txt

This patch got rid of the occasional spurious failures on my hudson.

> 0.20 append: Replication tests in TestFileAppend4 should not expect immediate replication
>
> Key: HDFS-1243
> URL: https://issues.apache.org/jira/browse/HDFS-1243
> Project: Hadoop HDFS
> Issue Type: Test
> Components: test
> Affects Versions: 0.20-append
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Priority: Minor
> Fix For: 0.20-append
> Attachments: hdfs-1243.txt
>
> The replicationTest() cases in TestFileAppend4 currently assume that the file has both valid replicas immediately after the file is completed. However, the datanodes may take some milliseconds to report the replica - we should only expect 1 replica (dfs.replication.min) immediately after close, and we should allow up to a second or so before asserting that we reach replication 2.
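A hedged sketch of the polling pattern the issue calls for -- assert replication against a deadline instead of immediately after close(). ReplicationProbe and getReplicationOf() are stand-ins for however the test queries block locations, not the actual TestFileAppend4 code.

{code}
import java.io.IOException;

// Poll until the expected replication is reported or the deadline passes.
public class WaitForReplicationSketch {
  interface ReplicationProbe {
    int getReplicationOf(String path) throws IOException;
  }

  static void waitForReplication(ReplicationProbe probe, String path,
                                 int expected, long timeoutMs)
      throws IOException, InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      if (probe.getReplicationOf(path) >= expected) {
        return;                       // datanodes have reported the replicas
      }
      Thread.sleep(50);               // give the DNs a moment to report
    }
    throw new AssertionError("replication " + expected + " not reached in "
        + timeoutMs + " ms");
  }
}
{code}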
[jira] Created: (HDFS-1244) Misc improvements to TestFileAppend2
Misc improvements to TestFileAppend2 Key: HDFS-1244 URL: https://issues.apache.org/jira/browse/HDFS-1244 Project: Hadoop HDFS Issue Type: Test Components: test Affects Versions: 0.20-append, 0.22.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Fix For: 0.20-append, 0.22.0 Attachments: hdfs-1244-0.20-append.txt I've made a bunch of improvements to TestFileAppend2: - Now has a main() with various command line options to change the workload (number of DNs, number of threads, etc) - Sleeps for less time in between operations to catch races around close/reopen - Updates to JUnit 4 style, adds timeouts - Improves error messages on failure -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1244) Misc improvements to TestFileAppend2
[ https://issues.apache.org/jira/browse/HDFS-1244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-1244: -- Attachment: hdfs-1244-0.20-append.txt Misc improvements to TestFileAppend2 Key: HDFS-1244 URL: https://issues.apache.org/jira/browse/HDFS-1244 Project: Hadoop HDFS Issue Type: Test Components: test Affects Versions: 0.20-append, 0.22.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Fix For: 0.20-append, 0.22.0 Attachments: hdfs-1244-0.20-append.txt I've made a bunch of improvements to TestFileAppend2: - Now has a main() with various command line options to change the workload (number of DNs, number of threads, etc) - Sleeps for less time in between operations to catch races around close/reopen - Updates to JUnit 4 style, adds timeouts - Improves error messages on failure -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1244) Misc improvements to TestFileAppend2
[ https://issues.apache.org/jira/browse/HDFS-1244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12879997#action_12879997 ] Todd Lipcon commented on HDFS-1244: --- As an example, I have a Hudson job running: HADOOP_OPTS=-ea HADOOP_ROOT_LOGGER=DEBUG,console bin/hadoop org.apache.hadoop.hdfs.TestFileAppend2 --numDataNodes 1 --numThreads 40 --appendsPerThread 2000 --numFiles 1 HADOOP_OPTS=-ea HADOOP_ROOT_LOGGER=DEBUG,console bin/hadoop org.apache.hadoop.hdfs.TestFileAppend2 --numDataNodes 2 --numThreads 40 --appendsPerThread 2000 --numFiles 1 and it's found a couple of bugs in the 0.20 append code. Misc improvements to TestFileAppend2 Key: HDFS-1244 URL: https://issues.apache.org/jira/browse/HDFS-1244 Project: Hadoop HDFS Issue Type: Test Components: test Affects Versions: 0.20-append, 0.22.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Fix For: 0.20-append, 0.22.0 Attachments: hdfs-1244-0.20-append.txt I've made a bunch of improvements to TestFileAppend2: - Now has a main() with various command line options to change the workload (number of DNs, number of threads, etc) - Sleeps for less time in between operations to catch races around close/reopen - Updates to JUnit 4 style, adds timeouts - Improves error messages on failure -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-599) Improve Namenode robustness by prioritizing datanode heartbeats over client requests
[ https://issues.apache.org/jira/browse/HDFS-599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12880007#action_12880007 ] Dmytro Molkov commented on HDFS-599: All tests passed for me Improve Namenode robustness by prioritizing datanode heartbeats over client requests Key: HDFS-599 URL: https://issues.apache.org/jira/browse/HDFS-599 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: dhruba borthakur Assignee: Dmytro Molkov Fix For: 0.22.0 Attachments: HDFS-599.3.patch, HDFS-599.patch The namenode processes RPC requests from clients that are reading and writing files, as well as heartbeats/block reports from datanodes. Sometimes, for various reasons (Java GC runs, inconsistent performance of the NFS filer that stores HDFS transaction logs, etc.), the namenode encounters transient slowness. For example, if the device that stores the HDFS transaction logs becomes sluggish, the Namenode's ability to process RPCs slows down to a certain extent. During this time, the RPCs from clients as well as the RPCs from datanodes suffer in similar fashion. If the underlying problem becomes worse, the NN's ability to process a heartbeat from a DN is severely impacted, thus causing the NN to declare that the DN is dead. Then the NN starts replicating blocks that used to reside on the now-declared-dead datanode. This adds extra load to the NN. Then the now-declared-dead datanode finally re-establishes contact with the NN and sends a block report. The block report processing on the NN is another heavyweight activity, thus causing more load to the already overloaded namenode. My proposal is that the NN should try its best to continue processing RPCs from datanodes and give lesser priority to serving client requests. The datanode RPCs are integral to the consistency and performance of the Hadoop file system, and it is better to protect them at all costs. This will ensure that the NN recovers from the hiccup much faster than it does now. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
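To make the proposal concrete, one way to give datanode RPCs priority is a two-level call queue that handler threads drain datanode-first. The sketch below is illustrative only; it is not the attached patch, and all names are invented:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Illustrative two-level call queue, not the attached HDFS-599 patch. */
public class PrioritizedCallQueue {
  private final BlockingQueue<Runnable> datanodeCalls = new LinkedBlockingQueue<Runnable>();
  private final BlockingQueue<Runnable> clientCalls = new LinkedBlockingQueue<Runnable>();

  public void add(Runnable call, boolean fromDatanode) {
    (fromDatanode ? datanodeCalls : clientCalls).add(call);
  }

  /** Handler threads loop on this: heartbeats/block reports win over client RPCs. */
  public Runnable take() throws InterruptedException {
    while (true) {
      Runnable call = datanodeCalls.poll();      // datanode RPCs first
      if (call == null) {
        call = clientCalls.poll();               // then client RPCs
      }
      if (call != null) {
        return call;
      }
      // Nothing queued: block briefly on the high-priority queue, then
      // loop so freshly arrived client calls are still picked up.
      call = datanodeCalls.poll(10, TimeUnit.MILLISECONDS);
      if (call != null) {
        return call;
      }
    }
  }
}

Under this scheme client RPCs are delayed but never dropped when the NN is slow, while heartbeats and block reports keep flowing, which is the behavior the proposal argues for.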
[jira] Created: (HDFS-1245) Plugable block id generation
Plugable block id generation - Key: HDFS-1245 URL: https://issues.apache.org/jira/browse/HDFS-1245 Project: Hadoop HDFS Issue Type: New Feature Components: name-node Reporter: Dmytro Molkov The idea is to have a way to easily create block id generation engines that may fit a certain purpose. One of them could be HDFS-898, started by Konstantin, but potentially others. We chatted with Dhruba about this for a while and came up with the following approach: there should be a BlockIDGenerator interface that has the following methods: void blockAdded(Block) void blockRemoved(Block) Block nextBlock() The first two methods are needed for block generation engines that hold a certain state. During a restart, when the namenode reads the fsimage, it will notify the generator about all the blocks it reads from the image, and during runtime the namenode will notify the generator about block removals on file deletion. The instance of the generator will also have a reference to the block registry, the interface that BlockManager implements. The only method there is blockExists(Block), so that the current random block id generation can be implemented, since it needs to check with the block manager whether the id is already present. What does the community think about this proposal? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
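Transcribed into code, the proposed interfaces could look like the sketch below. The method names come from the proposal itself; the class names, the minimal Block stand-in, and the random implementation are assumptions for illustration:

import java.util.Random;

class Block {                          // minimal stand-in for o.a.h.hdfs.protocol.Block
  final long id;
  Block(long id) { this.id = id; }
}

interface BlockRegistry {              // the registry interface BlockManager would implement
  boolean blockExists(Block b);
}

interface BlockIDGenerator {
  void blockAdded(Block b);            // replayed for every block read from the fsimage
  void blockRemoved(Block b);          // called when blocks go away on file deletion
  Block nextBlock();
}

/** The current random-id behavior, recast as one pluggable engine. */
class RandomBlockIDGenerator implements BlockIDGenerator {
  private final BlockRegistry registry;
  private final Random rand = new Random();

  RandomBlockIDGenerator(BlockRegistry registry) {
    this.registry = registry;
  }

  public void blockAdded(Block b) { }   // stateless: nothing to track
  public void blockRemoved(Block b) { } // stateless: nothing to track

  public Block nextBlock() {
    Block b;
    do {
      b = new Block(rand.nextLong());   // retry until the id is unused
    } while (registry.blockExists(b));
    return b;
  }
}

A stateful engine such as the sequential-id scheme of HDFS-898 would instead use blockAdded() during image loading to learn the highest id in use.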
[jira] Updated: (HDFS-1111) getCorruptFiles() should give some hint that the list is not complete
[ https://issues.apache.org/jira/browse/HDFS-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rodrigo Schmidt updated HDFS-1111: -- Status: Patch Available (was: Open) getCorruptFiles() should give some hint that the list is not complete - Key: HDFS-1111 URL: https://issues.apache.org/jira/browse/HDFS-1111 Project: Hadoop HDFS Issue Type: New Feature Reporter: Rodrigo Schmidt Assignee: Rodrigo Schmidt Attachments: HADFS-1111.0.patch The list of corrupt files returned by the namenode doesn't say anything if the number of corrupted files is larger than the call output limit (which means the list is not complete). There should be a way to hint incompleteness to clients. A simple hack would be to add an extra entry to the returned array with the value null. Clients could interpret this as a sign that there are other corrupt files in the system. We should also do some rephrasing of the fsck output to make it more confident when the list is complete and less confident when the list is known to be incomplete. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
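The null-entry hack is easy to picture in code. The helper names below are invented for illustration and are not the attached patch:

/** Illustrative helpers for the null-entry incompleteness hint. */
class CorruptFilesHint {

  /** Server side: append a null entry when the list was cut off at the output limit. */
  static String[] hintTruncation(String[] corruptFiles, boolean truncated) {
    if (!truncated) {
      return corruptFiles;
    }
    String[] withHint = new String[corruptFiles.length + 1];
    System.arraycopy(corruptFiles, 0, withHint, 0, corruptFiles.length);
    withHint[corruptFiles.length] = null;   // the incompleteness marker
    return withHint;
  }

  /** Client side: a trailing null means there are more corrupt files than were returned. */
  static boolean isComplete(String[] returned) {
    return returned.length == 0 || returned[returned.length - 1] != null;
  }
}

fsck could then call isComplete() to decide whether to report "these are all the corrupt files" or only "at least these files are corrupt".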
[jira] Updated: (HDFS-1111) getCorruptFiles() should give some hint that the list is not complete
[ https://issues.apache.org/jira/browse/HDFS-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rodrigo Schmidt updated HDFS-1111: -- Status: Open (was: Patch Available) getCorruptFiles() should give some hint that the list is not complete - Key: HDFS-1111 URL: https://issues.apache.org/jira/browse/HDFS-1111 Project: Hadoop HDFS Issue Type: New Feature Reporter: Rodrigo Schmidt Assignee: Rodrigo Schmidt Attachments: HADFS-1111.0.patch The list of corrupt files returned by the namenode doesn't say anything if the number of corrupted files is larger than the call output limit (which means the list is not complete). There should be a way to hint incompleteness to clients. A simple hack would be to add an extra entry to the returned array with the value null. Clients could interpret this as a sign that there are other corrupt files in the system. We should also do some rephrasing of the fsck output to make it more confident when the list is complete and less confident when the list is known to be incomplete. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1071) savenamespace should write the fsimage to all configured fs.name.dir in parallel
[ https://issues.apache.org/jira/browse/HDFS-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmytro Molkov updated HDFS-1071: Status: Open (was: Patch Available) savenamespace should write the fsimage to all configured fs.name.dir in parallel Key: HDFS-1071 URL: https://issues.apache.org/jira/browse/HDFS-1071 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: dhruba borthakur Assignee: Dmytro Molkov Attachments: HDFS-1071.2.patch, HDFS-1071.3.patch, HDFS-1071.4.patch, HDFS-1071.patch If you have a large number of files in HDFS, the fsimage file is very big. When the namenode restarts, it writes a copy of the fsimage to all directories configured in fs.name.dir. This takes a long time, especially if there are many directories in fs.name.dir. Make the NN write the fsimage to all these directories in parallel. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1246) Manual tool to test sync against a real cluster
[ https://issues.apache.org/jira/browse/HDFS-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-1246: -- Attachment: hdfs-1246.txt Attaching patch for branch-0.20-append. The code is a bit messy; I'm not sure we actually want to commit it as is, but I figured others may find it useful to see. This could be made into a unit test, but I was afraid of orphaning processes, etc., so right now it only runs manually against a real cluster. Manual tool to test sync against a real cluster --- Key: HDFS-1246 URL: https://issues.apache.org/jira/browse/HDFS-1246 Project: Hadoop HDFS Issue Type: Test Components: test Affects Versions: 0.20-append Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Fix For: 0.20-append Attachments: hdfs-1246.txt Contributing a tool I've built that writes data against a real cluster, calling sync as fast as it can, and then kill -9s the writer and verifies the data can be recovered. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
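A hypothetical skeleton of such a writer process (not the attached hdfs-1246.txt): append and sync in a tight loop, print the durable record count, and let a driver script kill -9 the JVM and check what is recoverable:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Hypothetical writer skeleton for a sync/kill -9 test driver. */
public class SyncWriter {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FSDataOutputStream out = fs.create(new Path(args[0]));
    long records = 0;
    while (true) {                 // runs until the driver kills the process
      out.writeLong(records);
      out.sync();                  // the 0.20-append sync(); hflush() in later APIs
      records++;
      System.out.println("synced " + records);  // the driver records the last line
    }
  }
}

After the kill -9, the driver would reopen the file, wait out lease recovery, and assert that at least as many longs as the last printed count are readable.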
[jira] Created: (HDFS-1247) Improvements to HDFS-1204 test
Improvements to HDFS-1204 test -- Key: HDFS-1247 URL: https://issues.apache.org/jira/browse/HDFS-1247 Project: Hadoop HDFS Issue Type: Test Components: test Affects Versions: 0.20-append Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.20-append The test from HDFS-1204 currently generates some warnings when compiling. Here's a small patch to clean up the test. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1247) Improvements to HDFS-1204 test
[ https://issues.apache.org/jira/browse/HDFS-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-1247: -- Attachment: hdfs-1247.txt The patch switches to just using Mockito's verification support rather than an Answer, and fixes the compile warnings. Improvements to HDFS-1204 test -- Key: HDFS-1247 URL: https://issues.apache.org/jira/browse/HDFS-1247 Project: Hadoop HDFS Issue Type: Test Components: test Affects Versions: 0.20-append Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.20-append Attachments: hdfs-1247.txt The test from HDFS-1204 currently generates some warnings when compiling. Here's a small patch to clean up the test. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
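The switch is easy to illustrate generically; the Listener type below is invented for this sketch and is not the actual HDFS-1204 test code:

import static org.mockito.Mockito.*;

public class VerifyInsteadOfAnswer {
  // Listener is invented for this sketch; the real patch edits the HDFS-1204 test.
  interface Listener {
    void onEvent(String name);
  }

  public static void main(String[] args) {
    Listener listener = mock(Listener.class);

    listener.onEvent("block-received");   // stands in for the code under test

    // Instead of installing an Answer that records the call into a flag
    // and asserting on the flag later, let Mockito do the bookkeeping:
    verify(listener).onEvent("block-received");
    verifyNoMoreInteractions(listener);
  }
}

verify() also produces a far better failure message than a hand-rolled flag, since Mockito reports which interactions actually happened.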
[jira] Created: (HDFS-1248) Misc cleanup/logging improvements for branch-20-append
Misc cleanup/logging improvements for branch-20-append -- Key: HDFS-1248 URL: https://issues.apache.org/jira/browse/HDFS-1248 Project: Hadoop HDFS Issue Type: Improvement Components: data-node, name-node, test Affects Versions: 0.20-append Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Fix For: 0.20-append Attachments: hdfs-1248.txt Last remaining bits of my append branch that didn't fit elsewhere in JIRA (just misc cleanup) - Slight cleanup to recoverFile() function in TFA4 - Improve error messages on OP_READ_BLOCK - Some comment cleanup in FSNamesystem - Remove toInodeUnderConstruction (not used) - Add some checks for null blocks to avoid NPE - Only log inconsistent size warnings at WARN level for non-under-construction blocks. - Redundant addStoredBlock calls are also not worthy of WARN level - Add some extra information to a warning in ReplicationTargetChooser This may need HDFS-1057 to be committed first to apply. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1248) Misc cleanup/logging improvements for branch-20-append
[ https://issues.apache.org/jira/browse/HDFS-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-1248: -- Attachment: hdfs-1248.txt Misc cleanup/logging improvements for branch-20-append -- Key: HDFS-1248 URL: https://issues.apache.org/jira/browse/HDFS-1248 Project: Hadoop HDFS Issue Type: Improvement Components: data-node, name-node, test Affects Versions: 0.20-append Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Fix For: 0.20-append Attachments: hdfs-1248.txt Last remaining bits of my append branch that didn't fit elsewhere in JIRA (just misc cleanup) - Slight cleanup to recoverFile() function in TFA4 - Improve error messages on OP_READ_BLOCK - Some comment cleanup in FSNamesystem - Remove toInodeUnderConstruction (not used) - Add some checks for null blocks to avoid NPE - Only log inconsistent size warnings at WARN level for non-under-construction blocks. - Redundant addStoredBlock calls are also not worthy of WARN level - Add some extra information to a warning in ReplicationTargetChooser This may need HDFS-1057 to be committed first to apply. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.