[ https://issues.apache.org/jira/browse/HDFS-15440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
AMC-team updated HDFS-15440:
----------------------------
    Summary: The doc of dfs.disk.balancer.block.tolerance.percent is misleading  (was: The using of dfs.disk.balancer.block.tolerance.percent is inconsistent with doc)

> The doc of dfs.disk.balancer.block.tolerance.percent is misleading
> ------------------------------------------------------------------
>
>                 Key: HDFS-15440
>                 URL: https://issues.apache.org/jira/browse/HDFS-15440
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: balancer & mover
>            Reporter: AMC-team
>            Priority: Major
>
> In the HDFS disk balancer, the configuration parameter
> "dfs.disk.balancer.block.tolerance.percent" sets a percentage (e.g. 10
> means 10%) that defines a good enough move.
> The description in hdfs-default.xml does not make clear how the value is
> actually calculated and applied:
> {quote}When a disk balancer copy operation is proceeding, the datanode is
> still active. So it might not be possible to move the exactly specified
> amount of data. So tolerance allows us to define a percentage which defines a
> good enough move.
> {quote}
> So I referred to the [official doc of HDFS disk
> balancer|https://hadoop.apache.org/docs/r3.2.0/hadoop-project-dist/hadoop-hdfs/HDFSDiskbalancer.html],
> whose description is:
> {quote}The tolerance percent specifies when we have reached a good enough
> value for any copy step. For example, if you specify 10 then getting close to
> 10% of the target value is good enough. It is to say if the move operation is
> 20GB in size, if we can move 18GB (20 * (1-10%)) that operation is considered
> successful.
> {quote}
> However, the source code in DiskBalancer.java reads:
> {code:java}
>   // Inflates bytesCopied and returns true or false. This allows us to stop
>   // copying if we have reached close enough.
>   private boolean isCloseEnough(DiskBalancerWorkItem item) {
>     long temp = item.getBytesCopied() +
>         ((item.getBytesCopied() * getBlockTolerancePercentage(item)) / 100);
>     return (item.getBytesToCopy() >= temp) ? false : true;
>   }
> {code}
> Here, if item.getBytesToCopy() = 20GB, then item.getBytesCopied() = 18GB is
> still not enough, because 20 > 18 + 18*0.1 = 19.8.
> The calculation in isLessThanNeeded() ("Checks if a given block is less than
> needed size to meet our goal.") is unintuitive in the same way.
> *How to fix*
> Although this may not lead to severe failure, the doc and the code should be
> made consistent, and the description in hdfs-default.xml should be refined
> to be more precise and clear.
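
To make the gap between the two readings concrete, here is a small standalone sketch of the 20GB / 18GB example from the issue. It is not Hadoop code: the class name ToleranceCheck and both method names are hypothetical, and plain long values stand in for DiskBalancerWorkItem. closeEnoughAsCoded copies the arithmetic of the quoted isCloseEnough(); closeEnoughAsDocumented is only one possible reading of the "20 * (1-10%)" wording in the doc.

```java
public class ToleranceCheck {
    static final long GB = 1024L * 1024L * 1024L;

    // Same arithmetic as the quoted isCloseEnough(): inflate bytesCopied by
    // the tolerance and compare against bytesToCopy. Plain longs replace
    // DiskBalancerWorkItem here (an assumption made for illustration).
    static boolean closeEnoughAsCoded(long bytesToCopy, long bytesCopied,
                                      long tolerancePercent) {
        long inflated = bytesCopied + (bytesCopied * tolerancePercent) / 100;
        return bytesToCopy < inflated;
    }

    // One reading of the doc example: the step is good enough once
    // bytesCopied reaches bytesToCopy * (1 - tolerance).
    static boolean closeEnoughAsDocumented(long bytesToCopy, long bytesCopied,
                                           long tolerancePercent) {
        long required = bytesToCopy - (bytesToCopy * tolerancePercent) / 100;
        return bytesCopied >= required;
    }

    public static void main(String[] args) {
        long toCopy = 20 * GB;
        long copied = 18 * GB;
        // prints "as coded:      false"
        System.out.println("as coded:      " + closeEnoughAsCoded(toCopy, copied, 10));
        // prints "as documented: true"
        System.out.println("as documented: " + closeEnoughAsDocumented(toCopy, copied, 10));
    }
}
```

With a tolerance of 10, the coded check only passes once bytesCopied exceeds roughly 20 / 1.1 ≈ 18.18GB, whereas the documented example treats 18GB as sufficient.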