Hello, today I tried to run a new Crunch job on our production cluster and it crashed with a NumberFormatException in CrunchCombineFileInputFormat <https://github.com/apache/crunch/blob/apache-crunch-0.12/crunch-core/src/main/java/org/apache/crunch/impl/mr/run/CrunchCombineFileInputFormat.java> at line 38, where the configuration is queried for the key "dfs.block.size". The crash happened because I had set the block size to "128m", which is not parseable as a long. However, according to the official Hadoop documentation <https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml>, such size abbreviations should be supported.
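To illustrate the kind of parsing I would expect, here is a minimal stand-alone sketch in plain Java. It is not the actual Hadoop code (if I read the Configuration javadoc correctly, getLongBytes already does something along these lines); the class and method names are my own invention:

```java
// Hypothetical sketch of parsing block-size values such as "134217728",
// "128m", or "1g" into a byte count, similar in spirit to what
// Hadoop's Configuration.getLongBytes appears to do.
public class BlockSizeParser {

    static long parseSize(String value) {
        String v = value.trim().toLowerCase();
        long multiplier = 1L;
        // Recognize the usual binary size suffixes.
        switch (v.charAt(v.length() - 1)) {
            case 'k': multiplier = 1L << 10; break;
            case 'm': multiplier = 1L << 20; break;
            case 'g': multiplier = 1L << 30; break;
            case 't': multiplier = 1L << 40; break;
        }
        if (multiplier != 1L) {
            v = v.substring(0, v.length() - 1);
        }
        return Long.parseLong(v) * multiplier;
    }

    public static void main(String[] args) {
        System.out.println(parseSize("134217728")); // plain longs still work
        System.out.println(parseSize("128m"));      // 128 * 2^20 = 134217728
    }
}
```

With this kind of parsing, "128m" and "134217728" would both be accepted, instead of the former blowing up in Long.parseLong.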
Furthermore, the preferred name of the configuration key changed from "dfs.block.size" to "dfs.blocksize" (see https://issues.apache.org/jira/browse/HDFS-631). Because the code falls back to a hard-coded default, it silently hides the fact that the key is no longer found in the configuration on more recent Hadoop versions. I am using Crunch 0.11 as shipped with CDH 5.4.5; the problematic line first appeared in Crunch 0.8 as a fix for CRUNCH-253.

The issue could be fixed by taking both key names into account and adding logic to support block size abbreviations. As a workaround, size abbreviations can simply be avoided on the client. Do you think this should be fixed, or is the workaround enough? What should I do if I want to submit a patch?

Thanks,
Tomas Cechal
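P.S. A rough sketch of the fallback logic I have in mind, in plain Java. The class and method names are hypothetical, and a Map stands in for the Hadoop Configuration object; this is just to show the shape of the fix, not a patch:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed fix: prefer the new key
// "dfs.blocksize", fall back to the deprecated "dfs.block.size",
// and accept size abbreviations; `conf` stands in for a Hadoop
// Configuration here.
public class BlockSizeLookup {

    static final long DEFAULT_BLOCK_SIZE = 128L << 20; // 128 MB

    static long getBlockSize(Map<String, String> conf) {
        String value = conf.get("dfs.blocksize");
        if (value == null) {
            value = conf.get("dfs.block.size"); // deprecated key name
        }
        if (value == null) {
            return DEFAULT_BLOCK_SIZE;
        }
        return parseSize(value);
    }

    // Accepts plain longs as well as "k"/"m"/"g"/"t" suffixes.
    static long parseSize(String value) {
        String v = value.trim().toLowerCase();
        long multiplier = 1L;
        switch (v.charAt(v.length() - 1)) {
            case 'k': multiplier = 1L << 10; break;
            case 'm': multiplier = 1L << 20; break;
            case 'g': multiplier = 1L << 30; break;
            case 't': multiplier = 1L << 40; break;
        }
        if (multiplier != 1L) {
            v = v.substring(0, v.length() - 1);
        }
        return Long.parseLong(v) * multiplier;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("dfs.block.size", "128m"); // old key, abbreviated value
        System.out.println(getBlockSize(conf)); // 134217728
    }
}
```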
