Hi,
To get to the point: does the number of replicas of a block increase the memory requirement on the NameNode, and if so, by how much?

The calculation in this paper from Yahoo! https://www.usenix.org/legacy/publications/login/2010-04/openpdfs/shvachko.pdf assumes 200 bytes per metadata object, and with 1.5 blocks per file, a file needs 3 objects (1 for the file, 2 for the blocks). The replication factor is not mentioned in the paper and does not take part in the calculation.

This email on the mailing list https://www.mail-archive.com/[email protected]/msg02835.html assumes 150 bytes per metadata object, but it gets the calculation wrong by an order of magnitude: 1M files (1 block each) use 2M metadata objects (1 per file, 1 per block), which comes to 300MB, not 3GB. This article from Cloudera http://blog.cloudera.com/blog/2009/02/the-small-files-problem/ cites the email, but corrects the number to match the figure. The replication factor is not mentioned in either case and does not take part in the calculation.

This answer on StackOverflow https://stackoverflow.com/questions/10764493/namenode-file-quantity-limit adds two metadata objects (one for the file and one for the block) for each replica, which does not match the method of calculation in the links above.

Which of them is (or are) correct? Does replication use one metadata object per block replica, or does it only slightly increase the size of the metadata object?

Best regards,
Hong Dai Thanh
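
P.S. To make the competing estimates concrete, here is a small Python sketch of the arithmetic only. The function name namenode_bytes and its parameters are my own and purely illustrative. The count_replicas flag models the StackOverflow answer's assumption that every replica adds its own set of metadata objects, which is exactly the point I am unsure about, so this shows how far the estimates diverge rather than what the NameNode really does.

```python
def namenode_bytes(files, block_objects_per_file, bytes_per_object,
                   replication=1, count_replicas=False):
    """Estimate NameNode memory (in bytes) used by file and block metadata.

    Hypothetical helper: one object per file plus one object per block.
    If count_replicas is True, the object count is multiplied by the
    replication factor (the StackOverflow answer's assumption).
    """
    objects = files * (1 + block_objects_per_file)
    if count_replicas:
        objects *= replication
    return objects * bytes_per_object


# Mailing-list assumptions: 150 bytes per object, 1 block per file.
one_million = namenode_bytes(1_000_000, 1, 150)    # 300,000,000 bytes (300MB)
ten_million = namenode_bytes(10_000_000, 1, 150)   # 3,000,000,000 bytes (3GB)

# Yahoo! paper assumptions: 200 bytes per object, 3 objects per file
# (1 file object + 2 block objects), i.e. 600 bytes per file.
per_file_yahoo = namenode_bytes(1, 2, 200)

# Same 1M files, but counting objects once per replica at replication 3.
with_replicas = namenode_bytes(1_000_000, 1, 150,
                               replication=3, count_replicas=True)

print(one_million, ten_million, per_file_yahoo, with_replicas)
```

With replication 3, the two models differ by a factor of 3 (300MB versus 900MB for 1M single-block files), which is why I would like to know which one applies.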
