Hello,
I'm wondering whether I should specify a block size smaller than 64MB in the
case where my mappers need to do intensive computations.
I know it's generally better to have larger files and blocks, because of the
replication overhead and the NameNode being a weak point, but I don't have
that much data, and the operations that need to be performed on it are
intensive.
It looks like it's better to have a smaller block size (at least until there
is more data), so that multiple mappers get instantiated and can share the
computation.
I'm currently talking about Hadoop 1, not YARN, but a heads-up about the same
issue on YARN would be appreciated.
Thanks,
Marko