I have a Hadoop cluster that's a little tight on resources. One way I
was thinking of easing this is to run an additional data node on the
same machine as the secondary name node.
I wouldn't dare do that on the primary name node, since that machine
needs to be extremely performant. But since all the secondary name node
does is merge the name node's checkpoint and logs, which is not an
activity that requires top-notch real-time performance, I thought it
might not be a problem to set up a data node running there as well.
Any reasons why that might be a bad idea?
Thanks,
DR