I get the whole "security is a good thing" thing, but could someone explain why, when Whirr configures Hadoop, it sets up the SSH proxy to disallow all communication with the data / task nodes except via the name node over the proxy? If I'm running on EC2, won't correctly setting up security groups give me enough security?
The reason I ask is that I'm using Whirr through its API to automate...well...all the cool things Whirr does. But the key point is automation. After a Hadoop cluster is up and running, I'd like the program to kick off a Hadoop job and then monitor jobs and tasks. But that means my program has to launch hadoop-proxy.sh somehow, capture the PID of the process, kick off my Hadoop job, and then, when done, kill the process via the PID. The whole business of calling a shell script, capturing the PID, persisting it, and killing it, all through my Java automation, just seems a bit duct-tape and baling-wire-ish. So I'm trying to figure out why we have the whole hadoop-proxy.sh thing in the first place (specifically within the context of EC2).

Thanks,
John C
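
For concreteness, the launch / run / kill sequence described above could be sketched in Java roughly as follows. This is only a minimal sketch under stated assumptions: `ProxyRunner` and `runWithProxy` are hypothetical names, not part of Whirr, and the proxy command is passed in by the caller. Holding on to the `Process` handle avoids capturing and persisting a PID at all.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not part of Whirr): keeps a proxy process alive while a job runs. */
public class ProxyRunner {

    /**
     * Starts the given proxy command, runs the job, and always stops the
     * proxy afterwards, even if the job throws. Returns the stopped process.
     */
    public static Process runWithProxy(List<String> proxyCommand, Runnable job)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(proxyCommand);
        builder.inheritIO(); // let the proxy's output go to this program's console
        Process proxy = builder.start();
        try {
            job.run(); // e.g. submit the Hadoop job and poll until it finishes
        } finally {
            proxy.destroy(); // ask the proxy process to terminate
            if (!proxy.waitFor(5, TimeUnit.SECONDS)) {
                proxy.destroyForcibly(); // escalate if it did not exit in time
                proxy.waitFor();
            }
        }
        return proxy;
    }
}
```

The command would be something like the path to the generated hadoop-proxy.sh, and the `Runnable` would wrap the job submission and monitoring. One caveat: `destroy()` only signals the direct child process, so if the script starts ssh as a separate child rather than replacing itself with it, the tunnel may outlive the script and need separate cleanup.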