I get the whole "security is a good thing" thing, but could someone explain
why, when Whirr configures Hadoop, it sets up the SSH proxy to disallow all
communication with the data / task nodes except via the name node over the
proxy?  If I'm running on EC2, won't correctly setting up security groups
give me enough security?

The reason I ask is that I'm using Whirr through its API to
automate...well...all the cool things Whirr does.  But the key point is
automation.  After a Hadoop cluster is up and running, I'd like the program
to kick off a Hadoop job and monitor jobs and tasks.  But that means my
program has to launch hadoop-proxy.sh somehow, capture the PID of the
process, kick off my Hadoop job, and then, when done, kill the process via
the PID.  The whole business of calling a shell script, capturing the PID,
persisting it, and killing it, all through my Java automation, just seems a
bit duct-tape and baling-wire-ish.
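
To make the launch / run / kill sequence above concrete, here is a minimal
sketch of how it might look in Java 9+ without persisting a PID at all: the
program holds on to the Process handle and destroys it when the job
finishes.  The class name ProxyRunner, the method withProxy, and the idea of
passing the proxy command in as a list are all made up for illustration;
none of this is part of the Whirr API.  It also does not wait for the tunnel
to become ready before running the job, which a real program would need to
handle.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Whirr: runs a job while a proxy
// process (e.g. the generated hadoop-proxy.sh) is alive, then stops it.
final class ProxyRunner {

    static <T> T withProxy(List<String> proxyCommand, Supplier<T> job) {
        Process proxy;
        try {
            // inheritIO() lets the script's output show up on our console.
            proxy = new ProcessBuilder(proxyCommand).inheritIO().start();
        } catch (IOException e) {
            throw new UncheckedIOException("could not start proxy", e);
        }
        try {
            return job.get();
        } finally {
            stop(proxy); // runs even if the job throws
        }
    }

    static void stop(Process proxy) {
        // Snapshot child processes first (a shell script may have started
        // ssh as a child); once the parent exits they are harder to find.
        List<ProcessHandle> children =
                proxy.descendants().collect(Collectors.toList());
        proxy.destroy();
        children.forEach(ProcessHandle::destroy);
        try {
            // Give the process a few seconds, then force the issue.
            if (!proxy.waitFor(5, TimeUnit.SECONDS)) {
                proxy.destroyForcibly();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            proxy.destroyForcibly();
        }
    }
}
```

Usage would be something like
ProxyRunner.withProxy(List.of("/path/to/hadoop-proxy.sh"), () -> runMyJob()),
where both the script path and runMyJob() are placeholders.  It is still a
shell script under the hood, but the PID bookkeeping goes away.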


So I'm trying to figure out why we have the whole hadoop-proxy.sh thing in
the first place (specifically within the context of EC2).

-- 

Thanks,
John C
