On Wed, Jun 15, 2011 at 10:18 AM, John Conwell <j...@iamjohn.me> wrote:
> Ok, that makes sense. Thanks for the clarification. It is definitely
> unwieldy when trying to integrate Whirr's API into another API to wrap
> spinning up Hadoop clusters, and getting it to work without any manual
> steps.
Agreed, but it is possible - see the Hadoop integration tests, which are
an example of spinning up a Hadoop cluster from Java in a completely
automated fashion.

Tom

> On Tue, Jun 14, 2011 at 5:13 PM, Tom White <tom.e.wh...@gmail.com> wrote:
>>
>> The proxy is not used for security (which would be better provided by
>> a firewall), but to make the datanode addresses resolve correctly for
>> the client. Without the proxy the datanodes return their internal
>> addresses, which are not routable by the client (which typically runs
>> in an external network).
>>
>> I agree that it would be better if we could replace the proxy with
>> something better, such as
>> https://issues.apache.org/jira/browse/WHIRR-81.
>>
>> On Tue, Jun 14, 2011 at 9:26 AM, John Conwell <j...@iamjohn.me> wrote:
>> > I get the whole "security is a good thing" thing, but could someone
>> > give me a description as to why, when Whirr configures Hadoop, it
>> > sets up the SSH proxy to disallow all comms to the data/task nodes
>> > except via the name node over the proxy? If I'm running on EC2,
>> > won't correctly setting up security groups give me enough security?
>> > The reason I ask is that I'm using Whirr through its API to
>> > automate... well... all the cool things Whirr does. But the key
>> > point is automation. After a Hadoop cluster is up and running, I'd
>> > like the program to kick off a Hadoop job and monitor jobs and
>> > tasks. But that means my program has to launch hadoop-proxy.sh
>> > somehow, capture the PID of the process, kick off my Hadoop job,
>> > and then, when done, kill the process via the PID. The whole
>> > calling-a-shell-script, capturing-the-PID, persisting-it, and
>> > killing-it dance done through my Java automation just seems a bit
>> > duct-tape and baling-wire'ish.
>>
>> You can run the proxy from Java via HadoopProxy, which handles all
>> these details for you.
>>
>> > So I'm trying to figure out why we have the whole hadoop-proxy.sh
>> > thing in the first place (specifically within the context of EC2).
>> >
>> > --
>> > Thanks,
>> > John C
>>
>> Cheers,
>> Tom

--
Thanks,
John C
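[Editor's note: for readers following the API route discussed above, a minimal
sketch of the fully automated flow. This assumes the Whirr 0.x Java API, in
which ClusterSpec, ClusterController, and HadoopProxy exist with roughly these
signatures; check the version of Whirr you depend on, as the package layout
changed between early releases. The "hadoop.properties" recipe file name is an
assumption.]

```java
// Sketch only: assumes Apache Whirr 0.x on the classpath, where
// ClusterSpec, ClusterController, and HadoopProxy have roughly
// these signatures. Verify against your Whirr version.
import org.apache.commons.configuration.PropertiesConfiguration;
import org.apache.whirr.Cluster;
import org.apache.whirr.ClusterController;
import org.apache.whirr.ClusterSpec;
import org.apache.whirr.service.hadoop.HadoopProxy;

public class LaunchHadoop {
  public static void main(String[] args) throws Exception {
    // Load the same recipe file you would pass to the whirr CLI
    // (file name here is illustrative).
    ClusterSpec spec = new ClusterSpec(
        new PropertiesConfiguration("hadoop.properties"));

    ClusterController controller = new ClusterController();
    Cluster cluster = controller.launchCluster(spec);

    // HadoopProxy replaces launching hadoop-proxy.sh by hand: it runs
    // the SSH tunnel in-process, so there is no external PID to
    // capture, persist, or kill.
    HadoopProxy proxy = new HadoopProxy(spec, cluster);
    proxy.start();
    try {
      // ... build a job configuration from the cluster's settings and
      // submit/monitor Hadoop jobs here ...
    } finally {
      proxy.stop();                      // tears down the tunnel
      controller.destroyCluster(spec);   // shuts the cluster down
    }
  }
}
```

This mirrors what the Hadoop integration tests do: launch, proxy, work,
then clean up in a finally block so the cluster is destroyed even if the
job fails.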