Oh man. I didn't know there was a HadoopProxy class that actually had start and stop methods. I was starting it via Runtime.getRuntime().exec(). That's so much nicer.
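For anyone who finds this thread later and still has to manage a helper process by hand, here is a rough sketch of the start/stop pattern in plain Java. It is not Whirr's HadoopProxy and makes no claim about how that class is implemented. ManagedProcess is a hypothetical name, the five-second grace period is an arbitrary choice, and the code assumes Java 8 or later. It keeps the Process handle in memory, so there is no PID to capture or persist.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper (not part of Whirr): owns one external process and
 * exposes start/stop so the caller never has to track a PID.
 *
 * Typical use, with the script path as a placeholder:
 *
 *   try (ManagedProcess proxy = new ManagedProcess("sh", "hadoop-proxy.sh")) {
 *       proxy.start();
 *       // submit and monitor the Hadoop job here
 *   } // close() stops the process even if the job throws
 */
final class ManagedProcess implements AutoCloseable {
    private final List<String> command;
    private Process process;

    ManagedProcess(String... command) {
        this.command = Arrays.asList(command);
    }

    /** Launches the command; does nothing if it is already running. */
    synchronized void start() throws IOException {
        if (isRunning()) {
            return;
        }
        // inheritIO() sends the child's output to this JVM's stdout/stderr,
        // so the child cannot block on a full, unread pipe.
        process = new ProcessBuilder(command).inheritIO().start();
    }

    /** True while the child process is alive. */
    synchronized boolean isRunning() {
        return process != null && process.isAlive();
    }

    /** Asks the process to exit, then forces it after a grace period. */
    synchronized void stop() {
        if (process == null) {
            return;
        }
        process.destroy();
        try {
            // Assumed grace period of 5 seconds before a forced kill.
            if (!process.waitFor(5, TimeUnit.SECONDS)) {
                process.destroyForcibly();
                process.waitFor();
            }
        } catch (InterruptedException e) {
            // Restore the interrupt flag and make sure the child is gone.
            Thread.currentThread().interrupt();
            process.destroyForcibly();
        }
        process = null;
    }

    @Override
    public void close() {
        stop();
    }
}
```

Calling stop() twice is harmless, and try-with-resources covers the "kill it when done" step that is easy to miss on an error path.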
On Wed, Jun 15, 2011 at 10:41 AM, Andrei Savu <savu.and...@gmail.com> wrote:

> Also the current trunk has an examples maven submodule. That code is mostly
> extracted from tests.
>
> On Jun 15, 2011 8:32 PM, "John Conwell" <j...@iamjohn.me> wrote:
> > oh cool. Thanks for the pointer
> >
> > On Wed, Jun 15, 2011 at 10:28 AM, Tom White <tom.e.wh...@gmail.com> wrote:
> >
> >> On Wed, Jun 15, 2011 at 10:18 AM, John Conwell <j...@iamjohn.me> wrote:
> >> > Ok, that makes sense. Thanks for the clarification. It
> >> > is definitely unwieldy when trying to integrate whirr's API into
> >> > another API to wrap spinning up hadoop clusters, and getting it to
> >> > work without any manual steps.
> >>
> >> Agreed, but it is possible - see the Hadoop integration tests which
> >> are an example of spinning up a Hadoop cluster from Java in a
> >> completely automated fashion.
> >>
> >> Tom
> >>
> >> >
> >> > On Tue, Jun 14, 2011 at 5:13 PM, Tom White <tom.e.wh...@gmail.com> wrote:
> >> >>
> >> >> The proxy is not used for security (which would be better provided by
> >> >> a firewall), but to make the datanode addresses resolve correctly for
> >> >> the client. Without the proxy the datanodes return their internal
> >> >> addresses which are not routable by the client (which runs in an
> >> >> external network typically).
> >> >>
> >> >> I agree that it would be better if we could replace the proxy with
> >> >> something better, such as
> >> >> https://issues.apache.org/jira/browse/WHIRR-81.
> >> >>
> >> >> On Tue, Jun 14, 2011 at 9:26 AM, John Conwell <j...@iamjohn.me> wrote:
> >> >> > I get the whole "security is a good thing" thing, but could someone
> >> >> > give me a description as to why when whirr configures hadoop it sets
> >> >> > up the ssh proxy to disallow all comms to the data / task nodes
> >> >> > except via the name node over the proxy? If I'm running on EC2,
> >> >> > won't correctly setting up security groups give me enough security?
> >> >> >
> >> >> > The reason I ask is that I'm using Whirr through its API to
> >> >> > automate...well...all the cool things whirr does. But the key point
> >> >> > is automation. After a hadoop cluster is up and running I'd like the
> >> >> > program to kick off a hadoop job, monitor jobs and tasks. But that
> >> >> > means my program has to launch hadoop-proxy.sh somehow, capture the
> >> >> > PID of the process, kick off my hadoop job, then when done, kill the
> >> >> > process via the PID. The whole calling a shell script, capturing the
> >> >> > PID, persisting it, and killing it all through my java automation
> >> >> > just seems a bit duct-tape and baling-wire-ish.
> >> >>
> >> >> You can run the proxy from Java via HadoopProxy, which handles all
> >> >> these details for you.
> >> >>
> >> >> >
> >> >> > So I'm trying to figure out why we have the whole hadoop-proxy.sh
> >> >> > thing in the first place (specifically within the context of EC2)
> >> >> >
> >> >> > --
> >> >> >
> >> >> > Thanks,
> >> >> > John C
> >> >> >
> >> >>
> >> >> Cheers,
> >> >> Tom
> >> >
> >> > --
> >> >
> >> > Thanks,
> >> > John C
> >> >
> >
> > --
> >
> > Thanks,
> > John C

--
Thanks,
John C