Re: hadoop security and ssh proxy

John Conwell Wed, 15 Jun 2011 11:17:23 -0700

Oh man. I didn't know there was a HadoopProxy class that actually had start
and stop methods. I was starting the proxy script via
Runtime.getRuntime().exec(). That's so much nicer.
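
For anyone else who ends up doing the Runtime.exec() dance, here is a rough
sketch of the start/stop pattern, so nothing has to capture or persist a PID.
This is NOT Whirr's HadoopProxy. ManagedProcess is a hypothetical helper I made
up for illustration; it assumes Java 9+, and the usage line assumes a Unix-like
box where hadoop-proxy.sh is in the working directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Whirr: it owns the child process handle,
// so callers never need to look up, store, or kill a PID themselves.
final class ManagedProcess {
    private final List<String> command;
    private Process process;

    ManagedProcess(String... command) {
        this.command = List.of(command);
    }

    // Launches the child process; does nothing if it is already running.
    synchronized void start() {
        if (isRunning()) {
            return;
        }
        try {
            process = new ProcessBuilder(command)
                    .redirectErrorStream(true)
                    // Discard output so the child cannot block on a full pipe.
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                    .start();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    synchronized boolean isRunning() {
        return process != null && process.isAlive();
    }

    // Asks the child to exit, then forces it if it has not exited in 5 seconds.
    synchronized void stop() {
        if (process == null) {
            return;
        }
        process.destroy();
        try {
            if (!process.waitFor(5, TimeUnit.SECONDS)) {
                process.destroyForcibly();
                process.waitFor(5, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            // Preserve the interrupt for the caller.
            Thread.currentThread().interrupt();
        }
        process = null;
    }
}
```

Usage would be something like new ManagedProcess("sh", "hadoop-proxy.sh"),
then start() before submitting the job and stop() in a finally block
afterwards. If you are already on Whirr's API, the HadoopProxy class mentioned
in this thread is the better choice.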


On Wed, Jun 15, 2011 at 10:41 AM, Andrei Savu <savu.and...@gmail.com> wrote:

> Also, the current trunk has an examples Maven submodule. That code is mostly
> extracted from tests.
> On Jun 15, 2011 8:32 PM, "John Conwell" <j...@iamjohn.me> wrote:
> > oh cool. Thanks for the pointer
> >
> > On Wed, Jun 15, 2011 at 10:28 AM, Tom White <tom.e.wh...@gmail.com> wrote:
> >
> >> On Wed, Jun 15, 2011 at 10:18 AM, John Conwell <j...@iamjohn.me> wrote:
> >> > Ok, that makes sense. Thanks for the clarification. It is definitely
> >> > unwieldy when trying to integrate whirr's API into another API to wrap
> >> > spinning up hadoop clusters, and getting it to work without any manual
> >> > steps.
> >>
> >> Agreed, but it is possible - see the Hadoop integration tests which
> >> are an example of spinning up a Hadoop cluster from Java in a
> >> completely automated fashion.
> >>
> >> Tom
> >>
> >> >
> >> >
> >> > On Tue, Jun 14, 2011 at 5:13 PM, Tom White <tom.e.wh...@gmail.com> wrote:
> >> >>
> >> >> The proxy is not used for security (which would be better provided by
> >> >> a firewall), but to make the datanode addresses resolve correctly for
> >> >> the client. Without the proxy the datanodes return their internal
> >> >> addresses which are not routable by the client (which runs in an
> >> >> external network typically).
> >> >>
> >> >> I agree that it would be better if we could replace the proxy with
> >> >> something better, such as
> >> >> https://issues.apache.org/jira/browse/WHIRR-81.
> >> >>
> >> >> On Tue, Jun 14, 2011 at 9:26 AM, John Conwell <j...@iamjohn.me> wrote:
> >> >> > I get the whole "security is a good thing" thing, but could someone
> >> >> > give me a description of why, when whirr configures hadoop, it sets
> >> >> > up the ssh proxy to disallow all comms to the data / task nodes
> >> >> > except via the name node over the proxy? If I'm running on EC2,
> >> >> > won't correctly setting up security groups give me enough security?
> >> >> > The reason I ask is that I'm using Whirr through its API to
> >> >> > automate...well...all the cool things whirr does. But the key point
> >> >> > is automation. After a hadoop cluster is up and running I'd like the
> >> >> > program to kick off a hadoop job and monitor jobs and tasks. But that
> >> >> > means my program has to launch hadoop-proxy.sh somehow, capture the
> >> >> > PID of the process, kick off my hadoop job, then when done, kill the
> >> >> > process via the PID. The whole business of calling a shell script,
> >> >> > capturing the PID, persisting it, and killing it, all through my java
> >> >> > automation, just seems a bit duct-tape and baling-wire'ish.
> >> >>
> >> >> You can run the proxy from Java via HadoopProxy, which handles all
> >> >> these details for you.
> >> >>
> >> >> >
> >> >> > So I'm trying to figure out why we have the whole hadoop-proxy.sh
> >> >> > thing in the first place (specifically within the context of EC2).
> >> >> >
> >> >> > --
> >> >> >
> >> >> > Thanks,
> >> >> > John C
> >> >> >
> >> >>
> >> >> Cheers,
> >> >> Tom
> >> >
> >> >
> >> >
> >> > --
> >> >
> >> > Thanks,
> >> > John C
> >> >
> >>
> >
> >
> >
> > --
> >
> > Thanks,
> > John C
>



-- 

Thanks,
John C
