On Wed, Jun 15, 2011 at 10:18 AM, John Conwell <j...@iamjohn.me> wrote:
> Ok, that makes sense.  Thanks for the clarification.  It is definitely
> unwieldy trying to integrate whirr's API into another API that wraps
> spinning up hadoop clusters, and getting it to work without any
> manual steps.

Agreed, but it is possible - see the Hadoop integration tests, which
are an example of spinning up a Hadoop cluster from Java in a
completely automated fashion.
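
A minimal sketch of that approach (assuming the ClusterController and
ClusterSpec classes from the Whirr 0.5-era API; names and the properties
file are illustrative and may differ across releases):

```java
import org.apache.commons.configuration.PropertiesConfiguration;
import org.apache.whirr.Cluster;
import org.apache.whirr.ClusterController;
import org.apache.whirr.ClusterSpec;

public class LaunchHadoop {
  public static void main(String[] args) throws Exception {
    // whirr-hadoop.properties holds the provider, credentials,
    // and instance templates (hypothetical file name)
    ClusterSpec spec = new ClusterSpec(
        new PropertiesConfiguration("whirr-hadoop.properties"));
    ClusterController controller = new ClusterController();
    Cluster cluster = controller.launchCluster(spec);
    try {
      // run Hadoop jobs against the cluster here
    } finally {
      controller.destroyCluster(spec);
    }
  }
}
```

Launch and teardown are then plain Java calls, with no manual steps in
between.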

Tom

>
>
> On Tue, Jun 14, 2011 at 5:13 PM, Tom White <tom.e.wh...@gmail.com> wrote:
>>
>> The proxy is not used for security (which would be better provided by
>> a firewall), but to make the datanode addresses resolve correctly for
>> the client. Without the proxy the datanodes return their internal
>> addresses which are not routable by the client (which runs in an
>> external network typically).
>>
>> I agree that it would be better if we could replace the proxy with
>> something better, such as
>> https://issues.apache.org/jira/browse/WHIRR-81.
>>
>> On Tue, Jun 14, 2011 at 9:26 AM, John Conwell <j...@iamjohn.me> wrote:
>> > I get the whole "security is a good thing" thing, but could someone give
>> > me a description as to why, when whirr configures hadoop, it sets up the
>> > ssh proxy to disallow all communication to the data/task nodes except
>> > via the name node over the proxy?  If I'm running on EC2, won't
>> > correctly setting up security groups give me enough security?
>> > The reason I ask is that I'm using Whirr through its API to
>> > automate...well...all the cool things whirr does.  But the key point is
>> > automation.  After a hadoop cluster is up and running I'd like the
>> > program to kick off a hadoop job and monitor its jobs and tasks.  But
>> > that means my program has to launch hadoop-proxy.sh somehow, capture
>> > the PID of the process, kick off my hadoop job, and then, when done,
>> > kill the process via the PID.  The whole business of calling a shell
>> > script, capturing the PID, persisting it, and killing it all through
>> > my java automation just seems a bit duct-tape-and-baling-wire-ish.
>>
>> You can run the proxy from Java via HadoopProxy, which handles all
>> these details for you.
>>
>> >
>> > So I'm trying to figure out why we have the whole hadoop-proxy.sh thing
>> > in the first place (specifically within the context of EC2).
>> >
>> > --
>> >
>> > Thanks,
>> > John C
>> >
>>
>> Cheers,
>> Tom
>
>
>
> --
>
> Thanks,
> John C
>
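
On the HadoopProxy point above: a minimal sketch of driving the proxy
in-process (assuming the org.apache.whirr.service.hadoop.HadoopProxy
class as in Whirr 0.x; the spec and cluster objects come from a prior
launchCluster() call):

```java
import org.apache.whirr.Cluster;
import org.apache.whirr.ClusterSpec;
import org.apache.whirr.service.hadoop.HadoopProxy;

class ProxyExample {
  // spec and cluster come from a prior ClusterController.launchCluster()
  void runJobsThroughProxy(ClusterSpec spec, Cluster cluster)
      throws Exception {
    HadoopProxy proxy = new HadoopProxy(spec, cluster);
    proxy.start();   // launches the SSH proxy; no hadoop-proxy.sh needed
    try {
      // submit and monitor Hadoop jobs here, with client traffic routed
      // through the proxy so datanode addresses resolve correctly
    } finally {
      proxy.stop();  // replaces capturing and killing the script's PID
    }
  }
}
```

This keeps the proxy's lifecycle inside the JVM, so there is no shell
script, PID file, or external process bookkeeping in the automation.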
