On Wed, Jul 8, 2015 at 7:31 PM, Ashish Dutt <ashish.du...@gmail.com> wrote:

> Hi,
> We have a cluster with 4 nodes running CDH 5.4. For the past two
> days I have been trying to connect my laptop to the server using spark
> <master ip:port>, but it has been unsuccessful.
> The server contains data that needs to be cleaned and analysed.
> The cluster and the nodes are in a Linux environment.
> To connect to the nodes I am using SSH.
> Question: Would it be better if I work directly on the nodes rather than
> trying to connect my laptop to them?

-> You will be able to connect to the master machine in the cloud from your
laptop, but you need to make sure that the master is able to connect back to
your laptop (this may require port forwarding on your router, firewall
rules, etc.).

> Question 2: If yes, then can you suggest any Python and R IDE that I can
> install on the nodes to make it work?

-> Once the master machine is able to connect to your laptop's public IP,
then you can set the spark.driver.host and spark.driver.port properties and
your job will get executed on the cluster.
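
As a rough illustration, setting those two properties from PySpark might look
like the sketch below. The master URL, the IP address and the port are
placeholders I made up for the example, not values from your cluster, and the
helper names driver_conf and make_context are hypothetical. The chosen driver
port also has to be the one you forward on your router.

```python
def driver_conf(master_url, driver_host, driver_port):
    """Collect the Spark properties a driver running on the laptop would set.

    spark.driver.host is the address the cluster uses to reach the driver,
    spark.driver.port is the port the driver listens on.
    """
    return {
        "spark.master": master_url,
        "spark.driver.host": driver_host,
        "spark.driver.port": str(driver_port),
    }


def make_context(props, app_name="remote-driver-example"):
    """Create a SparkContext from the properties (requires pyspark)."""
    # Imported here so driver_conf() can be used without pyspark installed.
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAppName(app_name).setAll(list(props.items()))
    return SparkContext(conf=conf)


# Placeholder values, assumed for illustration only:
#   master URL -> replace with your own spark://<master ip:port>
#   203.0.113.10 -> replace with your laptop's public IP
#   51000 -> any free port that is forwarded to your laptop
props = driver_conf("spark://master.example.com:7077", "203.0.113.10", 51000)

# On the laptop, once the master can reach it:
#   sc = make_context(props)
#   print(sc.parallelize(range(10)).sum())
#   sc.stop()
```

Fixing the driver port, instead of letting Spark pick a random one, is what
makes the port forwarding rule possible in the first place.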

> Thanks for your help
> Sincerely,
> Ashish Dutt
