I think the driver binds to a random port by default, but this can be changed using the "spark.driver.port" system property. So you should be able to set that property and open only that port. See: http://spark.incubator.apache.org/docs/latest/configuration.html.
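For example, something like this should work (a sketch against the 0.8-era configuration style, where Spark settings are plain Java system properties; the port number and security-group names below are illustrative, not prescriptive):

```shell
# Pin the driver to a fixed port instead of a random one, so only that
# single port needs to be open for master -> driver traffic.
# Spark shells of this era pick up settings from Java system properties:
SPARK_JAVA_OPTS="-Dspark.driver.port=51000" ./spark-shell

# Then allow just that port from the cluster's security group to the
# web server's security group, e.g. with the EC2 API tools
# (both group names here are made up for the example):
ec2-authorize my-webserver-group -P tcp -p 51000 -o my-spark-cluster-group
```

I haven't verified this end-to-end, so treat the exact flags as a starting point rather than a recipe.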
On Tue, Nov 19, 2013 at 10:16 AM, Matt Cheah <[email protected]> wrote:

> I determined that it is a firewall issue. Allowing "all traffic" to the
> cluster where the shell was running hack-fixed the problem.
>
> That being said, what ports do I have to open to allow the Spark master
> to communicate back to the driver? I've heard this is required. And
> obviously allowing all traffic is bad…
>
> -Matt Cheah
>
> From: Aaron Davidson <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Monday, November 18, 2013 8:28 PM
> To: "[email protected]" <[email protected]>
> Subject: Re: EC2 node submit jobs to separate Spark Cluster
>
> The main issue with running a spark-shell locally is that it orchestrates
> the actual computation, so you want it to be "close" to the actual Worker
> nodes for latency reasons. Running a spark-shell on EC2 in the same region
> as the Spark cluster avoids this problem.
>
> The error you're seeing seems to indicate a different issue. Check the
> Master web UI (accessible on port 8080 at the master's IP address) to make
> sure that Workers are successfully registered and have the expected amount
> of memory available to Spark. You can also check how much memory your
> spark-shell is trying to get per executor. Two common problems are (1) an
> abandoned spark-shell is holding onto all of your cluster's resources, or
> (2) you've manually configured your spark-shell to try to get more memory
> than your Workers have available. Both of these should be visible in the
> web UI.
>
> On Mon, Nov 18, 2013 at 5:00 PM, Matt Cheah <[email protected]> wrote:
>
>> Hi,
>>
>> I'm working with an infrastructure that already has its own web server
>> set up on EC2. I would like to set up a *separate* Spark cluster on EC2
>> with the scripts and have the web server submit jobs to this Spark
>> cluster.
>>
>> Is it possible to do this? I'm getting some errors running the spark
>> shell on the web server: "Initial job has not accepted any resources;
>> check your cluster UI to ensure that workers are registered and have
>> sufficient memory". I have heard that it's not possible for any local
>> computer to connect to the spark cluster, but I was wondering if other
>> EC2 nodes could have their firewalls configured to allow this.
>>
>> We don't want to deploy the web server on the master node of the Spark
>> cluster.
>>
>> Thanks,
>>
>> -Matt Cheah
