Can you share your Dockerfile (not all of it, just the gist), the instructions for how you build it, and what you actually run to get that message?
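If it helps while you put that together, here is roughly the gist of a minimal standalone image as I would sketch it. This is only a sketch: the base image, Spark 3.1.2/Hadoop 3.2 version, download URL and /opt/spark layout are my assumptions, not your setup, so swap in whatever you actually use.

    FROM openjdk:8-jre-slim

    # Assumption: Spark 3.1.2 built for Hadoop 3.2; change to your version
    ARG SPARK_VERSION=3.1.2
    ARG HADOOP_VERSION=3.2

    # Download and unpack Spark under /opt, with a stable /opt/spark symlink
    RUN apt-get update && apt-get install -y --no-install-recommends curl procps && \
        curl -fsSL "https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz" \
          | tar -xz -C /opt && \
        ln -s "/opt/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}" /opt/spark && \
        rm -rf /var/lib/apt/lists/*

    ENV SPARK_HOME=/opt/spark
    ENV PATH="${SPARK_HOME}/sbin:${SPARK_HOME}/bin:${PATH}"
    # Keep the master/worker process in the foreground so the container stays up
    ENV SPARK_NO_DAEMONIZE=true

    # Default to running a master; worker containers override this command
    CMD ["/opt/spark/sbin/start-master.sh"]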
I have just pushed my local repo to GitHub, where I created an example of Spark on Docker some time ago. Please take a look and compare it with what you are doing:
https://github.com/khalidmammadov/spark_docker

(A concrete sketch of the worker registration step discussed in the thread below is bottom-posted at the end of this message.)

On Sat, Jul 24, 2021 at 4:07 PM Dinakar Chennubotla <chennu.bigd...@gmail.com> wrote:

> Hi Khalid Mammadov,
>
> I tried the guide that describes a distributed-mode Spark installation, but when I run the command it says "deployment mode = cluster is not allowed in standalone cluster".
>
> The source URL I used is:
> https://towardsdatascience.com/diy-apache-spark-docker-bb4f11c10d24?gi=fa52ac767c0b
>
> Kindly refer to this section in the URL I mentioned:
> "Docker & Spark — Multiple Machines"
>
> I removed the third-party things and dockerized it my way.
>
> Thanks,
> Dinakar
>
> On Sat, 24 Jul, 2021, 20:28 Khalid Mammadov, <khalidmammad...@gmail.com> wrote:
>
>> Standalone mode already implies you are running in cluster (distributed) mode, i.e. it is one of the 4 available cluster manager options. The difference is that Standalone uses its own resource manager rather than, for example, YARN.
>>
>> If you are running Docker on a single machine then you are limited to that, but if you run Docker on a cluster and deploy your Spark containers on it, then you get your distribution and cluster mode.
>>
>> Also, if you are referring to scalability, then you need to register worker nodes when you need to scale. You do it by registering a VM/container as a worker node, as per the docs, using:
>>
>> ./sbin/start-worker.sh <master-spark-URL>
>>
>> You can create a new Docker container from your base image and run the above command on bootstrap; that would register a worker node and scale your cluster whenever you want.
>>
>> And if you kill them, you scale down (I think this is how Databricks autoscaling works..). I am not sure about k8s TBH; perhaps it handles this more gracefully.
>>
>> On Sat, Jul 24, 2021 at 3:38 PM Dinakar Chennubotla <chennu.bigd...@gmail.com> wrote:
>>
>>> Hi Khalid Mammadov,
>>>
>>> Thank you for your response.
>>> Yes, I did: I built a standalone Apache Spark cluster on Docker containers.
>>>
>>> But I am looking for a distributed Spark cluster, where the Spark workers are scalable and the Spark deployment mode is "cluster".
>>>
>>> The source URL I used to build the standalone Apache Spark cluster:
>>> https://www.kdnuggets.com/2020/07/apache-spark-cluster-docker.html
>>>
>>> If you have documentation on distributed Spark, which is what I am looking for, could you please send it to me?
>>>
>>> Thanks,
>>> Dinakar
>>>
>>> On Sat, 24 Jul, 2021, 19:32 Khalid Mammadov, <khalidmammad...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Have you checked out the docs?
>>>> https://spark.apache.org/docs/latest/spark-standalone.html
>>>>
>>>> Thanks,
>>>> Khalid
>>>>
>>>> On Sat, Jul 24, 2021 at 1:45 PM Dinakar Chennubotla <chennu.bigd...@gmail.com> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> I am Dinakar, a Hadoop admin. Could someone help me here?
>>>>>
>>>>> 1. I have a DEV-POC task to do:
>>>>> 2. I need to install a distributed Apache Spark cluster, in cluster deploy mode, on Docker containers,
>>>>> 3. with scalable Spark worker containers.
>>>>> 4. We have a 9-node cluster with some other services and tools.
>>>>>
>>>>> Thanks,
>>>>> Dinakar
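Bottom-posting the promised sketch of the scale-out/scale-in flow from the quoted thread. Again, only a minimal illustration: the spark-net network, the spark-master name/hostname and the my-spark tag (an image like the one sketched near the top of this message) are assumptions, not Dinakar's setup.

    # One shared network so workers can reach the master by name
    docker network create spark-net

    # Master: web UI on 8080, cluster port 7077 (Spark standalone defaults).
    # --hostname matches the URL the workers will use, so the master binds to it.
    docker run -d --name spark-master --hostname spark-master \
      --network spark-net -p 8080:8080 -p 7077:7077 my-spark

    # Scale out: each new container registers itself as a worker on start-up
    docker run -d --network spark-net \
      my-spark /opt/spark/sbin/start-worker.sh spark://spark-master:7077

    # Scale in: stopping a worker container removes it from the cluster
    docker stop <worker-container-id>

The master's web UI at http://localhost:8080 should show each worker as it registers, which is a quick way to confirm the scaling actually happened.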