Re: Apache Spark Docker image repository

2021-03-03 Thread Ismaël Mejía
Since Spark 3.1.1 is out now, I was wondering if it would make sense to try to get some consensus about starting to release Docker images as part of Spark 3.2. Having ready-to-use images would definitely benefit adoption, in particular now that containerized runs via k8s have become GA. WDYT?

Re: Apache Spark Docker image repository

2020-02-18 Thread Ismaël Mejía
+1 to have Spark Docker images for Dongjoon's arguments; having a container-based distribution is definitely something that benefits users and the project too. Having this in the Apache Spark repo matters because of multiple eyes to fix/improve the images for the benefit of everyone. What

Re: Apache Spark Docker image repository

2020-02-11 Thread Dongjoon Hyun
Hi, Sean. Yes. We should keep this minimal. BTW, for the following question: > But how much value does that add? How much value do you think we get from our binary distribution at the following link? - https://www.apache.org/dist/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz Docker

Re: Apache Spark Docker image repository

2020-02-11 Thread Sean Owen
To be clear this is a convenience 'binary' for end users, not just an internal packaging to aid the testing framework? There's nothing wrong with providing an additional official packaging if we vote on it and it follows all the rules. There is an open question about how much value it adds vs

Re: Apache Spark Docker image repository

2020-02-11 Thread Erik Erlandson
My takeaway from the last time we discussed this was: 1) To be ASF compliant, we needed to only publish images at official releases 2) There was some ambiguity about whether or not a container image that included GPL'ed packages (Spark images do) might trip over the GPL "viral propagation" due to

Re: Apache Spark Docker image repository

2020-02-10 Thread Dongjoon Hyun
Thank you, Hyukjin. The maintenance overhead only occurs when we add a new release. And we can prevent accidental upstream changes by avoiding 'latest' tags. The overhead will be much smaller than our existing Dockerfile maintenance (e.g. 'spark-rm'). Also, if we have a Docker repository, we

Re: Apache Spark Docker image repository

2020-02-10 Thread Hyukjin Kwon
Quick question: roughly how much overhead is required to maintain a minimal version? If that doesn't look like too much, I think it's fine to give it a shot. On Sat, Feb 8, 2020 at 6:51 AM, Dongjoon Hyun wrote: > Thank you, Sean, Jiaxin, Shane, and Tom, for the feedback. > > 1. For legal questions, please see the

Re: Apache Spark Docker image repository

2020-02-07 Thread Dongjoon Hyun
Thank you, Sean, Jiaxin, Shane, and Tom, for the feedback. 1. For legal questions, please see the following three Apache-approved approaches. We can follow one of them. 1. https://hub.docker.com/u/apache (93 repositories, Airflow/NiFi/Beam/Druid/Zeppelin/Hadoop/...) 2.

Re: Apache Spark Docker image repository

2020-02-06 Thread Tom Graves
When discussions of Docker have occurred in the past, mostly related to k8s, there has been a lot of discussion about what the right image to publish is, as well as making sure Apache is OK with it. The official Apache release is the source code, so we may need to make sure to have a disclaimer, and we

Re: Apache Spark Docker image repository

2020-02-06 Thread Maciej Szymkiewicz
On 2/6/20 2:53 AM, Jiaxin Shan wrote: > I will vote for this. It's pretty helpful to have managed Spark > images. Currently, users have to download Spark binaries and build > their own. > With this supported, the user journey will be simplified and we only need > to build an application image on top

Re: Apache Spark Docker image repository

2020-02-05 Thread shane knapp ☠
> > (This can be used in GitHub Action Jobs and Jenkins K8s > Integration Tests to speed up jobs and to have more stable environments) > yep! not only that, if we ever get around (hopefully this year) to containerizing (the majority of) the master and branch builds, i think it'd be nice to

Re: Apache Spark Docker image repository

2020-02-05 Thread Jiaxin Shan
I will vote for this. It's pretty helpful to have managed Spark images. Currently, users have to download Spark binaries and build their own. With this supported, the user journey will be simplified and we only need to build an application image on top of a base image provided by the community. Do we have
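[Editor's note] The workflow described above could look like the following sketch, assuming a hypothetical community-published base image. No official image existed at the time of this thread; the `apache/spark` name, tag, and jar path below are illustrative assumptions only.

```dockerfile
# Hypothetical application image layered on a community-provided base image.
# Image name, tag, and paths are assumptions, not real published artifacts.
FROM apache/spark:3.0.0

# The user only adds their own application on top of the ready-made Spark base
COPY target/my-app.jar /opt/spark/jars/my-app.jar
```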

Re: Apache Spark Docker image repository

2020-02-05 Thread Sean Owen
What would the images have - just the image for a worker? We wouldn't want to publish N permutations of Python, R, OS, Java, etc. But if we don't then we make one or a few choices of that combo, and then I wonder how many people find the image useful. If the goal is just to support Spark testing,
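[Editor's note] One way to limit the permutation explosion Sean raises is a single parameterized Dockerfile: the project publishes a few default combinations and users build the rest locally. A minimal sketch, in which every base image, package name, and argument is an illustrative assumption:

```dockerfile
# Illustrative only: one Dockerfile parameterized by build args, so a small
# default matrix is published and other combos are left to users to build.
ARG BASE_OS=debian:buster-slim

FROM ${BASE_OS}
# ARGs declared before FROM must be redeclared to be visible in this stage
ARG JAVA_VERSION=11
ARG WITH_PYTHON=false

RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        openjdk-${JAVA_VERSION}-jre-headless && \
    if [ "${WITH_PYTHON}" = "true" ]; then \
        apt-get install -y --no-install-recommends python3; \
    fi && \
    rm -rf /var/lib/apt/lists/*
```

A user wanting a non-default combination would build it themselves, e.g. `docker build --build-arg JAVA_VERSION=8 --build-arg WITH_PYTHON=true .`, rather than the project publishing every permutation.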

Apache Spark Docker image repository

2020-02-05 Thread Dongjoon Hyun
Hi, All. From 2020, shall we have an official Docker image repository as an additional distribution channel? I'm considering the following images. - Public binary release (no snapshot image) - Public non-Spark base image (OS + R + Python) (This can be used in GitHub Action Jobs
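[Editor's note] The two image types proposed in this mail suggest a two-layer build, sketched below under stated assumptions: the base-image contents follow the "(OS + R + Python)" description above, the Spark tarball URL is the one cited elsewhere in the thread, and the OS choice and package names are hypothetical.

```dockerfile
# Sketch of the proposed layering; OS, packages, and layout are assumptions.

# Layer 1: public non-Spark base image (OS + R + Python)
FROM ubuntu:18.04 AS base
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        openjdk-8-jre-headless python3 r-base && \
    rm -rf /var/lib/apt/lists/*

# Layer 2: public binary release image on top of the base (no snapshots)
FROM base
# ADD with a remote URL downloads but does not auto-extract, hence the tar step
ADD https://www.apache.org/dist/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz /opt/
RUN cd /opt && \
    tar xzf spark-2.4.5-bin-hadoop2.7.tgz && \
    ln -s /opt/spark-2.4.5-bin-hadoop2.7 /opt/spark && \
    rm spark-2.4.5-bin-hadoop2.7.tgz
ENV SPARK_HOME=/opt/spark
```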