Now that Spark 3.1.1 is out, I was wondering whether it would make sense to
try to reach consensus on releasing Docker images as
part of Spark 3.2.
Having ready-to-use images would definitely help adoption, particularly
now that support for containerized runs via k8s has become GA.
WDYT?
+1 to having Spark Docker images, for Dongjoon's reasons. A container-based
distribution definitely benefits users and the project too. Having this in
the Apache Spark repo matters because multiple eyes can then fix and improve
the images for the benefit of everyone.
What
Hi, Sean.
Yes. We should keep this minimal.
BTW, for the following questions,
> But how much value does that add?
How much value do you think our binary distribution at the
following link has?
- https://www.apache.org/dist/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz
Docker
To be clear this is a convenience 'binary' for end users, not just an
internal packaging to aid the testing framework?
There's nothing wrong with providing an additional official packaging
if we vote on it and it follows all the rules. There is an open
question about how much value it adds vs
My takeaway from the last time we discussed this was:
1) To be ASF compliant, we needed to only publish images at official
releases
2) There was some ambiguity about whether or not a container image that
included GPL'ed packages (spark images do) might trip over the GPL "viral
propagation" due to
Thank you, Hyukjin.
The maintenance overhead only occurs when we add a new release.
And, we can prevent accidental upstream changes by avoiding 'latest' tags.
The overhead will be much smaller than our existing Dockerfile maintenance
(e.g. 'spark-rm').
Also, if we have a docker repository, we
Quick question: roughly how much overhead is required to maintain a
minimal version?
If it doesn't look like too much, I think it's fine to give it a shot.
On Sat, Feb 8, 2020 at 6:51 AM, Dongjoon Hyun wrote:
Thank you, Sean, Jiaxin, Shane, and Tom, for the feedback.
1. For legal questions, please see the following three Apache-approved
approaches. We can follow one of them.
1. https://hub.docker.com/u/apache (93 repositories,
Airflow/NiFi/Beam/Druid/Zeppelin/Hadoop/...)
2.
When Docker has been discussed in the past (mostly in relation to k8s),
there has been a lot of debate about what the right image to publish is, as well
as about making sure Apache is OK with it. The official Apache release is the source code,
so we may need to make sure to include a disclaimer and we
On 2/6/20 2:53 AM, Jiaxin Shan wrote:
> I will vote for this. It's pretty helpful to have managed Spark
> images. Currently, user have to download Spark binaries and build
> their own.
> With this supported, user journey will be simplified and we only need
> to build an application image on top
>
> (This can be used in GitHub Action Jobs and Jenkins K8s
> Integration Tests to speed up jobs and to have more stabler environments)
>
Yep!
Not only that: if we ever get around (hopefully this year) to
containerizing (the majority of) the master and branch builds, I think it'd be
nice to
I will vote for this. It's pretty helpful to have managed Spark images.
Currently, users have to download Spark binaries and build their own.
With this supported, the user journey will be simpler, and users will only need to
build an application image on top of the base image provided by the community.
Do we have
What would the images contain? Just the image for a worker?
We wouldn't want to publish N permutations of Python, R, OS, Java, etc.
But if we don't then we make one or a few choices of that combo, and
then I wonder how many people find the image useful.
If the goal is just to support Spark testing,
Hi, All.
From 2020, shall we have an official Docker image repository as an
additional distribution channel?
I'm considering the following images.
- Public binary release (no snapshot image)
- Public non-Spark base image (OS + R + Python)
(This can be used in GitHub Action Jobs