Hi Tim,
Basically, if a user still wants to use Spark 1.x, they would just be "stuck"
with Beam 2.2.0.
I would like to see Beam 2.3.0 released at the end of December or beginning of
January with Spark 2.x support (either exclusively or alongside 1.x).
The goal of the discussion is just to know whether it's worth maintaining both
Spark 1.x and 2.x, or whether I can do cleanup to support only 2.x ;)
Regards
JB
On 11/13/2017 10:56 AM, Tim Robertson wrote:
Thanks JB
On "thoughts":
- Cloudera 5.13 will still default to 1.6 even though a 2.2 parcel is available
(HWX provides both)
- Cloudera support for Spark 2 has a list of exceptions
(https://www.cloudera.com/documentation/spark2/latest/topics/spark2_known_issues.html)
- I am not sure if the HBaseIO would be affected
- I am not sure if structured streaming would have implications
- it might stop customers from being able to run Spark 2 at all due to
support agreements
- Spark 2.3 (EOY) will drop Scala 2.10 support
- IBM's now defunct distro only has 1.6
- Oozie doesn't have a Spark 2 action (need to use a shell action)
- There are a lot of folks with code running on 1.3, 1.4 and 1.5 still
- Spark 2.2+ requires Java 8 too, while <2.2 was J7 like Beam (not sure if this
has other implications for the cross deployment nature of Beam)
My first impressions of Beam were really favourable as it all just worked first
time on a CDH Spark 1.6 cluster. For us, it is a lack of resources to refactor
legacy code that delays the 2.2 push.
With that said, I think it is very reasonable to have a clear cut-off in Beam,
especially if it limits progress / causes headaches in packaging, robustness
etc. I'd recommend putting it in a 6-month timeframe, which might align with 2.3?
Hope this helps,
Tim
On Mon, Nov 13, 2017 at 10:07 AM, Neville Dipale <[email protected]> wrote:
Hi JB,
[X ] +1 to drop Spark 1.x support and upgrade to Spark 2.x only
[ ] 0 (I don't care ;))
[ ] -1, I would like to still support Spark 1.x, and so have support for
both Spark 1.x and 2.x (please provide specific comment)
On 13 November 2017 at 10:32, Jean-Baptiste Onofré <[email protected]> wrote:
Hi Beamers,
I'm forwarding this discussion & vote from the dev mailing list to the
user mailing list.
The goal is to get your feedback as users.
Basically, we have two options:
1. Right now, in the PR, we support both Spark 1.x and 2.x using three
artifacts (common, spark1, spark2). You, as users, pick spark1 or
spark2 in your dependency set depending on the Spark version you want
to target.
2. The other option is to upgrade and focus on Spark 2.x in Beam 2.3.0.
If you still want to use Spark 1.x, then you will be stuck on Beam
2.2.0 or earlier.
Thoughts ?
Thanks !
Regards
JB
-------- Forwarded Message --------
Subject: [VOTE] Drop Spark 1.x support to focus on Spark 2.x
Date: Wed, 8 Nov 2017 08:27:58 +0100
From: Jean-Baptiste Onofré <[email protected]>
Reply-To: [email protected]
To: [email protected]
Hi all,
as you might know, we are working on Spark 2.x support in the Spark
runner.
I'm working on a PR about that:
https://github.com/apache/beam/pull/3808
Today, we have something working with both Spark 1.x and 2.x from a code
standpoint, but I have to deal with dependencies. It's the first step of
the update, as I'm still using RDDs; the second step would be to support
DataFrames (but for that, I would need PCollection elements with schemas,
which is another topic that Eugene, Reuven and I are discussing).
However, as all major distributions now ship Spark 2.x, I don't think
it's required anymore to support Spark 1.x.
If we agree, I will update and clean up the PR to only support and focus
on Spark 2.x.
So, that's why I'm calling for a vote:
[ ] +1 to drop Spark 1.x support and upgrade to Spark 2.x only
[ ] 0 (I don't care ;))
[ ] -1, I would like to still support Spark 1.x, and so have
support for both Spark 1.x and 2.x (please provide specific comment)
This vote is open for 48 hours (I have the commits ready, just waiting
for the end of the vote to push to the PR).
Thanks !
Regards
JB
--
Jean-Baptiste Onofré
[email protected]
http://blog.nanthrax.net
Talend - http://www.talend.com