Hi Tim,

Basically, if a user still wants to use Spark 1.x, they would just be "stuck" at Beam 2.2.0.

I would like to see a Beam 2.3.0 at the end of December/beginning of January with Spark 2.x support (either exclusively or alongside 1.x).

The goal of the discussion is just to know whether it's worth maintaining both Spark 1.x and 2.x, or if I can do a cleanup to support only 2.x ;)

Regards
JB

On 11/13/2017 10:56 AM, Tim Robertson wrote:
Thanks JB

On "thoughts":

- Cloudera 5.13 will still default to Spark 1.6 even though a 2.2 parcel is available (HWX provides both), and Cloudera's Spark 2 support comes with a list of known issues (https://www.cloudera.com/documentation/spark2/latest/topics/spark2_known_issues.html)
   - I am not sure if the HBaseIO would be affected
   - I am not sure if structured streaming would have implications
   - it might stop customers from being able to run Spark 2 at all due to support agreements
- Spark 2.3 (EOY) will drop Scala 2.10 support
- IBM's now defunct distro only has 1.6
- Oozie doesn't have a Spark 2 action (you need to use a shell action)
- There are a lot of folks with code still running on 1.3, 1.4 and 1.5
- Spark 2.2+ also requires Java 8, while <2.2 ran on Java 7 like Beam (not sure if this has other implications for the cross-deployment nature of Beam)

My first impression of Beam was really favourable, as it all just worked first time on a CDH Spark 1.6 cluster.  For us, it is a lack of resources to refactor legacy code that delays the push to 2.2.

With that said, I think it is very reasonable to have a clear cut-off in Beam, especially if supporting 1.x limits progress or causes headaches in packaging, robustness, etc.  I'd recommend putting it on a 6-month timeframe, which might align with 2.3?

Hope this helps,
Tim

On Mon, Nov 13, 2017 at 10:07 AM, Neville Dipale <[email protected]> wrote:

    Hi JB,


       [X ] +1 to drop Spark 1.x support and upgrade to Spark 2.x only
       [ ] 0 (I don't care ;))
   [ ] -1, I would like to still support Spark 1.x, and so keep support of
    both Spark 1.x and 2.x (please provide a specific comment)

    On 13 November 2017 at 10:32, Jean-Baptiste Onofré <[email protected]> wrote:

        Hi Beamers,

        I'm forwarding this discussion & vote from the dev mailing list to the
        user mailing list.
        The goal is to have your feedback as users.

        Basically, we have two options:
        1. Right now, in the PR, we support both Spark 1.x and 2.x using three
        artifacts (common, spark1, spark2). You, as users, pick up spark1 or
        spark2 in your dependency set depending on the Spark version you target
        (see the sketch after these two options).
        2. The other option is to upgrade and focus on Spark 2.x in Beam 2.3.0.
        If you still want to use Spark 1.x, then you will be stuck at Beam
        2.2.0.
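
        To make option 1 concrete, here is a user-side sketch. The artifact
        names are hypothetical (something like beam-runners-spark1 vs
        beam-runners-spark2; the exact Maven coordinates are still being
        settled in the PR): choosing a Spark version would just mean putting
        one of the two artifacts on the classpath. The pipeline code itself
        would stay the same either way, assuming the runner classes keep
        their current names:

            import org.apache.beam.runners.spark.SparkPipelineOptions;
            import org.apache.beam.runners.spark.SparkRunner;
            import org.apache.beam.sdk.Pipeline;
            import org.apache.beam.sdk.io.TextIO;
            import org.apache.beam.sdk.options.PipelineOptionsFactory;

            public class CopyOnSpark {
              public static void main(String[] args) {
                // Identical code whether the spark1 or spark2 artifact is on
                // the classpath; only the dependency changes.
                SparkPipelineOptions options =
                    PipelineOptionsFactory.fromArgs(args).as(SparkPipelineOptions.class);
                options.setRunner(SparkRunner.class);

                Pipeline p = Pipeline.create(options);
                p.apply(TextIO.read().from("input.txt"))
                 .apply(TextIO.write().to("output"));
                p.run().waitUntilFinish();
              }
            }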

        Thoughts ?

        Thanks !
        Regards
        JB


        -------- Forwarded Message --------
        Subject: [VOTE] Drop Spark 1.x support to focus on Spark 2.x
        Date: Wed, 8 Nov 2017 08:27:58 +0100
        From: Jean-Baptiste Onofré <[email protected]>
        Reply-To: [email protected]
        To: [email protected]

        Hi all,

        as you might know, we are working on Spark 2.x support in the Spark runner.

        I'm working on a PR about that:

        https://github.com/apache/beam/pull/3808

        Today, we have something working with both Spark 1.x and 2.x from a code
        standpoint, but I have to deal with dependencies. It's the first step of
        the update, as I'm still using RDDs; the second step would be to support
        DataFrames (but for that, I would need PCollection elements with schemas,
        which is another topic Eugene, Reuven and I are discussing).
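
        To make the gap concrete, here is a minimal plain-Spark sketch (not
        runner code; the file names are just placeholders). The RDD API treats
        elements as opaque objects, which is how the runner works today, while
        the Dataset/DataFrame API needs column structure up front, which is
        exactly what schema-aware PCollections would provide:

            import org.apache.spark.api.java.JavaRDD;
            import org.apache.spark.api.java.JavaSparkContext;
            import org.apache.spark.sql.Dataset;
            import org.apache.spark.sql.Row;
            import org.apache.spark.sql.SparkSession;

            public class RddVsDataFrame {
              public static void main(String[] args) {
                SparkSession spark = SparkSession.builder()
                    .appName("rdd-vs-dataframe").master("local[*]").getOrCreate();

                // RDD path: Spark knows nothing about the element structure.
                JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
                JavaRDD<String> lines = jsc.textFile("input.txt");
                System.out.println(lines.filter(l -> !l.isEmpty()).count());

                // DataFrame path: a schema is required (here inferred from the
                // JSON input), which is what the runner would need Beam to
                // provide for PCollection elements.
                Dataset<Row> rows = spark.read().json("input.json");
                rows.printSchema();

                spark.stop();
              }
            }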

        However, as all major distributions now ship Spark 2.x, I don't think
        supporting Spark 1.x is required anymore.

        If we agree, I will update and clean up the PR to support and focus
        on Spark 2.x only.

        So, that's why I'm calling for a vote:

           [ ] +1 to drop Spark 1.x support and upgrade to Spark 2.x only
           [ ] 0 (I don't care ;))
           [ ] -1, I would like to still support Spark 1.x, and so keep
        support of both Spark 1.x and 2.x (please provide a specific comment)

        This vote is open for 48 hours (I have the commits ready, just waiting
        for the end of the vote to push them to the PR).

        Thanks !
        Regards
        JB
        --
        Jean-Baptiste Onofré
        [email protected]
        http://blog.nanthrax.net
        Talend - http://www.talend.com




--
Jean-Baptiste Onofré
[email protected]
http://blog.nanthrax.net
Talend - http://www.talend.com
