Re: [PROPOSAL] Having 2 Spark runners to support Spark 1 users while advancing towards better streaming implementation with Spark 2

2016-08-05 Thread Amit Sela
Code-sharing between the two proposed Spark runners is a great question, and I believe my answers will clarify why I suggested two runners instead of a fork. Without getting into class-by-class details, the Spark runner currently uses the RDD (and DStream) API, while Structured Streaming (Spark 2) and the …
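
To illustrate why the two translation layers would diverge (this is plain Spark code, not runner internals), here is a minimal word-count sketch contrasting the RDD style the current runner builds on with the Dataset/SparkSession style that Structured Streaming uses. Both halves are written against the Spark 2.x Java API so they compile together; note the Spark 1.6 RDD signatures differ slightly (e.g. FlatMapFunction returns an Iterable there), and the class and method names below are just for this example.

    import java.util.Arrays;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.FlatMapFunction;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Encoders;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import scala.Tuple2;

    public class RunnerApiContrast {

      // RDD style: the existing runner translates Beam transforms onto
      // JavaRDD / JavaPairRDD (and DStream) operations like these.
      static void wordCountWithRdd(String inputPath) {
        JavaSparkContext jsc =
            new JavaSparkContext(new SparkConf().setAppName("rdd-sketch").setMaster("local[2]"));
        JavaRDD<String> lines = jsc.textFile(inputPath);
        JavaRDD<String> words =
            lines.flatMap(line -> Arrays.asList(line.split(" ")).iterator());
        JavaPairRDD<String, Integer> counts =
            words.mapToPair(w -> new Tuple2<>(w, 1)).reduceByKey((a, b) -> a + b);
        counts.collect().forEach(t -> System.out.println(t._1() + ": " + t._2()));
        jsc.stop();
      }

      // Dataset style: the same logic goes through SparkSession, Dataset and Encoders,
      // which is also the entry point for Structured Streaming in Spark 2.
      static void wordCountWithDataset(String inputPath) {
        SparkSession spark =
            SparkSession.builder().appName("dataset-sketch").master("local[2]").getOrCreate();
        Dataset<String> lines = spark.read().textFile(inputPath);
        Dataset<String> words = lines.flatMap(
            (FlatMapFunction<String, String>) line -> Arrays.asList(line.split(" ")).iterator(),
            Encoders.STRING());
        Dataset<Row> counts = words.groupBy("value").count();
        counts.show();
        spark.stop();
      }
    }

Because the primitives, types and execution entry points differ this much, very little of the translation code could realistically be shared between the two runners.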

Re: [PROPOSAL] Having 2 Spark runners to support Spark 1 users while advancing towards better streaming implementation with Spark 2

2016-08-04 Thread Kenneth Knowles
+1 I definitely think it is important to support Spark 1 and 2 simultaneously, and I agree that side-by-side seems the best way to do it. I'll refrain from commenting on the specific technical aspects of the two runners and focus just on the split: I am also curious about the answer to Dan's question …

Re: [PROPOSAL] Having 2 Spark runners to support Spark 1 users while advancing towards better streaming implementation with Spark 2

2016-08-04 Thread Dan Halperin
Can they share any substantial code? If not, they will really be separate runners. If so, would it make more sense to fork into runners/spark and runners/spark2? On Thu, Aug 4, 2016 at 9:33 AM, Ismaël Mejía wrote: …

Re: [PROPOSAL] Having 2 Spark runners to support Spark 1 users while advancing towards better streaming implementation with Spark 2

2016-08-04 Thread Ismaël Mejía
+1 In particular for three reasons: 1. The new Dataset API in Spark 2 and the new semantics it allows for the runner (and the fact that we cannot back-port this to the Spark 1 runner). 2. The current performance regressions in Spark 2 (another reason to keep the Spark 1 runner). 3. The differe…

[PROPOSAL] Having 2 Spark runners to support Spark 1 users while advancing towards better streaming implementation with Spark 2

2016-08-03 Thread Amit Sela
After discussions with JB, and understanding that a lot of companies running Spark will probably stay on 1.6.x for a while, we thought it would be a good idea to have (some) support for both branches. The SparkRunnerV1 will mostly support Batch, but could also support “KeyedState” workflows and Sessions…
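
To make the side-by-side idea concrete from a user's point of view, here is a minimal sketch of how one might choose between the two runners through Beam's PipelineOptions. The existing SparkRunner class is real; the Spark2Runner name and the runners/spark2 module it would live in are hypothetical, since only the Spark 1.x runner exists today.

    import org.apache.beam.runners.spark.SparkRunner; // existing runner, built on the RDD/DStream API
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class RunnerSelection {
      public static void main(String[] args) {
        PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();

        // Spark 1.6.x clusters: keep using the current runner.
        options.setRunner(SparkRunner.class);

        // Spark 2.x clusters (hypothetical, if a runners/spark2 module is added):
        // options.setRunner(Spark2Runner.class);

        Pipeline pipeline = Pipeline.create(options);
        // ... build the same Beam pipeline regardless of which runner executes it ...
        pipeline.run();
      }
    }

The point of keeping them side-by-side rather than replacing one with the other is exactly this: the pipeline code stays the same, and users pick the runner that matches the Spark version deployed on their cluster.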