Re: What / Where / When / How questions in Spark 2.0 ?

2016-05-22 Thread Amit Sela
I need to update this ;)
To start with, you could just take a look at branch-2.0.



Re: What / Where / When / How questions in Spark 2.0 ?

2016-05-21 Thread Ovidiu-Cristian MARCU
Thank you, Amit! I was looking for this kind of information.

I have not fully read your paper yet, but I see that it contains a TODO asking
essentially the same question(s) [1]. Maybe someone from the Spark team
(including Databricks) would be so kind as to send some feedback.

Best,
Ovidiu

[1] Integrate “Structured Streaming”: //TODO - What (and how) will Spark 2.0 
support (out-of-order, event-time windows, watermarks, triggers, accumulation 
modes) - how straight forward will it be to integrate with the Beam Model ?





Re: What / Where / When / How questions in Spark 2.0 ?

2016-05-21 Thread Sela, Amit
It seems I forgot to add the link to the “Technical Vision” paper, so here it
is:
https://docs.google.com/document/d/1y4qlQinjjrusGWlgq-mYmbxRW2z7-_X5Xax-GG0YsC0/edit?usp=sharing






Re: What / Where / When / How questions in Spark 2.0 ?

2016-05-21 Thread Sela, Amit
This is a “Technical Vision” paper for the Spark runner, which provides general
guidelines for the future development of Spark’s Beam support as part of the
Apache Beam (incubating) project.
This is our JIRA:
https://issues.apache.org/jira/browse/BEAM/component/12328915/?selectedTab=com.atlassian.jira.jira-projects-plugin:component-summary-panel

I’m currently working on Dataset integration for batch (to replace RDDs)
against Spark 1.6, and moving towards enhancing stream processing
capabilities with Structured Streaming (2.0).

You’re also welcome to ask those questions on the Apache Beam (incubating)
mailing list ;)
http://beam.incubator.apache.org/mailing_lists/

Thanks,
Amit






Re: What / Where / When / How questions in Spark 2.0 ?

2016-05-16 Thread Ovidiu-Cristian MARCU
Could you please consider giving a short answer regarding the Apache Beam
Capability Matrix TODOs for the upcoming Spark 2.0 release [4]? (Some related
references are below [5][6].)

Thanks

[4] http://beam.incubator.apache.org/capability-matrix/#cap-full-what
[5] https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101
[6] https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102




What / Where / When / How questions in Spark 2.0 ?

2016-05-16 Thread Ovidiu-Cristian MARCU
Hi,

We can see in [2] many interesting (and expected!) improvements (promises),
such as extended SQL support, a unified API (DataFrames, Datasets), an improved
engine (Tungsten draws on ideas from modern compilers and MPP databases,
similar to Flink [3]), Structured Streaming, etc. It seems we are witnessing a
smart unification of Big Data analytics (Spark, Flink: the best of both worlds)!

How does Spark respond to the missing What/Where/When/How questions
(capabilities) highlighted in the unified Beam model [1]?

Best,
Ovidiu

[1] https://cloud.google.com/blog/big-data/2016/05/why-apache-beam-a-google-perspective
[2] https://databricks.com/blog/2016/05/11/spark-2-0-technical-preview-easier-faster-and-smarter.html
[3] http://stratosphere.eu/project/publications/
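
For readers new to the four questions, here is a minimal sketch in plain Python
of how they fit together. It is not Spark or Beam API: the function names, the
toy watermark heuristic (largest event time seen minus an allowed lateness), and
the sample data are all assumptions made for illustration. What: a per-window
sum. Where: fixed event-time windows. When: a window is emitted once the
watermark reaches its end. How: each window fires exactly once here, so the
choice between accumulating and discarding refinements never comes up.

```python
from collections import defaultdict


def fixed_window(event_time, size):
    """Where: map an event time to its fixed window [start, start + size)."""
    start = (event_time // size) * size
    return (start, start + size)


def windowed_sums(events, size, allowed_lateness=0):
    """Sum values per event-time window, emitting each window exactly once.

    events: iterable of (event_time, value) in arrival order, possibly
    out of order with respect to event time.
    Hypothetical watermark: max event time seen so far minus allowed_lateness.
    """
    sums = defaultdict(int)        # What: one running sum per window
    closed = set()                 # windows that have already been emitted
    emitted = []                   # (window, sum) pairs in emission order
    watermark = float("-inf")

    for event_time, value in events:
        window = fixed_window(event_time, size)
        if window in closed:
            continue               # too late: its window already fired, drop it
        sums[window] += value
        watermark = max(watermark, event_time - allowed_lateness)
        # When: fire every open window whose end the watermark has reached
        for w in sorted(sums):
            if w not in closed and w[1] <= watermark:
                emitted.append((w, sums[w]))
                closed.add(w)

    # End of input: flush whatever is still open
    for w in sorted(sums):
        if w not in closed:
            emitted.append((w, sums[w]))
    return emitted


# The event at time 3 arrives after the event at time 12 (out of order).
events = [(1, 10), (4, 5), (12, 1), (3, 7), (25, 2)]
print(windowed_sums(events, size=10, allowed_lateness=5))
# [((0, 10), 22), ((10, 20), 1), ((20, 30), 2)]
print(windowed_sums(events, size=10, allowed_lateness=0))
# [((0, 10), 15), ((10, 20), 1), ((20, 30), 2)]
```

With an allowed lateness of 5 the out-of-order event still lands in its window;
with 0 the window has already fired and the event is dropped, which is the
trade-off the watermark and trigger questions are about.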